Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data.

  • Author(s): Chen T; Kabir MF
  • Source:
    PloS one [PLoS One] 2024 May 10; Vol. 19 (5), pp. e0302947. Date of Electronic Publication: 2024 May 10 (Print Publication: 2024).
  • Publication Type:
    Journal Article; Research Support, Non-U.S. Gov't
  • Language:
    English
  • Additional Information
    • Source:
      Publisher: Public Library of Science Country of Publication: United States NLM ID: 101285081 Publication Model: eCollection Cited Medium: Internet ISSN: 1932-6203 (Electronic) Linking ISSN: 19326203 NLM ISO Abbreviation: PLoS One Subsets: MEDLINE
    • Publication Information:
      Original Publication: San Francisco, CA : Public Library of Science
    • Subject Terms:
    • Abstract:
      In recent years, researchers have proven the effectiveness and speed of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially those that use complex, high-dimensional data such as RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and use it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, XGBoost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Because our datasets are imbalanced, we evaluated the performance of all models using metrics suited to imbalanced data, such as geometric mean, Matthews correlation coefficient, F-measure, and area under the receiver operating characteristic curve. Our approach showed comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique to improve the performance and explainability of RNA sequencing-based cancer prediction models.
      Competing Interests: The authors have declared that no competing interests exist.
      (Copyright: © 2024 Chen, Kabir. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
    • References:
      Nat Mach Intell. 2020 Jan;2(1):56-67. (PMID: 32607472)
      Molecules. 2023 Jan 23;28(3):. (PMID: 36770815)
      Comput Methods Programs Biomed. 2023 Oct;240:107719. (PMID: 37453366)
      Oncotarget. 2016 Mar 29;7(13):16895-909. (PMID: 26942877)
      Sci Rep. 2022 Dec 19;12(1):21915. (PMID: 36535969)
      Artif Intell Med. 2022 Sep;131:102349. (PMID: 36100346)
      Cancer Inform. 2007 Feb 11;2:59-77. (PMID: 19458758)
      Bioinformatics. 2018 Dec 1;34(23):4007-4016. (PMID: 29868903)
      Nat Mach Intell. 2019 May;1(5):206-215. (PMID: 35603010)
      PLoS One. 2013 Apr 30;8(4):e61318. (PMID: 23646105)
      Genet Epidemiol. 2017 Dec;41(8):844-865. (PMID: 29114920)
      Pac Symp Biocomput. 2016;22:219-229. (PMID: 27896977)
      BMC Genomics. 2020 Jan 2;21(1):6. (PMID: 31898477)
      J Theor Biol. 2014 Jan 21;341:34-40. (PMID: 24035842)
      Comput Methods Programs Biomed. 2018 Nov;166:99-105. (PMID: 30415723)
      J Chem Inf Model. 2023 Nov 13;63(21):6537-6554. (PMID: 37905969)
      PLoS One. 2021 Apr 16;16(4):e0250370. (PMID: 33861809)
      Bioinformatics. 2011 Nov 1;27(21):3017-23. (PMID: 21893520)
      Artif Intell Med. 2017 Jun;79:62-70. (PMID: 28655440)
      PLoS One. 2012;7(7):e39932. (PMID: 22808075)
      Molecules. 2019 May 22;24(10):. (PMID: 31121946)
    • Publication Date:
      Date Created: 20240510 Date Completed: 20240510 Latest Revision: 20240512
    • Publication Date:
      20240512
    • PubMed Central ID:
      PMC11086842
    • DOI:
      10.1371/journal.pone.0302947
    • PMID:
      38728288
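
The abstract evaluates models with metrics intended for imbalanced datasets. As a minimal sketch of why such metrics matter, the Python below computes the geometric mean, Matthews correlation coefficient, and F-measure from a confusion matrix using their standard textbook definitions. It is not the authors' code: the function names, the toy labels, and the convention that 1 means tumor and 0 means normal are all assumptions made for illustration, and the area under the ROC curve is left out because it needs predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels; 1 = tumor, 0 = normal (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def geometric_mean(tp, tn, fp, fn):
    """Square root of sensitivity times specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 8 tumor samples, 2 normal samples,
# and a trivial classifier that labels every sample as tumor.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
g_mean = geometric_mean(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
f1 = f_measure(tp, fp, fn)

print(f"accuracy={accuracy:.3f} g_mean={g_mean:.3f} mcc={mcc:.3f} f1={f1:.3f}")
```

On this toy data the trivial classifier reaches 80% accuracy, yet its geometric mean and Matthews correlation coefficient are both zero because it never identifies a normal sample. The F-measure stays high here because the majority class is treated as the positive class, which is one reason to report several metrics together.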