Abstract
Accurate disease diagnosis and prognosis are crucial for effective treatment management and for improving patient outcomes. However, detecting early signs of certain diseases, or of their recurrence, remains challenging. Existing machine-learning methods for identifying gene expression biomarkers have several limitations, including poor performance on independent test datasets, inability to process omics data directly, and difficulty in identifying noncoding RNA genes as biomarkers. In addition, these methods may not provide sufficient biological interpretation of their results, and the biomarker panels they identify may not be suitable for clinical application. To address these limitations, we have developed a new computational method called BAMBI, which integrates multiple machine-learning algorithms and statistical approaches to identify putative coding and noncoding genes as biomarkers for disease diagnosis and prognosis. We evaluated BAMBI's ability to identify diagnostic and prognostic biomarkers by analyzing multiple population-level RNA-seq datasets from cancerous and non-cancerous diseases. The results demonstrate significant biological interpretability and state-of-the-art prediction performance. When a single gene identified by BAMBI is used as a diagnostic biomarker, it achieves a balanced accuracy exceeding 95% in studies of both breast cancer and psoriasis. The prognostic biomarkers that BAMBI identifies from RNA-seq data of acute myeloid leukemia (AML) patients significantly correlate with survival rates in an independent AML patient cohort. Moreover, BAMBI outperforms existing methods by delivering more robust results and identifying biomarkers with fewer genes while achieving superior prediction accuracy. We have implemented BAMBI as user-friendly software for the research community.
In summary, BAMBI serves as a more reliable pipeline for identifying both coding and noncoding genes as biosignature markers, enhancing the accuracy of disease diagnosis and prognosis. BAMBI is available at https://github.com/CZhouLab/BAMBI.
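For readers unfamiliar with the metric reported above, balanced accuracy is the mean of the per-class recalls, which for two classes equals (sensitivity + specificity) / 2. The following is a minimal illustrative sketch and is not taken from the BAMBI code base; the function name `balanced_accuracy` and the sample labels are hypothetical and chosen only to show how the metric differs from plain accuracy on an imbalanced cohort.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced example: 8 tumor samples and 2 normal samples.
y_true = ["tumor"] * 8 + ["normal"] * 2
y_pred = ["tumor"] * 8 + ["tumor", "normal"]  # one normal sample misclassified

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

In this made-up example, 9 of 10 samples are classified correctly, yet only half of the normal samples are recovered, so balanced accuracy is lower than plain accuracy. This is why the metric is commonly preferred when case and control groups differ in size.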
Publisher
Cold Spring Harbor Laboratory