A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma

Published: 2022-09-06 Issue: 9 Volume: 17 Page: e0269126
ISSN: 1932-6203
Container-title: PLOS ONE
Language: en
Short-container-title: PLoS ONE

Author:

Abdelwahab Omar, Awad Nourelislam, Elserafy Menattallah, Badr Eman

Abstract

Lung cancer (LC) represents most of the cancer incidences in the world. Of the many types of LC, lung adenocarcinoma (LUAD) is the most common. Although RNA-seq and microarray experiments provide a vast amount of gene expression data, most genes are not relevant to clinical diagnosis. Feature selection (FS) techniques overcome the high dimensionality and sparsity of such large-scale data. We propose a framework that applies an ensemble of feature selection techniques to identify genes highly correlated with LUAD. Using LUAD RNA-seq data from The Cancer Genome Atlas (TCGA), we employed mutual information (MI) and recursive feature elimination (RFE) as feature selection techniques, along with a support vector machine (SVM) classification model. We also used random forest (RF) as an embedded FS technique. The results were integrated, and candidate biomarker genes common to all techniques were identified. The proposed framework identified 12 potential biomarkers that are highly correlated with different LC types, especially LUAD. A predictive model trained on the expression profiles of the identified biomarkers achieved a performance of 97.99%. In addition, differential gene expression analysis showed that all 12 genes were significantly differentially expressed between normal and LUAD tissues, and previous reports indicate that they are strongly correlated with LUAD. We propose that using multiple feature selection methods effectively reduces the number of identified biomarkers and directly affects their biological relevance.
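The ensemble idea in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' pipeline: the synthetic data, the helper names `top_k` and `consensus_features`, the cutoff `k`, and the use of a plain set intersection to integrate the three rankings are all assumptions made for illustration. The abstract does not state how the results were integrated or which parameters were used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def top_k(scores, k):
    """Return the indices of the k largest scores as a set (hypothetical helper)."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def consensus_features(X, y, k=20, seed=0):
    """Keep only features chosen by all three selectors (assumed integration rule)."""
    # Filter method: mutual information between each feature and the class label.
    mi_set = top_k(mutual_info_classif(X, y, random_state=seed), k)

    # Wrapper method: recursive feature elimination around a linear SVM,
    # removing 10% of the original features per iteration until k remain.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=k, step=0.1).fit(X, y)
    rfe_set = set(np.flatnonzero(rfe.support_).tolist())

    # Embedded method: impurity-based importances from a random forest.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    rf_set = top_k(rf.feature_importances_, k)

    return sorted(mi_set & rfe_set & rf_set)


# Synthetic stand-in for an expression matrix (samples x genes); not TCGA data.
X, y = make_classification(
    n_samples=200, n_features=100, n_informative=8,
    n_redundant=0, shuffle=False, random_state=0,
)
selected = consensus_features(X, y)
print("consensus feature indices:", selected)

# Optional check: cross-validated accuracy of an SVM on the consensus features.
# Selecting features on the full data set before cross-validation is optimistic;
# a careful evaluation would repeat the selection inside each fold.
if selected:
    scores = cross_val_score(SVC(kernel="linear"), X[:, selected], y, cv=5)
    print("mean CV accuracy:", scores.mean())
```

Requiring agreement between a filter, a wrapper, and an embedded method is one simple way to shrink a candidate list, which matches the abstract's closing claim that combining methods reduces the number of identified biomarkers.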

Funder

International Centre for Genetic Engineering and Biotechnology

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References (82 articles)

1. Should we abandon the t-Test in the analysis of gene expression microarray data: A comparison of variance modeling strategies;M Jeanmougin;PLoS One,2010

2. Differential gene expression detection and sample classification using penalized linear regression models;B. Wu;Bioinformatics,2006

3. Differential expression analysis for sequence count data;S Anders;Genome Biol,2010

4. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data;M Radovic;BMC Bioinformatics,2017

5. ClearF: A supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction;S Wang;BMC Med Genomics,2019

Cited by 9 articles.

1. An Integrated Data Analysis Using Bioinformatics and Random Forest to Predict Prognosis of Patients With Squamous Cell Lung Cancer;IEEE Access;2024

2. Gene Expression and Metadata Based Identification of Key Genes for Hepatocellular Carcinoma Using Machine Learning and Statistical Models;IEEE/ACM Transactions on Computational Biology and Bioinformatics;2023-11

3. Artificial intelligence with temporal features outperforms machine learning in predicting diabetes;PLOS Digital Health;2023-10-25

4. Evaluation and Exploration of Machine Learning and Convolutional Neural Network Classifiers in Detection of Lung Cancer from Microarray Gene—A Paradigm Shift;Bioengineering;2023-08-06

5. Feature selection for high dimensional microarray gene expression data via weighted signal to noise ratio;PLOS ONE;2023-04-25