Feature Selection Using Lasso Regression Enhances Deep Learning Model Performance For Diagnosis Of Lung Cancer from Transcriptomic Data

Published: 2024-05-04

Author:

Guha Souvik

Abstract

Cancer is a genetic disease in which gene mutations are pivotal to disease initiation and pathophysiology. Each cancer has a characteristic gene expression profile, which can be used for early and accurate diagnosis. Microarray techniques have emerged as powerful tools that capture the expression profiles of thousands of genes simultaneously. However, the high dimensionality of the resulting transcriptome data makes these datasets challenging to analyze. Recent advances in Artificial Intelligence (AI) techniques such as Machine Learning (ML) and Deep Learning can be instrumental in processing these high-dimensional datasets efficiently. LASSO regression is an ML technique that can rank features, which supports feature selection and thereby dimensionality reduction. Deep Learning is one of the most sophisticated ML techniques and can process high-dimensional data owing to the larger number of hidden layers in its neural networks. We designed a Deep Neural Network (DNN) classifier fused with a LASSO-based significant-feature extractor to classify a gene expression dataset of 51 samples, of which 24 are from lung cancer patients and the remaining 27 are from normal individuals. A LASSO regression model was implemented to identify the genes that played a significant role in the classification. The expression values of these significant genes were then fed into a convergent Deep Neural Architecture. The classifier was trained on 70% of the data, and the remaining 30% was used for validation. The proposed classifier classified better than LASSO regression or the DNN used individually. Over thirty independent assessments, the two classes were classified with an average accuracy of 96.25%, average precision of 99.67%, average specificity of 99.45% and average sensitivity of 91.73%. In some cases, the model achieved a classification accuracy of 100%. This could open the path to early and better diagnosis of cancers from transcriptome data.
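The abstract gives no implementation details, so the following is only a minimal sketch of how a LASSO ranking step of this kind might look. Everything specific in it is an assumption made for illustration and is not taken from the paper: the plain coordinate-descent solver, the synthetic 51-sample matrix (24 cancer, 27 normal) with 30 hypothetical genes of which the first three are made informative, and the penalty `lam=0.1`. The DNN stage is left out.

```python
import random

def standardize_columns(X):
    """Scale every column to zero mean and unit variance (constant columns become zeros)."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            out[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    return out

def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of gene j with the residual computed without gene j.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
    return w

# Synthetic stand-in for the expression matrix (assumption, not the paper's data):
# 24 "cancer" samples (label 1) and 27 "normal" samples (label 0), 30 genes,
# with genes 0-2 shifted upwards in the cancer class.
random.seed(0)
n_genes = 30
informative = {0, 1, 2}
labels = [1.0] * 24 + [0.0] * 27
expression = [
    [random.gauss(0.0, 1.0) + (3.0 * label if g in informative else 0.0)
     for g in range(n_genes)]
    for label in labels
]

X = standardize_columns(expression)
mean_label = sum(labels) / len(labels)
y = [label - mean_label for label in labels]  # centred labels, so no intercept is needed

weights = lasso_coordinate_descent(X, y, lam=0.1)  # lam is an illustrative choice
ranking = sorted(range(n_genes), key=lambda g: abs(weights[g]), reverse=True)
selected = [g for g in ranking if weights[g] != 0.0]
print("genes kept by LASSO, strongest first:", selected)
```

Genes whose coefficient is driven exactly to zero are discarded, and the surviving columns would then form the reduced input to a downstream classifier such as the DNN described above. A larger `lam` keeps fewer genes, so in practice the penalty would need to be tuned, for example by cross-validation.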

Publisher

Cold Spring Harbor Laboratory
