Feature Selection Using Lasso Regression Enhances Deep Learning Model Performance For Diagnosis Of Lung Cancer from Transcriptomic Data

Author:

Guha Souvik

Abstract

Cancer is a genetic disease in which gene mutations are pivotal to disease initiation and pathophysiology. Gene expression follows a pattern specific to each cancer, and this pattern can be used for early and accurate diagnosis. Microarray techniques have emerged as powerful tools that capture the expression profiles of thousands of genes simultaneously. However, the high dimensionality of the resulting transcriptome data makes these datasets challenging to analyze. Recent advances in Artificial Intelligence (AI) techniques such as Machine Learning (ML) and Deep Learning can be instrumental in processing such high-dimensional datasets efficiently. LASSO regression is an ML technique that can rank features, which supports feature selection and therefore dimensionality reduction. Deep Learning is one of the most sophisticated ML techniques and can process high-dimensional data owing to the larger number of hidden layers in its neural network. We designed a Deep Neural Network (DNN) classifier fused with a LASSO-based significant feature extractor to classify a gene expression dataset of 51 samples, of which 24 are from lung cancer patients and the remaining 27 are from normal individuals. A LASSO regression model was implemented to identify the genes that played a significant role in the classification. The expression values of these significant genes were then fed into a convergent Deep Neural Architecture. The classifier was trained on 70% of the data, and the remaining 30% was used for validation. The proposed classifier provided better classification than LASSO regression or the DNN used individually. The two classes were classified with an average accuracy of 96.25%, average precision of 99.67%, average specificity of 99.45% and average sensitivity of 91.73%, measured over thirty independent assessments. In some cases, the model obtained a classification accuracy of 100%. This could open the path to earlier and better diagnosis of cancers from transcriptome data.
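The LASSO-based feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it is a plain coordinate-descent LASSO in standard-library Python, run on synthetic data that only mimics the 24/27 class split. The function names (`make_toy_expression`, `select_features`, and so on), the penalty value `alpha=0.1`, the objective scaling, and the choice of a single informative gene are all assumptions made for illustration, and the DNN stage is omitted.

```python
import random


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values in [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def standardize(X):
    """Scale every column to zero mean and unit (population) variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mean) / std for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def lasso_coordinate_descent(X, y, alpha, sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Covariance of gene j with the residual that excludes gene j.
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
    return w


def make_toy_expression(n_cancer=24, n_normal=27, n_genes=10, seed=0):
    """Synthetic stand-in for the dataset: only gene 0 carries class signal."""
    rng = random.Random(seed)
    labels = [1] * n_cancer + [0] * n_normal
    X = []
    for label in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        row[0] = 4.0 * label + rng.gauss(0.0, 0.5)
        X.append(row)
    return X, labels


def select_features(X, labels, alpha=0.1):
    """Return indices of genes whose LASSO coefficient is non-zero."""
    Xs = standardize(X)
    mean_y = sum(labels) / len(labels)
    y = [label - mean_y for label in labels]
    w = lasso_coordinate_descent(Xs, y, alpha)
    return [j for j, wj in enumerate(w) if wj != 0.0]


X, labels = make_toy_expression()
print("selected gene indices:", select_features(X, labels))
```

In a pipeline like the one the abstract describes, only the columns returned by `select_features` would be passed on to the neural network. A larger `alpha` keeps fewer genes, and a sufficiently large one keeps none, so the penalty would need tuning on the training split only.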

Publisher

Cold Spring Harbor Laboratory
