AlzGenPred: A CatBoost based method using network features to classify the Alzheimer’s Disease associated genes from the high throughput sequencing data

Author:

Shukla Rohit, Singh Tiratha Raj

Abstract

Background and Objective

Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by memory loss. Advances in next-generation sequencing technologies have made an enormous amount of AD-associated genomics data available. However, how these genes are involved in AD remains an open research question, because the existing algorithms are based on statistical techniques. AlzGenPred was therefore developed to identify AD-associated genes from large datasets.

Methods

To develop AlzGenPred, we compiled a benchmark dataset of 1086 AD and non-AD genes and used them as the positive and negative datasets. We generated several features, including fused features, and evaluated them with machine learning (ML) methods. Hyperparameter tuning was then applied and the final model was selected. The proposed method was validated on the AlzGene and transcriptomics datasets and is provided as a standalone tool.

Results

A total of 13504 features belonging to eight different encoding schemes were generated from these sequences and evaluated using 16 ML algorithms. The evaluation revealed that network-based features can classify AD genes, whereas sequence-based features cannot. We then generated 24 different fused features (6020 D) from the sequence-based features and fed them into a two-step LightGBM-based recursive feature selection method, which increased accuracy by up to 5-7%. The eight selected fused features, together with CKSAAP, were then used for hyperparameter tuning, but they showed <70% accuracy. Network-based features were therefore used to build the CatBoost-based ML method, AlzGenPred, which achieved 96.55% accuracy and 98.99% AUROC. When tested on the AlzGene dataset, the method showed 96.43% accuracy. The model was also validated on the transcriptomics dataset.

Conclusion

Validation of AlzGenPred on the AlzGene dataset and on transcriptomics datasets obtained from human, mouse, and ES-derived neural cells showed that it can classify omics data and sort out the AD-associated genes. The predicted genes can be taken directly to the wet lab for further testing, which will reduce labor cost and time. AlzGenPred is developed as a standalone package and is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred.
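The accuracy and AUROC figures quoted in the Results can be read with the help of a small worked example. The sketch below is plain Python and is not the authors' evaluation code; the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. Accuracy is the fraction of genes assigned to the correct class, and AUROC is the probability that a randomly chosen AD gene receives a higher score than a randomly chosen non-AD gene, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC: share of positive/negative pairs
    in which the positive sample has the higher score; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = AD gene, 0 = non-AD gene.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # made-up classifier scores
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(labels, preds):.4f}")
print(f"AUROC    = {auroc(labels, scores):.4f}")
```

AUROC does not depend on the decision threshold, while accuracy does, which is why the two numbers are reported side by side. The pairwise loop is quadratic in the number of samples; it is used here for clarity, not efficiency.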

Publisher

Cold Spring Harbor Laboratory
