Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells

Published: 2022-03-23
Volume: 13
ISSN: 1664-2392
Container-title: Frontiers in Endocrinology
Short-container-title: Front. Endocrinol.

Author:

Wong Wilson K. M., Thorat Vinod, Joglekar Mugdha V., Dong Charlotte X., Lee Hugo, Chew Yi Vee, Bhave Adwait, Hawthorne Wayne J., Engin Feyza, Pant Aniruddha, Dalgaard Louise T., Bapat Sharda, Hardikar Anandwardhan A.

Abstract

Machine learning (ML) workflows enable unprejudiced and robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with the presence or absence of insulin transcript(s). The prediction accuracy and sensitivity of each ML workflow were tested in a separate validation dataset (N=2,913). Ensemble ML workflows, in particular the Random Forest ML algorithm, delivered high predictive power (AUC=0.83) and sensitivity (0.98) compared to other algorithms. The transcripts identified through these analyses also demonstrated significant correlation with insulin in bulk RNA-seq data from human islets. The top-10 features (including IAPP, ADCYAP1, LDHA and SST) common to the three Ensemble ML workflows were significantly dysregulated in scRNA-seq datasets from Ire-1αβ-/- mice, which demonstrate dedifferentiation of pancreatic β-cells in a model of type 1 diabetes (T1D), and in pancreatic single cells from individuals with type 2 diabetes (T2D). Our findings provide a direct comparison of ML workflows in big data analyses, identify key elements associated with insulin transcription, and provide workflows for future analyses.
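The two metrics quoted in the abstract, sensitivity and AUC, can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only. The only figures taken from the abstract are the training and validation set sizes, which work out to an 80/20 split.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Dataset sizes as reported in the abstract.
n_train, n_valid = 11_652, 2_913
valid_fraction = n_valid / (n_train + n_valid)

# Hypothetical toy data: 1 = insulin transcript present, 0 = absent.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.6, 0.7, 0.2, 0.55]
# Assumed decision threshold of 0.5 (not stated in the source).
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(f"validation fraction: {valid_fraction:.2f}")
print(f"toy sensitivity: {sensitivity(toy_labels, toy_preds):.2f}")
print(f"toy AUC: {roc_auc(toy_labels, toy_scores):.3f}")
```

The pairwise AUC definition is quadratic in the number of cells and is used here only because it makes the meaning of the metric explicit; a rank-based implementation would be preferable at the scale of the validation set described above.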

Publisher

Frontiers Media SA

Subject

Endocrinology, Diabetes and Metabolism

References: 45 articles.

1. A Primer on Deep Learning in Genomics;Zou;Nat Genet,2019

2. Integration of Mechanistic Immunological Knowledge Into a Machine Learning Pipeline Improves Predictions;Culos;Nat Mach Intell,2020

3. Deep Neural Networks Identify Sequence Context Features Predictive of Transcription Factor Binding;Zheng;Nat Mach Intell,2021

4. Machine Learning and Complex Biological Data;Xu;Genome Biol,2019

5. Iterative Transfer Learning With Neural Network for Clustering and Cell Type Classification in Single-Cell RNA-Seq Analysis;Hu;Nat Mach Intell,2020

Cited by 2 articles.

1. Machine learning for catalysing the integration of noncoding RNA in research and clinical practice;eBioMedicine;2024-08

2. A Model for Detecting Type 2 Diabetes Using Mixed Single-Cell RNA Sequencing with Optimized Data;SN Computer Science;2023-10-06