The effect of non-linear signal in classification problems using gene expression

Published: 2023-03-27
Volume: 19, Issue: 3, Page: e1010984
ISSN: 1553-7358
Journal: PLOS Computational Biology (PLoS Comput Biol)
Language: English

Authors:

Benjamin J. Heil, Jake Crawford, Casey S. Greene

Abstract

Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because, while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
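The ablation idea above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' pipeline or data: it uses synthetic "expression" values in pure Python, a nearest-centroid rule as a stand-in for logistic regression (both draw a linear boundary), and a squared-norm threshold as a stand-in for a neural network (a simple non-linear rule). It also assumes the Limma step behaves roughly like limma's `removeBatchEffect` with the class label passed as the batch, which for a single categorical label roughly amounts to centering each gene within each class. All function names are hypothetical.

```python
# Toy illustration with hypothetical helper names; not the paper's code or data.
import random
import statistics


def make_data(rng, n_per_class, n_genes, shift, sd_ratio):
    """Toy expression matrix: class 1 differs from class 0 by a mean shift
    (linear signal) and by a wider spread (non-linear signal)."""
    X, y = [], []
    for label in (0, 1):
        mu = shift if label else 0.0
        sd = sd_ratio if label else 1.0
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, sd) for _ in range(n_genes)])
            y.append(label)
    return X, y


def class_means(X, y):
    """Per-gene mean expression for each class label."""
    means = {}
    for lab in sorted(set(y)):
        rows = [x for x, l in zip(X, y) if l == lab]
        means[lab] = [statistics.fmean(col) for col in zip(*rows)]
    return means


def remove_linear_signal(X, y):
    """Subtract each sample's class mean, gene by gene. Assumed to be a rough
    stand-in for regressing the labels out of every gene with a linear model."""
    means = class_means(X, y)
    return [[v - m for v, m in zip(x, means[lab])] for x, lab in zip(X, y)]


def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def split(X, y):
    """Even-indexed samples train, odd-indexed samples test."""
    return (X[0::2], y[0::2]), (X[1::2], y[1::2])


def nearest_centroid_accuracy(train, test):
    """Linear classifier: assign each test sample to the closer class centroid."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    cents = class_means(X_tr, y_tr)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    preds = [min(cents, key=lambda lab: sq_dist(x, cents[lab])) for x in X_te]
    return accuracy(preds, y_te)


def norm_threshold_accuracy(train, test):
    """Non-linear (quadratic) classifier: call a sample class 1 when its squared
    norm exceeds a threshold picked on the training split. Assumes class 1 is
    the more dispersed class, which holds for this toy data."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    sq_tr = [sum(v * v for v in x) for x in X_tr]
    sq_te = [sum(v * v for v in x) for x in X_te]

    def acc_at(t, sq, labels):
        return accuracy([int(s > t) for s in sq], labels)

    best_t = max(sq_tr, key=lambda t: acc_at(t, sq_tr, y_tr))
    return acc_at(best_t, sq_te, y_te)


rng = random.Random(0)
X, y = make_data(rng, n_per_class=500, n_genes=10, shift=2.0, sd_ratio=3.0)
X_removed = remove_linear_signal(X, y)

linear_before = nearest_centroid_accuracy(*split(X, y))
linear_after = nearest_centroid_accuracy(*split(X_removed, y))
nonlinear_after = norm_threshold_accuracy(*split(X_removed, y))

print(f"linear, before removal:     {linear_before:.2f}")
print(f"linear, after removal:      {linear_after:.2f}")
print(f"non-linear, after removal:  {nonlinear_after:.2f}")
```

Because the class means are removed from the whole toy dataset before splitting, the linear rule can land at or slightly below chance rather than exactly at it; the point is only that its accuracy collapses while the variance-based rule keeps working, mirroring the pattern the abstract describes.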

Funders

National Human Genome Research Institute

Gordon and Betty Moore Foundation

Publisher

Public Library of Science (PLoS)

Subjects

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

Cited by 2 articles

1. Evaluation of water quality based on artificial intelligence: performance of multilayer perceptron neural networks and multiple linear regression versus water quality indexes; Environment, Development and Sustainability; 2024-06-01

2. MousiPLIER: A Mouse Pathway-Level Information Extractor Model; eNeuro; 2024-05-24