Affiliation:
1. Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, 17121 Solna, Sweden
2. Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 75185 Uppsala, Sweden
Abstract
Motivation
Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to strike a suitable balance between maximizing the number of true-positive interactions and minimizing the number of false-positive interactions. The inference method must also handle the large size of modern experimental data, meaning that it needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both of these criteria; however, because it is based on least squares, it is vulnerable to the known issue of amplifying extreme values, whether small or large. In GRN inference, this manifests as genes that are erroneously hyper-connected to a large fraction of all genes because of extremely low fold-change values.
Results
We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm with regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization is based on normalization, which removes the effects of extreme values on the fit. We benchmarked LSCON against Genie3, LASSO, LSCO and Ridge regression in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.
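The hyper-connection problem and the effect of a normalization step can be illustrated with a small sketch. It is not the published LSCON implementation. It assumes a linear steady-state model A Y = -P (Y: gene-by-experiment fold changes, P: perturbation design), the convention that A[i, j] is the effect of gene j on gene i, and a hypothetical rescaling by each gene's mean absolute fold change. All function names and the toy data are made up for illustration.

```python
import numpy as np

def least_squares_network(Y, P):
    # Assumed model: A @ Y = -P, solved with the Moore-Penrose pseudoinverse.
    return -P @ np.linalg.pinv(Y)

def cut_off(A, threshold):
    # Keep only links whose absolute weight reaches the threshold.
    return np.abs(A) >= threshold

def rescale_by_response(A, Y):
    # Hypothetical normalization (an assumption, not taken from the paper):
    # scale column j of A by the mean absolute fold change of gene j.
    scale = np.mean(np.abs(Y), axis=1)
    return A * scale[np.newaxis, :]

# Hand-built toy data: 4 genes, 4 single-gene perturbation experiments.
n = 4
Y = np.eye(n) + 0.5 * np.ones((n, n))
Y[0, :] *= 1e-3          # gene 0 has extremely low fold changes
P = -np.eye(n)

A_raw = least_squares_network(Y, P)
A_scaled = rescale_by_response(A_raw, Y)

# Number of predicted targets per gene (column sums, self-links included).
out_degree_raw = cut_off(A_raw, 0.5).sum(axis=0)
out_degree_scaled = cut_off(A_scaled, 0.5).sum(axis=0)

print(out_degree_raw)     # gene 0 is linked to every gene
print(out_degree_scaled)  # after rescaling, no gene dominates
```

In this toy case the plain least squares estimate inflates the column of the low-response gene by roughly a factor of 1000, so a single cut-off marks it as a regulator of every gene. The rescaled estimate puts all columns on a comparable scale before the cut-off is applied.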
Availability and implementation
Data: https://bitbucket.org/sonnhammergrni/lscon; Code: https://bitbucket.org/sonnhammergrni/genespider.
Supplementary information
Supplementary data are available at Bioinformatics online.
Funder
Swedish Strategic Research Foundation for financial support. This project was performed with
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability