Systematic comparison of ranking aggregation methods for gene lists in experimental results

Published:2022-09-12 Issue:21 Volume:38 Page:4927-4933
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Wang Bo¹, Law Andy¹, Regan Tim¹, Parkinson Nicholas¹, Cole Joby², Russell Clark D³, Dockrell David H³, Gutmann Michael U⁴, Baillie J Kenneth¹

Affiliation:

1. Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK

2. University of Sheffield, Sheffield S10 2NT, UK

3. Centre for Inflammation Research, The Queen’s Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK

4. School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK

Abstract

Motivation: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Evaluating a ranking aggregation method on a specific type of data before using it is required to support the reliability since the property of a dataset can influence the performance of an algorithm. Such evaluation on gene lists is usually based on a simulated database because of the lack of a known truth for real data. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.

Results: In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore the performance of the aggregation methods as a function of emulating the common scenarios of real genomic data, with various heterogeneity of quality, noise level and a mix of unranked and ranked data using 20 000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (non-small cell lung cancer) and bacteria (macrophage apoptosis) was performed. We summarize the results of our evaluation in a simple flowchart to select a ranking aggregation method, and in an automated implementation using the meta-analysis by information content algorithm to infer heterogeneity of data quality across input datasets.

Availability and implementation: The code for simulated data generation and running edited version of algorithms: https://github.com/baillielab/comparison_of_RA_methods. Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded here: https://github.com/baillielab/maic. An online service for running MAIC: https://baillielab.net/maic.

Supplementary information: Supplementary data are available at Bioinformatics online.
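
As a rough illustration of what combining ranked and unranked gene lists can look like, the sketch below implements a simple Borda-style positional score. This is a generic baseline chosen for illustration only; it is not MAIC and is not claimed to be any of the specific methods compared in the paper. The function name `borda_aggregate`, the `universe_size` parameter and the placeholder gene names are all assumptions introduced here.

```python
from collections import defaultdict


def borda_aggregate(ranked_lists, unranked_lists=(), universe_size=20000):
    """Combine gene lists into one consensus ranking (illustrative baseline).

    ranked_lists: lists of gene names, best first, each gene at most once.
    unranked_lists: collections of gene names with no internal order.
    universe_size: assumed number of possible entities; sets the score scale.
    """
    scores = defaultdict(float)

    # A gene at 0-based position p in a ranked list earns universe_size - p.
    for genes in ranked_lists:
        for position, gene in enumerate(genes):
            scores[gene] += universe_size - position

    # Genes in an unranked list are treated as tied: each one earns the
    # average score of the positions that the list would occupy.
    for genes in unranked_lists:
        if not genes:
            continue
        shared_score = universe_size - (len(genes) - 1) / 2
        for gene in genes:
            scores[gene] += shared_score

    # Highest total score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Placeholder inputs (hypothetical gene names, not data from the paper).
consensus = borda_aggregate(
    ranked_lists=[["GENE_A", "GENE_B", "GENE_C"], ["GENE_B", "GENE_A", "GENE_D"]],
    unranked_lists=[{"GENE_A", "GENE_D"}],
    universe_size=10,
)
print(consensus)  # ['GENE_A', 'GENE_B', 'GENE_D', 'GENE_C']
```

A scheme like this weights every input list equally, so it does not address the heterogeneity of data quality across studies that the abstract highlights as a key feature of real data.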

Funder

MRC SHIELD consortium

Edinburgh Global Research Scholarship from the University of Edinburgh

Institute Strategic funding provided to the Roslin Institute by the BBSRC

Wellcome Trust Senior Research Fellowship

Sepsis Research (Fiona Elizabeth Agnew Trust), a BBSRC Institute Strategic Programme

Roslin Institute

UK Intensive Care Society

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac621/46205348/btac621.pdf

Cited by 4 articles.

1. Transcriptional Dynamics and Key Regulators of Adipogenesis in Mouse Embryonic Stem Cells: Insights from Robust Rank Aggregation Analysis;International Journal of Molecular Sciences;2024-08-23

2. The genomic landscape of Acute Respiratory Distress Syndrome: a meta-analysis by information content of genome-wide studies of the host response;2024-02-14

3. The relationship between tumor infiltrating immune cells and the prognosis of patients with lung adenocarcinoma;Journal of Thoracic Disease;2023-02

4. An explainable machine learning-driven proposal of pulmonary fibrosis biomarkers;Computational and Structural Biotechnology Journal;2023