The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry

Published:2017-08 Issue:4-5 Volume:17 Page:290-299
ISSN:1471-082X
Container-title:Statistical Modelling
language:en
Short-container-title:Statistical Modelling

Author:

Dowsey Andrew W¹

Affiliation:

1. School of Social & Community Medicine and School of Veterinary Sciences, Faculty of Health Sciences, University of Bristol, United Kingdom.

Abstract

In their article, Morris and Baladandayuthapani clearly evidence the influence of statisticians in recent methodological advances throughout the bioinformatics pipeline and advocate for the expansion of this role. The latest acquisition platforms, such as next generation sequencing (genomics/transcriptomics) and hyphenated mass spectrometry (proteomics/metabolomics), output raw datasets on the order of gigabytes; it is not unusual to acquire a terabyte or more of data per study. The increasing computational burden this brings is a further impediment to the use of statistically rigorous methodology in the pre-processing stages of the bioinformatics pipeline. In this discussion I describe the mass spectrometry pipeline and use it as an example to show that beneath this challenge lies a two-fold opportunity: (a) biological complexity and dynamic range are still well beyond what is captured by current processing methodology; hence, potential biomarkers and mechanistic insights are consistently missed; (b) statistical science could play a larger role in optimizing the acquisition process itself. Data rates will continue to increase as routine clinical omics analysis moves to large-scale facilities with systematic, standardized protocols. Key inferential gains will be achieved by borrowing strength across the sum total of all analyzed studies, a task best underpinned by appropriate statistical modelling.

Publisher

SAGE Publications

Subject

Statistics, Probability and Uncertainty; Statistics and Probability

Link

http://journals.sagepub.com/doi/pdf/10.1177/1471082X17708519
