Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets

Published:2022-07-28 Issue:7 Volume:17 Page:e0252697
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Forouzandeh Amir, Rutar Alex, Kalmady Sunil V., Greiner Russell

Abstract

Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible – subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
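The idea of scoring how often a univariate discovery process returns the same biomarker set can be illustrated with a rough sketch. The code below is not the paper's over-bound or under-bound algorithm. It makes several assumptions that the abstract does not state: overlap is measured with the Jaccard index, the discovery process is a Welch-style two-sample test with a normal approximation for the p-value and a Bonferroni correction, pairs of datasets are simulated by bootstrap resampling, and two empty biomarker sets count as perfect agreement. The function names `discover_biomarkers`, `jaccard`, and `overlap_estimate` are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def discover_biomarkers(X, y, alpha=0.05):
    """Return indices of features that differ significantly between classes 0 and 1.

    Assumed discovery process: Welch-style statistic per feature, p-value from
    a normal approximation, Bonferroni correction across features.
    """
    n_features = len(X[0])
    selected = set()
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        if len(a) < 2 or len(b) < 2:
            continue  # not enough samples in one class to estimate variance
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        if se == 0:
            continue  # constant feature in both classes: no evidence either way
        z = (mean(a) - mean(b)) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha / n_features:
            selected.add(j)
    return selected


def jaccard(s, t):
    """Jaccard overlap in [0, 1]; two empty sets are treated as agreeing (assumption)."""
    if not s and not t:
        return 1.0
    return len(s & t) / len(s | t)


def overlap_estimate(X, y, n_pairs=20, seed=0):
    """Average Jaccard overlap of biomarker sets from pairs of bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_pairs):
        sets = []
        for _ in range(2):
            idx = [rng.randrange(n) for _ in range(n)]
            sets.append(
                discover_biomarkers([X[i] for i in idx], [y[i] for i in idx])
            )
        scores.append(jaccard(*sets))
    return sum(scores) / len(scores)
```

Calling `overlap_estimate(X, y)` on a list of feature rows `X` and binary labels `y` returns a value between 0 and 1. Because both resamples are drawn from the same observed data, this naive estimate should be read as an illustration of the concept only, not as a calibrated bound.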

Funder

Natural Sciences and Engineering Research Council of Canada

Alberta Machine Intelligence Institute

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 2 articles.

1. Towards early diagnosis of Alzheimer’s disease: advances in immune-related blood biomarkers and computational approaches;Frontiers in Immunology;2024-04-23

2. Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD;Frontiers in Genetics;2024-01-29