Querying multiple sets of P-values through composed hypothesis testing

Author:

Mary-Huard Tristan (1, 2), Das Sarmistha (3), Mukhopadhyay Indranil (3), Robin Stéphane (1, 4)

Affiliation:

1. Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France

2. Génétique Quantitative et Evolution (GQE)-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France

3. Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India

4. Centre d’Écologie et des Sciences de la Conservation (CESCO), MNHN, CNRS, Sorbonne Université, Paris 75005, France

Abstract

Motivation: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis is often a set of P-values resulting from previous analyses, which need to be combined flexibly to explore complex hypotheses while guaranteeing a low proportion of false discoveries.

Results: We introduce the generic concept of composed hypothesis, which corresponds to an arbitrarily complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding the items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be performed efficiently and provide a thorough classification rule to control for type I error. The performance and the usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provided valuable biological insight in the application cases considered.

Availability and implementation: The QCH methodology is available in the qch package hosted on CRAN. Additionally, R codes to reproduce the Einkorn example are available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard.

Supplementary information: Supplementary data are available at Bioinformatics online.
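The classification view described under Results can be illustrated with a minimal Python sketch. It is not the qch package and not the authors' implementation. Several choices are assumptions made only for illustration: P-values are mapped to z-scores, each marginal null is N(0, 1), each marginal alternative is N(`alt_mean`, 1) with a hypothetical fixed `alt_mean`, the tests are treated as independent within a configuration, and the rejection rule keeps the average posterior probability of the composed null among rejected items below `alpha`. The function name `composed_test` and its parameters are likewise hypothetical.

```python
import itertools
import math
from statistics import NormalDist


def composed_test(pvalues, in_h1, alt_mean=3.0, alpha=0.05, n_iter=200):
    """Illustrative composed-hypothesis test on Q sets of P-values.

    pvalues: list of tuples, one tuple of Q P-values per item.
    in_h1:   function taking a configuration (tuple of 0/1, where 1 means
             "alternative holds for this test") and returning True when the
             configuration belongs to the composed alternative.
    """
    std = NormalDist()                # assumed marginal null: N(0, 1)
    alt = NormalDist(alt_mean, 1.0)   # assumed marginal alternative
    n_tests = len(pvalues[0])
    configs = list(itertools.product((0, 1), repeat=n_tests))

    # Density of each item under each of the 2^Q configurations.
    dens = []
    for row in pvalues:
        # Clip P-values away from 0 and 1 so the z-score stays finite.
        z = [std.inv_cdf(1.0 - min(max(p, 1e-12), 1.0 - 1e-12)) for p in row]
        dens.append([
            math.prod(alt.pdf(zq) if cq else std.pdf(zq)
                      for zq, cq in zip(z, c))
            for c in configs
        ])

    def posteriors(weights, d):
        joint = [w * f for w, f in zip(weights, d)]
        total = sum(joint)
        return [j / total for j in joint]

    # EM on the configuration weights only (component densities are fixed).
    weights = [1.0 / len(configs)] * len(configs)
    for _ in range(n_iter):
        sums = [0.0] * len(configs)
        for d in dens:
            for k, t in enumerate(posteriors(weights, d)):
                sums[k] += t
        weights = [s / len(dens) for s in sums]

    # Posterior probability that each item falls under the composed null.
    post_h0 = [
        sum(t for t, c in zip(posteriors(weights, d), configs) if not in_h1(c))
        for d in dens
    ]

    # Reject the items with the smallest posterior null probability while
    # their running average stays at or below alpha.
    order = sorted(range(len(post_h0)), key=post_h0.__getitem__)
    rejected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += post_h0[i]
        if running / rank > alpha:
            break
        rejected.append(i)
    return post_h0, rejected
```

For instance, calling `composed_test(pvals, in_h1=all)` queries the items for which every test is under its alternative, while `in_h1=lambda c: sum(c) >= 2` would query items significant in at least two tests. Changing the query only changes which configurations are summed into the composed null; the fitted mixture is reused as is.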

Funder

Indo-French Center for Applied Mathematics

Investissement d’Avenir’ project (Amaizing

Department of Biotechnology, Govt. of India

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Cited by 3 articles.