Evolving hard problems: Generating human genetics datasets with a complex etiology

Published: 2011-07-07
Volume: 4
Issue: 1
ISSN: 1756-0381
Container-title: BioData Mining
Language: en

Authors:

Himmelstein Daniel S, Greene Casey S, Moore Jason H

Abstract

Background: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components rather than from single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling the interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously, these methods have been evaluated with datasets simulated according to pre-defined genetic models.

Results: Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variants and of pairs of genetic variants is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects.

Conclusions: This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make the entire Pareto-optimal front of datasets from each run freely available to the community so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
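The abstract does not define how "predictiveness" is scored or how each Pareto front is formed. As a rough illustration only, the Python sketch below assumes predictiveness is the balanced training accuracy of a majority-vote lookup table over a set of loci (an MDR-style stand-in, not necessarily the authors' measure) and assumes the target interaction is formed by the first `order` loci. It scores the three objectives described above (best single locus and best pair, both minimized; the higher-order combination, maximized) and filters candidate datasets to their non-dominated set. All function names (`combo_accuracy`, `objectives`, `dominates`, `pareto_front`, `parity_dataset`) are hypothetical, and the evolution strategy that actually mutates datasets is not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def combo_accuracy(genotypes, status, loci):
    """Balanced training accuracy of a majority-vote lookup table over `loci`.

    Assumption: this MDR-style score stands in for "predictiveness".
    genotypes: rows of 0/1/2 genotype codes; status: 1 = case, 0 = control.
    Both classes must be present.
    """
    # Tally cases and controls within each multilocus genotype cell.
    cells = defaultdict(Counter)
    for row, y in zip(genotypes, status):
        cells[tuple(row[i] for i in loci)][y] += 1
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    tp = tn = 0
    for counts in cells.values():
        # Each cell predicts its majority class; ties are called "case".
        if counts[1] >= counts[0]:
            tp += counts[1]
        else:
            tn += counts[0]
    return 0.5 * (tp / n_case + tn / n_ctrl)


def objectives(genotypes, status, order):
    """Return (best single-locus, best pairwise, target-combination) scores.

    Hypothetical assumption: the target interaction is the first `order` loci.
    """
    n_loci = len(genotypes[0])
    best_single = max(combo_accuracy(genotypes, status, (i,))
                      for i in range(n_loci))
    best_pair = max(combo_accuracy(genotypes, status, pair)
                    for pair in combinations(range(n_loci), 2))
    target = combo_accuracy(genotypes, status, tuple(range(order)))
    return best_single, best_pair, target


def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b`.

    The first two objectives are minimized and the third is maximized,
    so the third is negated to make every objective a minimization.
    """
    ka = (a[0], a[1], -a[2])
    kb = (b[0], b[1], -b[2])
    return (all(x <= y for x, y in zip(ka, kb))
            and any(x < y for x, y in zip(ka, kb)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parity_dataset(order):
    """Toy data: each genotype combination once; case iff the genotype sum is odd."""
    rows = [list(g) for g in product((0, 1, 2), repeat=order)]
    status = [sum(r) % 2 for r in rows]
    return rows, status


if __name__ == "__main__":
    rows, status = parity_dataset(3)
    print(objectives(rows, status, order=3))
```

On the toy parity dataset, every single locus and every pair is only weakly predictive, while the full three-locus combination classifies every sample correctly. Note that with one sample per three-locus cell, any labeling would score 1.0 at third order, so a sketch like this would need larger samples or held-out evaluation before its scores mean much.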

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Genetics, Molecular Biology, Biochemistry

Link

http://link.springer.com/content/pdf/10.1186/1756-0381-4-21.pdf

References: 38 articles

1. Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, Thomas G, Hirschhorn JN, Abecasis G, Altshuler D, Bailey-Wilson JE, Brooks LD, Cardon LR, Daly M, Donnelly P, Fraumeni JF, Freimer NB, Gerhard DS, Gunter C, Guttmacher AE, Guyer MS, Harris EL, Hoh J, Hoover R, Kong CA, Merikangas KR, Morton CC, Palmer LJ, Phimister EG, Rice JP, Roberts J, Rotimi C, Tucker MA, Vogan KJ, Wacholder S, Wijsman EM, Winn DM, Collins FS: Replicating genotype-phenotype associations. Nature. 2007, 447 (7145): 655-60. 10.1038/447655a.

2. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008, 9 (5): 356-369. 10.1038/nrg2344.

3. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K: A comprehensive review of genetic association studies. Genet Med. 2002, 4: 45-61. 10.1097/00125817-200203000-00002.

4. Shriner D, Vaughan LK, Padilla MA, Tiwari HK: Problems with Genome-Wide Association Studies. Science. 2007, 316 (5833): 1840-1841.

5. Williams SM, Canter JA, Crawford DC, Moore JH, Ritchie MD, Haines JL: Problems with Genome-Wide Association Studies. Science. 2007, 316 (5833): 1841-1842.

Cited by 17 articles.

1. MTHSA-DHEI: multitasking harmony search algorithm for detecting high-order SNP epistatic interactions; Complex & Intelligent Systems; 2022-07-27

2. ELSSI: parallel SNP–SNP interactions detection by ensemble multi-type detectors; Briefings in Bioinformatics; 2022-06-14

3. Multipopulation harmony search algorithm for the detection of high-order SNP interactions; Bioinformatics; 2020-03-30

4. Evolving controllably difficult datasets for clustering; Proceedings of the Genetic and Evolutionary Computation Conference; 2019-07-13

5. Fungal community composition analysis of 24 different urban parks in Shanghai, China; Urban Ecosystems; 2019-05-18