Abstract
Application of concepts from information theory has revealed new features of Single Nucleotide Polymorphism (SNP) organization. These features lead to effective classifiers that distinguish genomic sequences of contrasting phenotypes, as in case/control cohorts. When applied to a disease/control database, this analysis yields a disease classifier; a parallel analysis yields a wellness classifier. The two classifiers have non-intersecting loci, and each involves roughly 100 alleles. The effectiveness of this framework is illustrated by application to adult-onset type 2 diabetes (T2D), as represented in the Wellcome Trust (WT) case/control database. Simultaneous use of the two classifiers on the WT database successfully predicts disease versus wellness, to the extent that near-certain genomic forecasting is achieved. This framework offers a resolution to the oft-posed question: "Where is the missing heritability?" Application of both classifiers to two additional T2D databases produced informative results. A fully independent and compelling confirmation of the present results is obtained with the machine learning algorithm Random Forests. The analytical model presented here is generalizable to other diseases.
One Sentence Summary
Discovery of intrinsic chromosomal SNP organizations leads to near-certain genomic disease prediction.
Publisher
Cold Spring Harbor Laboratory