Author:
King Oliver D., Foulger Rebecca E., Dwight Selina S., White James V., Roth Frederick P.
Abstract
The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases. This allows the prediction of gene function based on patterns of annotation. For example, if annotations for two attributes tend to occur together in a database, then a gene holding one attribute is likely to hold the other as well. We modeled the relationships among GO attributes with decision trees and Bayesian networks, using the annotations in the Saccharomyces Genome Database (SGD) and in FlyBase as training data. We tested the models using cross-validation, and we manually assessed 100 gene–attribute associations that were predicted by the models but that were not present in the SGD or FlyBase databases. Of the 100 manually assessed associations, 41 were judged to be true, and another 42 were judged to be plausible. [Detailed lists of hypotheses, including the curators' comments on each hypothesis, are available at http://llama.med.harvard.edu/~king/predictions.html.]
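The co-occurrence intuition in the abstract can be illustrated with a minimal sketch. This is not the decision-tree or Bayesian-network modeling the authors describe; it is a much simpler stand-in that estimates P(b | a) from raw co-annotation counts. The function names, the threshold value, and the toy gene names and annotations are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def conditional_cooccurrence(annotations):
    """Estimate P(b | a) for each ordered attribute pair (a, b) seen together.

    `annotations` maps a gene name to the set of attributes it holds.
    """
    single = Counter()  # number of genes holding each attribute
    pair = Counter()    # number of genes holding both a and b
    for attrs in annotations.values():
        attrs = set(attrs)
        single.update(attrs)
        pair.update(permutations(sorted(attrs), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def suggest(annotations, gene, threshold=0.6):
    """Propose attributes the gene lacks but that often accompany ones it holds."""
    probs = conditional_cooccurrence(annotations)
    held = set(annotations[gene])
    scores = {}
    for (a, b), p in probs.items():
        if a in held and b not in held and p >= threshold:
            # Keep the strongest single piece of evidence for each candidate.
            scores[b] = max(scores.get(b, 0.0), p)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical genes and annotations, not taken from SGD or FlyBase).
toy = {
    "geneA": {"kinase activity", "phosphorylation"},
    "geneB": {"kinase activity", "phosphorylation"},
    "geneC": {"kinase activity", "phosphorylation", "cell cycle"},
    "geneD": {"kinase activity"},
}

print(suggest(toy, "geneD"))
```

In this toy data three of the four genes annotated with "kinase activity" also carry "phosphorylation", so that attribute is proposed for geneD, while "cell cycle" falls below the threshold. A real evaluation would hold the query gene out of the counts, as cross-validation does.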
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical), Genetics
Cited by
98 articles.