Author:
Sonnenburg Sören, Schweikert Gabriele, Philips Petra, Behr Jonas, Rätsch Gunnar
Abstract
Background
For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks.
Results
In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments comparing its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods, including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be incorporated into a gene finder.
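The abstract names the weighted degree kernel but does not define it. As a rough illustration only, the sketch below follows the definition commonly given in the kernel literature: two equal-length sequences are compared position by position, counting matching substrings of length 1 up to a maximum degree, with shorter matches weighted more heavily. The function name `wd_kernel`, the parameter `degree`, and the weights `beta` are assumptions made for this example; this is not the authors' implementation, and the settings used in the paper are not stated here.

```python
def wd_kernel(x: str, y: str, degree: int = 3) -> float:
    """Illustrative weighted degree kernel between two equal-length sequences.

    Assumed form (standard in the literature, not taken from this abstract):
        k(x, y) = sum_{d=1..D} beta_d * #{positions l : x[l:l+d] == y[l:l+d]}
        beta_d  = 2 * (D - d + 1) / (D * (D + 1))
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if degree < 1:
        raise ValueError("degree must be at least 1")

    n = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        # Weight for substrings of length d; decreases as d grows.
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        # Count positions where the length-d substrings agree exactly.
        matches = sum(1 for i in range(n - d + 1) if x[i:i + d] == y[i:i + d])
        total += beta * matches
    return total


if __name__ == "__main__":
    # Hypothetical toy windows around a candidate splice site.
    print(wd_kernel("ACGT", "ACGT", degree=2))
    print(wd_kernel("AAGT", "ACGT", degree=2))
```

This direct version costs on the order of sequence length times degree per kernel evaluation; it is meant to show the idea, not to be fast enough for genome-wide use.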
Availability
Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
135 articles.