A Complexity Reduction Algorithm for Analysis and Annotation of Large Genomic Sequences

Published:2003-02-01 Issue:2 Volume:13 Page:313-322
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Chuang Trees-Juen, Lin Wen-Chang, Lee Hurng-Chun, Wang Chi-Wei, Hsiao Keh-Lin, Wang Zi-Hao, Shieh Danny, Lin Simon C., Ch'ang Lan-Yang

Abstract

DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. This method features a progressive data structure in hierarchical orders to facilitate a fast and efficient search mechanism. CRASA implementation was tested with already annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By the use of layered noise filters, the complexity of CRASA-matched data was reduced exponentially. The results from the benchmark tests showed that CRASA annotation excelled in both the sensitivity and specificity categories. When CRASA was applied to the analysis of human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can be used as a robust tool for genome annotation with high accuracy by matching the EST sequences precisely to the genomic sequences. [Supplementary material is available online at http://www.genome.org and http://crasa.sinica.edu.tw/bioinformatics/Supplementary.htm.]

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 42 articles.

1. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

2. GAIA: Framework Annotation of Genomic Sequence

3. Using GeneWise in the Drosophila Annotation Experiment

4. Prediction of complete gene structures in human genomic DNA

Cited by 13 articles.

1. Genome Annotation;Encyclopedia of Bioinformatics and Computational Biology;2019

2. Applications of Supercomputers in Sequence Analysis and Genome Annotation;Biotechnology;2019

3. An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm;BMC Bioinformatics;2017-10-24

4. Reinforcement Learning for Improving Gene Identification Accuracy by Combination of Gene-Finding Programs;International Journal of Applied Metaheuristic Computing;2012-01

5. Using Bioinformatics Techniques for Gene Identification in Drug Discovery and Development;Current Drug Metabolism;2008-07-01