Perspectives: sequence data base searching in the era of large-scale genomic sequencing.

Published:1996-08 Issue:8 Volume:6 Page:653-660
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Smith R F

Abstract

Large-scale sequencing of human and model organism genomes will have a profound impact on our ability to use sequence data base searching to predict the biochemical functions of sequences of interest. Despite the great value of more sequences in the data bases, a huge increase in data base size will also have adverse effects on data base searches. Upcoming problems will include (1) greatly increased search times, (2) an increase in background noise of high-scoring but biologically irrelevant matches, (3) inaccurate coding region prediction, leading to problems in protein data base searching, and (4) limited first-pass sequence annotation, making it difficult to determine the biological relevance of data base hits. Improved data base annotation tools and construction of smaller data bases of representative and highly-annotated sequences for first-pass analyses will be essential to deal with the impending flood of new genomic sequence.

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 43 articles.

1. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence;Nature (Suppl.);1995

2. Issues in searching molecular sequence databases

3. Progress with the PRINTS protein fingerprint database

4. The SWISS-PROT protein sequence data bank and its new supplement TREMBL

Cited by 38 articles.

1. Systematic characterization of hypothetical proteins in Synechocystis sp. PCC 6803 reveals proteins functionally relevant to stress responses;Gene;2013-01

2. The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species;PLoS Computational Biology;2009-07-03

3. Limitations and Pitfalls in Protein Identification by Mass Spectrometry;Chemical Reviews;2007-07-24

4. Mining sequence annotation databanks for association patterns;Bioinformatics;2005-11-01

5. Automated generation of heuristics for biological sequence comparison;BMC Bioinformatics;2005