An optimized procedure greatly improves EST vector contamination removal

Published: 2007-11-13 Issue: 1 Volume: 8 Page:
ISSN: 1471-2164
Container-title: BMC Genomics
Language: en
Short-container-title: BMC Genomics

Author:

Chen Yi-An, Lin Chang-Chun, Wang Chin-Di, Wu Huan-Bin, Hwang Pei-Ing

Abstract

Background: The enormous amount of sequence data available in public databases has been a gold mine for researchers exploring various themes in the life sciences, so the quality of such data is a serious concern. Removing vector contamination is one of the most important steps in obtaining accurate sequence data that contain only the cDNA insert from the basecalls output by an automatic DNA sequencer. Popular bioinformatics programs for vector trimming include LUCY, cross_match and SeqClean.

Results: In a recent study, SeqClean was used to remove vector contamination from our test set of EST data compiled through various library construction systems; however, a significant number of errors remained after preliminary trimming. These errors were later almost completely corrected simply by using a re-linearized form of the cloning vector to compare against the target ESTs. The modified trimming procedure for SeqClean was also compared with the other two popular programs, LUCY2 and cross_match. SeqClean with a re-linearized form of the cloning vector significantly outperformed the other two programs in all tested conditions, while the performance of the other two programs was not influenced by the modified procedure. Vector contamination in dbEST was also investigated: 2203 of the 48212 ESTs sampled from dbEST (2007-04-18 freeze), about 4.6%, were found to match sequences in UNIVEC.

Conclusion: Vector contamination remains a serious concern for data quality in public sequence databases. Based on these results, we recommend our modified procedure with SeqClean to all researchers for vector removal from EST or genomic sequences.
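The abstract does not spell out how the re-linearized vector is produced. As a minimal sketch, assuming that re-linearization means rotating the circular vector sequence so that its linear form starts at a chosen position (for example, the cloning site), the step might look like the following. The function names, the toy sequence, and the cut position are hypothetical illustrations, not part of SeqClean or of the authors' published procedure.

```python
def relinearize(vector_seq: str, cut_pos: int) -> str:
    """Rotate a circular sequence so the linear form starts at cut_pos (0-based).

    Assumption: the vector is circular, so any rotation represents the same
    molecule; only the point where the linear record is "opened" changes.
    """
    if not vector_seq:
        raise ValueError("empty vector sequence")
    cut = cut_pos % len(vector_seq)  # wrap positions past the end
    return vector_seq[cut:] + vector_seq[:cut]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy 16-base "vector"; the cut position 8 is chosen arbitrarily.
    toy_vector = "AAAACCCCGGGGTTTT"
    rotated = relinearize(toy_vector, 8)
    print(to_fasta("toy_vector_relinearized", rotated), end="")
```

The resulting FASTA record could then be supplied as the vector database to a trimming program in place of the original linear record. Which cut position works best is not stated in the abstract and would need to be taken from the full paper.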

Publisher

Springer Science and Business Media LLC

Subject

Genetics, Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/1471-2164-8-416.pdf

Reference: 29 articles.

1. Bork P, Bairoch A: Go hunting in sequence databases but watch out for the traps. Trends Genet. 1996, 12: 425-427. 10.1016/0168-9525(96)60040-7.

2. Colleagues CTGoBMa: Quality control in databanks for molecular biology. Bioessays. 2000, 22 (11): 1024-1034. 10.1002/1521-1878(200011)22:11<1024::AID-BIES9>3.0.CO;2-W.

3. Seluja GA, Farmer A, McLeod M, Harger C, Schad PA: Establishing a method of vector contamination identification in database sequences. Bioinformatics. 1999, 15 (2): 106-110. 10.1093/bioinformatics/15.2.106.

4. Lamperti ED, Kittelberger JM, Smith TF, Villa-Komaroff L: Corruption of genomic databases with anomalous sequence. Nucleic Acids Res. 1992, 20 (11): 2741-2747. 10.1093/nar/20.11.2741.

5. Korning PG, Hebsgaard SM, Rouze P, Brunak S: Cleaning the GenBank Arabidopsis thaliana data set. Nucleic Acids Res. 1996, 24 (2): 316-320. 10.1093/nar/24.2.316.

Cited by 71 articles.

1. Unravelling transcriptional responses of the willow to Fusarium kuroshium infection;Physiological and Molecular Plant Pathology;2024-09

2. Population genomic analysis reveals genetic structure and thermal-tolerant genotypes in remnant Tasmanian giant kelp populations;2023-10-10

3. Post-meiotic mechanism of facultative parthenogenesis in gonochoristic whiptail lizard species;2023-09-22

4. A haplotype‐resolved genome for Rhododendron × pulchrum and the expression analysis of heat shock genes;Journal of Systematics and Evolution;2023-07-21

5. A high-quality Bougainvillea genome provides new insights into evolutionary history and pigment biosynthetic pathways in the Caryophyllales;Horticulture Research;2023-06-13