Abstract
Background
DNA sequences are pivotal for a wide array of research in biology. Large sequence databases, like GenBank, provide an amazing resource to utilize DNA sequences for large-scale analyses. However, many sequence records on GenBank contain more than one gene or are portions of genomes. Inconsistencies in the way genes are annotated and the numerous synonyms a single gene may be listed under provide major challenges for extracting large numbers of subsequences for comparative analysis across taxa. At present, there is no easy way to extract portions from many GenBank accessions based on annotations where gene names may vary extensively.
Results
The R package AnnotationBustR allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system given search terms of gene synonyms and accession numbers. AnnotationBustR extracts subsequences of interest and then writes them to a FASTA file for users to employ in their research endeavors.
Conclusion
FASTA files of extracted subsequences and accession tables generated by AnnotationBustR allow users to quickly find and extract subsequences from GenBank accessions. These sequences can then be incorporated in various analyses, like the construction of phylogenies to test a wide range of ecological and evolutionary hypotheses.
Funder
University of Tennessee, Knoxville
Subject
General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience
Cited by
9 articles.