Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences

Author:

McGuffie Matthew J., Barrick Jeffrey E.

Abstract

Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and three-fifths of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science.

Author Summary

Plasmids are used in molecular biology and biotechnology for a wide variety of tasks such as cloning DNA, expressing recombinant proteins, and creating vaccines. One challenge in working with plasmids is the long and often lost history of pieces of plasmids being copied and remixed by researchers to create new plasmids. Current databases used for annotating key genetic parts in plasmids are incomplete, especially with respect to cataloguing closely related versions of parts that can have very different characteristics. Some genetic part variants have arisen due to purposeful editing, while others are the result of unplanned mutations and evolution. When a researcher finds differences between a database sequence and a genetic part in their newly constructed plasmid, it is often unclear how and when those differences arose and whether they will affect the researcher's experiments. We identified 217 genetic part variants that are either widespread or have likely arisen independently more than once on plasmids due to convergent evolution or engineering. We propose that these variants should be prioritized for inclusion in curated databases of engineered DNA sequences and for functional characterization to improve the reliability and reproducibility of science.

Publisher

Cold Spring Harbor Laboratory
