Straintables: An application that extracts sequences from genome assemblies and generates dissimilarity matrices

Author:

Araujo Gabriel Nogueira, Francis Richard W, Ferreira Cristina dos Santos, Rangel Alba Lucínia Peixoto

Abstract

Background and Objectives: The dissimilarity matrix (DM) is an important component of phylogenetic analysis, and many software packages exist to build and display DMs. However, because the common input for this type of software is a set of sequences in FASTA format, extracting and aligning each set of sequences to produce a large number of matrices can be laborious. Additionally, existing software does not facilitate the comparison of similarity clusters across several DMs built for the same group of individuals using different genomic regions. To meet our need for such a tool, we designed Straintables to extract the sequences of specific genomic regions from a group of intraspecies genomic assemblies and to use the extracted sequences to build dissimilarity matrices.

Methods: A Python module with executable scripts was developed for a study on genetic diversity across strains of Toxoplasma gondii; it also serves as a general-purpose system for DM calculation and visualization in preliminary phylogenetic studies. For automatic extraction of region sequences from genomic assemblies, we built a system that designs virtual primers from reference sequences located through genomic annotations, then matches those primers against genome files using regex patterns. Extracted sequences are then aligned using Clustal Omega and compared to generate matrices.

Results: Using this software saves the user from manually preparing and aligning the sequences, a process that can be laborious when a large number of assemblies or regions are involved. The automatic sequence extraction process can be checked against BLAST results using the extracted sequences as queries; correct results were observed for same-species pools of various organisms. The package also contains a matrix visualization tool focused on cluster visualization, capable of drawing matrices into image files with custom settings, and features methods for reordering matrices to facilitate the comparison of clustering patterns across two or more matrices.

Conclusion: Straintables may replace and extend the functionality of existing matrix-oriented phylogenetic software, featuring automatic region extraction from genomic assemblies and enhanced matrix visualization capabilities that emphasize cluster identification. This module is open source, available on GitHub (https://github.com/Gab0/straintables) under an MIT license and also as a PyPI package.

Highlights:
- Simple in-silico protocol for generation, visualization and comparison of dissimilarity matrices.
- Accurate automatic sequence extraction from multiple genomic assemblies by using virtual primers built from reference sequences in an annotation file.
- Draws matrices as images, with enhanced cluster visualization and customized options.
- Supports reordering of matrix indices to better visualize clustering pattern conservation across multiple regions.
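
The extraction and comparison steps described in the Methods can be illustrated with a minimal Python sketch. It is not the Straintables implementation: the function names (`extract_region`, `p_distance`, `dissimilarity_matrix`), the `max_len` limit, and the choice of p-distance as the dissimilarity measure are all assumptions made for illustration. Both primers are treated as same-strand sequences, and the input to the matrix step is assumed to be already aligned (for example by Clustal Omega).

```python
import re
from itertools import combinations


def extract_region(genome, fwd_primer, rev_primer, max_len=5000):
    """Return the shortest stretch of `genome` bounded by both primers, or None.

    Hypothetical helper: both primers are given on the same strand, which is a
    simplification of real primer handling.
    """
    # Non-greedy bounded quantifier, so the nearest downstream primer wins.
    pattern = re.compile(
        re.escape(fwd_primer) + r"[ACGTN]{0,%d}?" % max_len + re.escape(rev_primer)
    )
    match = pattern.search(genome)
    return match.group(0) if match else None


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped. p-distance is an
    assumed metric, chosen only because it is simple.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def dissimilarity_matrix(aligned):
    """Build a symmetric matrix of pairwise p-distances (list of lists)."""
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = p_distance(aligned[i], aligned[j])
    return matrix
```

In this sketch, one region would be extracted from each assembly with `extract_region`, the resulting sequences aligned externally, and the aligned strings passed to `dissimilarity_matrix`, giving one matrix per genomic region.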

Publisher

Cold Spring Harbor Laboratory
