SeqLengthPlot: An easy-to-use Python-based Tool for Visualizing and Retrieving Sequence Lengths from fasta files with a Tunable Splitting Point

Published: 2024-06-09

Author:

Domínguez-Pérez Dany, Agüero-Chapin Guillermin, Leone Serena, Modica Maria Vittoria

Abstract

Motivation: Accurate sequence length profiling is essential in bioinformatics, particularly in genomics and proteomics. Existing tools such as SeqKit and the Trinity toolkit, among others, provide basic sequence statistics but often fall short in offering comprehensive analytics and plotting options. For instance, SeqKit is a very complete and fast tool for sequence analysis that delivers useful metrics (e.g., number of sequences; average, minimum, and maximum length) and can return the sequences shorter or longer than a given length (one side at a time, not both at once). Similarly, Trinity's Perl-based utility scripts provide detailed contig length distributions (e.g., N50, median, and average lengths) but neither report the total number of sequences nor offer graphical representations of the data.

Results: Given that key sequence analysis tasks are distributed among separate tools, we introduce SeqLengthPlot, an easy-to-use Python-based script that fills existing gaps in bioinformatics tools for sequence length profiling. SeqLengthPlot generates comprehensive statistical summaries, filters the input FASTA file (nucleotide or protein) and automatically retrieves its sequences into two distinct files based on a tunable, user-defined sequence length, and produces plots or dynamic visualizations of the corresponding sequences.

Availability and implementation: The detailed SeqLengthPlot pipeline is available on GitHub at https://github.com/danydguezperez/SeqLengthPlot, released under the GPL-3.0 license. Additional datasets used as sources or compiled as use cases are publicly available through the Mendeley Data repository: DATASET_Ss_SE.1: http://dx.doi.org/10.17632/pmxwfjyyvy.1, DATASET_Ss_SE.2: http://dx.doi.org/10.17632/3rtbr7c9s8.1, DATASET_Ss_SE.3: http://dx.doi.org/10.17632/wn5kbk5ryy.1, DATASET_Ss_SE.4: http://dx.doi.org/10.17632/sh79mdcm2c.1 and DATASET_Ss_SE.5: http://dx.doi.org/10.17632/zmvvff35dx.1.
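To make the idea of a tunable splitting point concrete, the following is a minimal sketch of length profiling and two-way splitting of FASTA records using only the Python standard library. It is not the SeqLengthPlot code: the function names (`read_fasta`, `summarize`, `split_by_length`, `to_fasta`), the example sequences, and the choice to place sequences exactly at the cutoff in the upper group are all assumptions made for illustration.

```python
from statistics import mean, median


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may wrap over several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lengths):
    """Return basic length statistics for a non-empty list of lengths."""
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


def split_by_length(records, cutoff):
    """Partition records into (shorter than cutoff, at or above cutoff).

    Assumption: a sequence whose length equals the cutoff goes to the
    second group; the real tool may treat the boundary differently.
    """
    below, at_or_above = [], []
    for header, seq in records:
        (below if len(seq) < cutoff else at_or_above).append((header, seq))
    return below, at_or_above


def to_fasta(records):
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Hypothetical three-record input; seq2 is wrapped over two lines.
example = """>seq1
ATGC
>seq2
ATGCATGCAT
GC
>seq3
ATGCATG
"""

records = list(read_fasta(example.splitlines()))
stats = summarize([len(seq) for _, seq in records])
short, long_ = split_by_length(records, cutoff=8)
```

Here `to_fasta(short)` and `to_fasta(long_)` give the text that would be written to the two output files. A histogram of the lengths, drawn with any plotting library, would cover the visualization step described above.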

Publisher

Cold Spring Harbor Laboratory
