Abstract
We present the “Evolutionary Statistics Toolkit”, a user-friendly web-based platform for specialized analysis of genetic sequences that integrates multiple evolutionary statistics. The toolkit focuses on a selection of specialized tools, including a Tajima’s D calculator with Site Frequency Spectrum (SFS), Shannon’s Entropy (H), alignment re-formatting, HGSV to FASTA conversion, pair-wise frequency analysis, FASTA to SEQRES conversion, RNA 2D structure alignment, and a kurtosis coefficient calculator. Tajima’s D is calculated using the reference formula $D = (\pi - \theta_W) / \sqrt{V_D}$, where $\pi$ is the average number of pairwise differences, $\theta_W$ is Watterson’s estimator of $\theta$, and $V_D$ is the variance of $\pi - \theta_W$. Shannon’s Entropy is defined as $H = -\sum_i p_i \log_2(p_i)$, where $p_i$ is the probability of occurrence of each unique character (nucleotide or amino acid) in the sequence. The toolkit facilitates streamlined workflows for early-career researchers in evolutionary biology, genomics, and related fields. Based on a comparison with existing code, we propose that it can also serve as an interactive educational website for beginners in evolutionary statistics. The source code for each tool in the toolkit is available through GitHub links provided on the website. This open-source approach allows users to inspect the code, suggest improvements, or adapt the tools to their specific usage and research needs. This article describes the functionality and validation of each tool within the platform, along with a comparison with accessible existing statistical utilities. The toolkit is freely accessible at: https://www.alperkaragol.com/toolkit
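To make the two formulas above concrete, the following Python sketch computes Shannon’s Entropy for a single sequence and Tajima’s D for a small alignment. It is an illustration only and not the toolkit’s own source code: the function names (`shannon_entropy`, `summarize_alignment`, `tajimas_d`) are hypothetical, the alignment is assumed to be gap-free with sequences of equal length, and $V_D$ is estimated with the standard constants from Tajima (1989), which the abstract does not spell out.

```python
from collections import Counter
from math import log2, sqrt


def shannon_entropy(sequence: str) -> float:
    """H = -sum(p_i * log2(p_i)) over the unique characters of a sequence."""
    n = len(sequence)
    if n == 0:
        return 0.0
    # p * log2(1/p) equals -p * log2(p) and avoids returning -0.0
    return sum((c / n) * log2(n / c) for c in Counter(sequence).values())


def summarize_alignment(alignment: list[str]) -> tuple[int, int, float]:
    """Return (n, S, pi) for a gap-free alignment (assumption of this sketch).

    n  = number of sequences
    S  = number of segregating (polymorphic) sites
    pi = average number of pairwise differences
    """
    n = len(alignment)
    if n < 2 or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need at least 2 sequences of equal length")
    segregating = 0
    total_differences = 0
    for column in zip(*alignment):
        counts = Counter(column).values()
        if len(counts) > 1:
            segregating += 1
        # number of sequence pairs that differ at this site
        total_differences += (n * n - sum(c * c for c in counts)) // 2
    pairs = n * (n - 1) // 2
    return n, segregating, total_differences / pairs


def tajimas_d(n: int, s: int, pi: float) -> float:
    """D = (pi - theta_W) / sqrt(V_D), with V_D estimated as in Tajima (1989)."""
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1                      # Watterson's estimator
    v_d = e1 * s + e2 * s * (s - 1)       # estimated variance of pi - theta_W
    return (pi - theta_w) / sqrt(v_d)


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACCA", "TCCA"]  # made-up example data
    n, s, pi = summarize_alignment(toy_alignment)
    print("entropy of first sequence:", shannon_entropy(toy_alignment[0]))
    print("n, S, pi:", n, s, pi)
    print("Tajima's D:", tajimas_d(n, s, pi))
```

Separating `summarize_alignment` from `tajimas_d` keeps the statistic usable when only the summary values (sample size, segregating sites, and mean pairwise differences) are available, for example when checking a result against another utility.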
Publisher
Cold Spring Harbor Laboratory