Affiliation:
1. Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia
Abstract
This study evaluates the accuracy and efficiency of three multiple sequence alignment (MSA) programs, ClustalΩ, MAFFT, and MUSCLE, for genotyping SARS-CoV-2 in the Saudi population. Our results indicate that MAFFT outperforms the other two, making it a strong choice for large-scale genomic analyses. A comparison of the MSAs combined with MergeAlign shows that MAFFT and MUSCLE are consistently more accurate than ClustalΩ in both the reference-based and consensus-based approaches. The evaluation of genotyping effectiveness reveals that adding a reference sequence, such as the SARS-CoV-2 Wuhan-Hu-1 isolate, does not significantly affect the alignment process, suggesting that consensus sequences derived from the individual MSA alignments may yield comparable genotyping outcomes. The analysis of single-nucleotide polymorphisms (SNPs) and mutations highlights differences between the MSA programs: ClustalΩ and MAFFT yield similar SNP counts, while MUSCLE yields the highest. High-frequency SNP analysis identifies MAFFT as the most accurate MSA program, emphasizing its reliability. Comparisons between the Saudi and global SARS-CoV-2 populations underscore regional genetic variation. The Saudi dataset shows consistently higher frequencies of high-frequency SNPs, which is attributed to genetic similarity within the population. Transmission dynamics analysis reveals a higher frequency of co-mutations in the Saudi dataset, suggesting shared evolutionary patterns. These findings emphasize the importance of considering regional diversity in genetic analyses.
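The reference-based and consensus-based SNP comparisons described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy sequences, the function names (`consensus`, `snp_frequencies`, `high_frequency`), and the 0.5 frequency threshold are all assumptions made for illustration, and positions are alignment columns rather than genome coordinates.

```python
from collections import Counter

def consensus(aligned):
    """Majority base per alignment column, ignoring gaps.
    Ties go to the base seen first in the column."""
    out = []
    for col in zip(*aligned):
        counts = Counter(b for b in col if b != "-")
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)

def snp_frequencies(aligned, reference):
    """Map (column, ref_base, alt_base) to the fraction of sequences carrying it.
    Gaps and ambiguous 'N' calls are skipped, not counted as SNPs."""
    counts = Counter()
    for seq in aligned:
        for pos, (r, b) in enumerate(zip(reference, seq), start=1):
            if r != b and r != "-" and b not in "-N":
                counts[(pos, r, b)] += 1
    n = len(aligned)
    return {snp: c / n for snp, c in counts.items()}

def high_frequency(freqs, threshold):
    """Keep SNPs at or above the (assumed) frequency threshold."""
    return {snp: f for snp, f in freqs.items() if f >= threshold}

# Hypothetical toy alignment; real input would be the output of an MSA program.
toy = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTAA", "ATG-AC"]
reference = "ACGTAC"  # stand-in for an aligned reference such as Wuhan-Hu-1

ref_based = snp_frequencies(toy, reference)
cons_based = snp_frequencies(toy, consensus(toy))
common = high_frequency(ref_based, 0.5)
```

In this toy case the two approaches disagree at column 2 because the majority base differs from the reference there. That is the kind of discrepancy a comparison of reference-based and consensus-based genotyping would need to account for.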
Funder
Deanship of Scientific Research, Imam Mohammad Ibn Saud Islamic University, Saudi Arabia
Subject
Applied Mathematics, Modeling and Simulation, General Computer Science, Theoretical Computer Science