Author:
Feng Qian, Tiedje Kathryn, Ruybal-Pesántez Shazia, Tonkin-Hill Gerry, Duffy Michael, Day Karen, Shim Heejung, Chan Yao-ban
Abstract
Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is of major interest for biologists. However, current methods for detecting recombinants only work for aligned sequences, often require a reference panel, and do not scale well to large datasets. Thus they are not suitable for the analysis of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination.
We introduce an algorithm to detect recombinant sequences from an unaligned dataset. Our approach can effectively handle thousands of sequences without the need for an alignment or a reference panel, offering a general tool suitable for the analysis of many different types of sequences. We demonstrate the effectiveness of our algorithm through extensive numerical simulations; in particular, it maintains its accuracy in the presence of insertions and deletions.
We apply our algorithm to a dataset of 17,335 DBLα types in var genes from Ghana, enabling the comparison between recombinant and non-recombinant types for the first time. We observe that sequences belonging to the same ups type or DBLα subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones.
Author summary
Recombination is a fundamental process in molecular evolution where two genes exchange genetic material, diversifying the genes. It is important to properly model this process when reconstructing evolutionary history, and to do so we need to be able to identify recombinant genes. In this manuscript, we develop a method for this which can be applied to scenarios where current methods often fail, such as where genes are very diverse.
We specifically focus on detecting recombinants in the var genes of the malaria parasite Plasmodium falciparum. These genes influence the length and severity of malaria infection, and therefore their study is critical to the treatment and prevention of malaria. They are also highly diverse, primarily because of recombination. Our analysis of genes from a cross-sectional study in Ghana shows fundamental relations between the patterns and prevalence of recombination in these genes and other important biological categorisations.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.