Abstract
AbstractMolecular/genetic methods are becoming increasingly important for surveillance of diseases like malaria. Such methods allow to monitor routes of disease transmission or the origin and spread of variants associated with drug resistance. A confounding factor in molecular disease surveillance is the presence of multiple distinct variants in the same infection (multiplicity of infection – MOI), which leads to ambiguity when reconstructing which pathogenic variants are present in an infection. Heuristic approaches often ignore ambiguous infections, which leads to biased results. To avoid such bias, we introduce a statistical framework to estimate haplotype frequencies alongside MOI from a pair of multi-allelic molecular markers. Estimates are based on maximum-likelihood using the expectation-maximization (EM)-algorithm. The estimates can be used as plug-ins to construct pairwise linkage disequilibrium (LD) maps. The finite-sample properties of the proposed method are studied by systematic numerical simulations. These reveal that the EM-algorithm is a numerically stable method in our case and that the proposed method is accurate (little bias) and precise (small variance) for a reasonable sample size. In fact, the results suggest that the estimator is asymptotically unbiased. Furthermore, the method is appropriate to estimate LD (byD′, r2,Q*, or conditional asymmetric LD). Furthermore, as an illustration, we apply the new method to a previously-published dataset from Cameroon concerning sulfadoxine-pyrimethamine (SP) resistance. The results are in accordance with the SP drug pressure at the time and the observed spread of resistance in the country, yielding further evidence for the adequacy of the proposed method. The method is particularly useful for deriving LD maps from data with many ambiguous observations due to MOI. Importantly, the method per se is not restricted to malaria, but applicable to any disease with a similar transmission pattern. The method and several extensions are implemented in an easy-to-use R script.Author summaryAdvances in genetics render molecular disease surveillance increasingly popular. Unlike traditional incidence-based epidemiological data, genetic information provides fine-grained resolution, which allows monitoring and reconstructing routes of transmission, the spread of drug resistance, etc. Molecular surveillance is particularly popular in highly relevant diseases such as malaria. The presence of multiple distinct pathogenic variants within one infection, i.e., multiplicity of infection (MOI), is a confounding factor hampering the analysis of molecular data in the context of disease surveillance. Namely, due to MOI ambiguity concerning the pathogenic variants being present in mixed-clone infections arise. These are often disregarded by heuristic approaches to molecular disease surveillance and lead to biased results. To avoid such bias we introduce a method to estimate the distribution of MOI and frequencies of pathogenic variants based on a concise probabilistic model. The method is designed for two multi-allelic genetic markers, which is the appropriate genetic architecture to derive pairwise linkage-disequilibrium maps, which are informative on population structure or evolutionary processes, such as the spread of drug resistance. We validate the appropriateness of our method by numerical simulations and apply it to a malaria dataset from Cameroon, concerning sulfadoxine-pyrimethamine resistance, the drug used for intermittent preventive treatment during pregnancy.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献