Abstract
Background
Molecular surveillance of infectious diseases allows pathogens to be monitored at a finer granularity than traditional epidemiological approaches permit, and it is well established for some of the most relevant infectious diseases, such as malaria. The presence of genetically distinct pathogenic variants within an infection, referred to as multiplicity of infection (MOI) or complexity of infection (COI), is common in malaria and similar infectious diseases. MOI is an important metric that scales with transmission intensity, potentially affects clinical pathogenesis, and is a confounding factor when monitoring the frequency and prevalence of pathogenic variants. Several statistical methods exist to estimate MOI and the frequency distribution of pathogen variants. However, a common problem is the quality of the underlying molecular data. If molecular assays do not fail at random, MOI and the prevalence of pathogen variants are likely to be underestimated.
Methods and findings
A statistical model is introduced that explicitly addresses data quality by assuming a probability with which a pathogen variant remains undetected in a molecular assay. This differs from the assumption of missing at random, under which a molecular assay either performs perfectly or fails completely. The method is applicable to a single molecular marker and allows estimation of allele-frequency spectra, the distribution of MOI, and the probability that variants remain undetected (incomplete information). Based on the statistical model, expressions for the prevalence of pathogen variants are derived, and differences between frequency and prevalence are discussed. The usual desirable asymptotic properties of the maximum-likelihood estimator (MLE) are established by rewriting the model as an exponential family. The MLE has promising finite-sample properties in terms of bias and variance. The covariance matrix of the estimator is close to the Cramér-Rao lower bound (inverse Fisher information). Importantly, the estimator's variance is larger than that of a similar method that disregards incomplete information, but its bias is smaller.
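The difference between frequency and prevalence, and the effect of undetected variants, can be illustrated with a small simulation. This is only a sketch under assumptions that the abstract does not state: MOI is taken to follow a Poisson distribution conditioned on being positive (parameter `lam`), the infecting lineages are drawn independently according to their frequencies `freqs`, and each lineage present in an infection is missed independently with probability `eps`. All function and parameter names are hypothetical, and the prevalence formula is the one implied by these assumptions, not necessarily the expression derived in the paper.

```python
import math
import random


def sample_positive_poisson(lam, rng):
    """Draw m >= 1 from Poisson(lam) conditioned on m > 0 (assumed MOI model)."""
    threshold = math.exp(-lam)
    while True:
        # Knuth's multiplication method for one Poisson draw.
        m, prod = 0, rng.random()
        while prod > threshold:
            m += 1
            prod *= rng.random()
        if m > 0:  # reject m == 0: every sample is an infection
            return m


def simulate_sample(lam, freqs, eps, rng):
    """Return (present, detected) lineage sets for one hypothetical sample."""
    m = sample_positive_poisson(lam, rng)
    present = set(rng.choices(range(len(freqs)), weights=freqs, k=m))
    # Assumption: each lineage present is missed independently with prob. eps.
    detected = {k for k in present if rng.random() >= eps}
    return present, detected


def true_prevalence(lam, p_k):
    """P(lineage k is present) under the assumed positive-Poisson model."""
    return 1.0 - math.expm1(lam * (1.0 - p_k)) / math.expm1(lam)


if __name__ == "__main__":
    rng = random.Random(1)
    lam, freqs, eps, n = 1.5, [0.5, 0.3, 0.2], 0.2, 20000
    samples = [simulate_sample(lam, freqs, eps, rng) for _ in range(n)]
    for k, p_k in enumerate(freqs):
        present = sum(k in pres for pres, _ in samples) / n
        detected = sum(k in det for _, det in samples) / n
        # Columns: lineage, frequency, model prevalence,
        # simulated prevalence, prevalence among detected lineages.
        print(k, p_k, round(true_prevalence(lam, p_k), 3),
              round(present, 3), round(detected, 3))
```

Under these assumptions, a lineage's prevalence is at least as large as its frequency, because super-infections give each lineage several chances to be present, while the prevalence computed from detected lineages alone is lower than the true prevalence whenever `eps` is positive.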
Conclusions
Although the model introduced here has convenient properties, it does not outperform, in terms of mean squared error, a simple standard method that neglects missing information. Thus, the new method is recommended only for data sets in which the molecular assays produced poor-quality results. This will be particularly true if the model is extended to accommodate information from multiple molecular markers simultaneously, where incomplete information at one or more markers leads to a strong depletion of sample size.
Funder
Deutscher Akademischer Austauschdienst
Sächsisches Staatsministerium für Wissenschaft und Kunst
Bundesministerium für Bildung und Forschung
Deutsche Forschungsgemeinschaft
Publisher
Public Library of Science (PLoS)
Cited by
1 article.