Abstract
Viral genetic information from people living with HIV can deepen our understanding of the infection’s epidemiology at many scales. To better understand the potential and limits of tools that utilise such information, we show the performance of two representative tools (HIV-TRACE and phyloscanner) in describing HIV transmission dynamics with different types of genetic data, and compare the results with previous findings. The samples were collected from three cohort studies in Sub-Saharan Africa and were deep sequenced to produce both short Illumina reads and long PacBio reads. By comparing phyloscanner’s performance with short and long reads, we show that long reads provide improved phylogenetic resolution for the classic transmission topology in joint within-host trees. Our pipeline accurately predicted the direction of transmission 88%-92% of the time. We also show that the timing of sample collection plays an important role in the reconstruction of directionality using deep sequencing data. Consensus sequences were also generated and used as HIV-TRACE input to show different patterns of clustering sensitivity and specificity for data from different genomic regions or the entire genome. Finally, we discuss adjusting expectations about the sensitivity and specificity of different types of sequence data, considering rapid pathogen evolution, and highlight the potential of high within-host phylogenetic resolution in HIV. In conclusion, how viral genetic data are collected and presented can greatly influence our ability to describe the underlying dynamics. Methods for source attribution analysis have reached levels of superior accuracy. However, residual uncertainty emphasizes that sequence analysis alone cannot conclusively prove linkage at the individual level.
Importance
Understanding HIV transmission dynamics is key to designing effective HIV testing and prevention strategies.
By using different sequencing techniques on well-characterised cohorts, we were able to evaluate the effect of genetic data resolution on the accuracy of identifying likely transmission pairs and the direction of transmission within pairs. We find that the longer reads generated by PacBio sequencing are more suitable for transmission analyses.
Publisher
Cold Spring Harbor Laboratory