Author:
Olabode Abayomi S, Ng Garway T, Wade Kaitlyn E, Salnikov Mikhail, Dick David W, Poon Art FY
Abstract
A new abundance of full-length HIV-1 genome sequences provides an opportunity to revisit the standard model of HIV-1/M diversity, which clusters genomes into largely non-recombinant subtypes. That model is not consistent with recent evidence of deep recombinant histories for SIV and other HIV-1 groups. Here we develop an unsupervised non-parametric clustering approach, which does not rely on predefined non-recombinant genomes, by adapting a community detection method developed for dynamic social network analysis. We show that this method (DSBM) attains a significantly lower mean error rate in detecting recombinant breakpoints in simulated data (quasibinomial GLM, P < 8 × 10⁻⁸) than other reference-free recombination detection programs (GARD, RDP4 and RDP5). Applying DSBM to a representative sample of n = 525 actual HIV-1 genomes, we determined k = 25 to be the optimal number of clusters and used change point detection to estimate that at least 95% of these genomes are recombinant. Further, we identified both known and novel recombination hotspots in the HIV-1 genome, as well as evidence of inter-subtype recombination in HIV-1 subtype reference genomes. We propose that clusters generated by DSBM can provide an informative new framework for HIV-1 classification.
Publisher
Cold Spring Harbor Laboratory