Author:
Wardeh Maya, Pilgrim Jacko, Hui Melody, Kotsiri Aurelia, Baylis Matthew, Blagrove Marcus SC
Abstract
Routes of virus transmission between hosts are key to understanding viral epidemiology. Different routes have large effects on viral ecology and on the likelihood and rate of transmission. For example, respiratory and vector-borne viruses together encompass the majority of high-consequence animal and plant outbreaks. However, the specific transmission route(s) can take months to years to determine, undermining the efficiency of mitigation efforts. Here, we identify the viral features and evolutionary signatures which are predictive of viral transmission routes, and use them to identify potential routes for fully-sequenced viruses; we perform this both for viruses with no observed routes and for viruses with potentially missing routes. This was achieved by compiling a dataset of 24,953 virus-host associations with 81 defined transmission routes, constructing a hierarchy of virus transmission encompassing those routes and 42 higher-order modes, and engineering 446 predictive features from three perspectives (virus, host, and network). We integrated those data and features to train 98 different ensembles of LightGBM classifiers, each incorporating five different class-balancing approaches. Using our trained ensembles, we demonstrated that every feature contributed to the prediction of at least one route and/or mode of transmission, demonstrating the utility of our multi-perspective approach. Our approach achieved ROC-AUC=0.991 and F1-score=0.855 on average across all modelled transmission mechanisms, and achieved high predictive performance for high-consequence respiratory (ROC-AUC=0.990, F1-score=0.864) and vector-borne (ROC-AUC=0.997, F1-score=0.921) transmission. Our work ranks the viral features in order of their contribution to prediction, per transmission route, and hence identifies the genomic evolutionary signatures associated with each route.
Together with the more mature field of viral host-range prediction, our predictive framework could provide early insights into the potential for, and pattern of, viral spread; facilitate rapid response with appropriate measures; and help triage the time-consuming investigations needed to confirm the likely routes of transmission. Moreover, the strength of our approach for high-consequence transmission routes shows that our methodology has direct utility for pandemic preparedness.
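For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how ROC-AUC and F1-score can be computed for a single binary classifier (for example, "respiratory route: yes/no"). It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made purely for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which
    is fine for a small illustration but not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = virus uses the route, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
```

ROC-AUC depends only on how the scores rank positives against negatives, whereas F1 depends on the chosen threshold and is sensitive to class imbalance, which is one reason the two are commonly reported together for rare classes.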
Publisher
Cold Spring Harbor Laboratory