Author:
Basodi Sunitha, Baykal Pelin Icer, Zelikovsky Alex, Skums Pavel, Pan Yi
Abstract
Background
Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for the analysis of sequencing data associated with such populations, their straightforward application is impeded by several challenges: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.
Methods
We propose a novel preprocessing approach that transforms irregular genomic data into normalized image data. This representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important molecular epidemiology problems: inference of viral infection stage, and detection of viral transmission clusters and outbreaks using next-generation sequencing data.
Results
The infection staging method was applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering was performed on data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy.
Availability
The developed software is freely available at https://bitbucket.org/adv_bio_coll/chronic_vs_clinic
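The Methods paragraph above describes turning sequencing samples of different sizes into normalized images of a common shape. The abstract does not specify the encoding, so the following is only an illustrative sketch under stated assumptions, not the authors' method: it assumes the sequences in a sample are already aligned to equal length, and it summarizes each sample as a 4 x `width` grid of nucleotide frequencies in [0, 1]. The function name `population_to_image` and the `width` parameter are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def population_to_image(sequences, width=8):
    """Summarize aligned sequences as a 4 x width grid of values in [0, 1].

    Assumption (not from the paper): each row is one nucleotide, each column
    is a bin of alignment positions, and each cell holds the mean frequency
    of that nucleotide over the positions in the bin.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    if length < width:
        raise ValueError("alignment must be at least `width` positions long")

    image = [[0.0] * width for _ in NUCLEOTIDES]
    positions_per_bin = [0] * width

    for pos in range(length):
        # Map the alignment position onto one of `width` equal-sized bins.
        bin_index = pos * width // length
        column = Counter(s[pos].upper() for s in sequences)
        for row, nucleotide in enumerate(NUCLEOTIDES):
            image[row][bin_index] += column[nucleotide] / len(sequences)
        positions_per_bin[bin_index] += 1

    # Average within each bin so every cell stays in [0, 1].
    for row in image:
        for b in range(width):
            if positions_per_bin[b]:
                row[b] /= positions_per_bin[b]
    return image


# Two samples with different numbers and lengths of sequences
# both map to the same 4 x 4 shape.
small = population_to_image(["ACGT", "ACGT"], width=4)
large = population_to_image(["ACGTACGT"] * 5 + ["ACGTACGA"], width=4)
```

Because every sample ends up with the same shape regardless of how many sequences it contains, the resulting grids could be flattened and passed to any standard image or vector classifier, such as the SVM mentioned in the Results.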
Publisher
Cold Spring Harbor Laboratory