Analysis of Heterogeneous Genomic Samples Using Image Normalization and Machine Learning

Author:

Basodi Sunitha, Baykal Pelin Icer, Zelikovsky Alex, Skums Pavel, Pan Yi

Abstract

Background: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for the analysis of sequencing data associated with such populations, their straightforward application is impeded by multiple challenges: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.

Methods: We propose a novel preprocessing approach that transforms irregular genomic data into normalized image data. This representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important molecular epidemiology problems: inference of viral infection stage, and detection of viral transmission clusters and outbreaks, using next-generation sequencing data.

Results: The infection staging method was applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering was performed on data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy.

Availability: The developed software is freely available at https://bitbucket.org/adv_bio_coll/chronic_vs_clinic

Publisher

Cold Spring Harbor Laboratory

