Machine learning-based approach KEVOLVE efficiently identifies SARS-CoV-2 variant-specific genomic signatures

Author:

Lebatteux Dylan, Soudeyns Hugo, Boucoiran Isabelle, Gantt Soren, Diallo Abdoulaye Baniré

Abstract

Machine learning has proven to be a powerful tool for the identification of distinctive genomic signatures among viral sequences. Such signatures are motifs present in the viral genome that differentiate species or variants. In the context of SARS-CoV-2, the identification of such signatures can contribute to taxonomic and phylogenetic studies, help in recognizing and defining distinct emerging variants, and focus the characterization of functional properties of polymorphic gene products. Here, we study KEVOLVE, an approach based on a genetic algorithm with a machine learning kernel, to identify several genomic signatures based on minimal sets of k-mers. In a comparative study, in which we analyzed a large SARS-CoV-2 genome dataset, KEVOLVE performed better in identifying variant-discriminative signatures than several gold-standard reference statistical tools. Subsequently, these signatures were characterized to highlight potential biological functions. The majority were associated with known mutations among the different variants, whose functional and pathological impact is documented in the available literature. Notably, we found evidence of new motifs, specifically in the Omicron variant, some of which include silent mutations, indicating potentially novel, variant-specific virulence determinants. The source code of the method and additional resources are available at: https://github.com/bioinfoUQAM/KEVOLVE.

Author summary

Advances in cloning and sequencing technologies have yielded a vast repository of viral genomic sequence data. Machine learning, which refers to the development and application of computer algorithms that improve with experience, has proven to be efficient for analyzing this complex and massive data. Although many methods have been developed to classify viruses into different characteristic groups, it is often difficult to explain the predictions of these methods. To overcome this, our laboratory is designing machine learning-based methods for identifying discriminative signatures within viral genomic sequences. These signatures, which are motifs specific to groups of viruses and known to be pervasive in their genomes, are used to 1) build accurate and explainable prediction tools for pathogens and 2) highlight mutations potentially associated with functional changes. In this paper, we present the potential of our latest approach, KEVOLVE. We first compare it to three discriminative motif identification tools on data sets covering several SARS-CoV-2 variant genomes. We then focus on the motifs identified by KEVOLVE to analyze the mutations associated with the different variants and the potential changes in biological function that they may involve.
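To make the notion of a variant-discriminative k-mer concrete, the following is a minimal sketch in Python. It is not KEVOLVE and does not reproduce its genetic algorithm or machine learning kernel; it is a naive set-based stand-in that keeps the k-mers present in every sequence of one group and absent from all sequences of the other groups. The function names, the toy sequences, and the choice of k = 4 are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_signatures(groups, k):
    """Naive illustration (hypothetical helper, not the KEVOLVE method).

    For each group label, return the sorted k-mers that occur in every
    sequence of that group and in no sequence of any other group.
    """
    shared = {}  # k-mers common to all sequences of a group
    seen = {}    # k-mers occurring in at least one sequence of a group
    for label, seqs in groups.items():
        sets = [kmers(s, k) for s in seqs]
        shared[label] = set.intersection(*sets)
        seen[label] = set.union(*sets)

    result = {}
    for label in groups:
        others = set().union(*(seen[o] for o in groups if o != label))
        result[label] = sorted(shared[label] - others)
    return result


# Toy sequences invented for this example; they are not real genome data.
toy = {
    "variant_A": ["ACGTACGTAA", "ACGTACGTAC"],
    "variant_B": ["ACGTTCGTAA", "ACGTTCGTAC"],
}
signatures = group_signatures(toy, 4)
print(signatures)
```

On real genomes an exhaustive comparison like this quickly becomes impractical and tolerates no noise, which is one reason a search-based approach over minimal k-mer subsets, as described in the abstract, may be preferable.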

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.