Author:
Iwasaki Yuki, Abe Takashi, Wada Kennosuke, Wada Yoshiko, Ikemura Toshimichi
Abstract
Background
Unsupervised AI (artificial intelligence) can extract novel knowledge from big data without relying on particular models or prior knowledge, which makes it well suited to unveiling hidden features in such data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of its genome sequence changes. We previously established an unsupervised AI, the BLSOM (batch-learning self-organizing map), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes.
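To make "oligonucleotide composition" concrete, the following is a minimal sketch of how one genome could be turned into a vector of oligonucleotide frequencies before being passed to a clustering method such as a SOM. The function name `oligo_composition`, the default length `k = 4`, the normalization to relative frequencies, and the skipping of windows with ambiguous bases are all assumptions made for illustration; the abstract does not state the settings the authors used.

```python
from collections import Counter
from itertools import product


def oligo_composition(seq, k=4):
    """Return the relative frequency of every length-k oligonucleotide in seq.

    Hypothetical helper for illustration only; not the authors' code.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA letters
    # All 4**k possible oligonucleotides, in a fixed order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute (e.g. N is skipped).
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


if __name__ == "__main__":
    comp = oligo_composition("ACGTACGT", k=2)
    # Print only the dinucleotides that actually occur.
    print({m: round(f, 3) for m, f in comp.items() if f > 0})
```

Each genome then becomes a fixed-length vector (4**k values), so genomes of different lengths can be compared without sequence alignment.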
Results
Although only the oligonucleotide composition was given, the resulting clusters of genomes corresponded primarily to the known main clades and to internal divisions within them. Because the BLSOM is an explainable AI, it reveals which features of the oligonucleotide composition are responsible for the clade clustering. The BLSOM also provided information about a specific genomic region that may be undergoing RNA modifications.
Conclusions
The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes, and it can complement phylogenetic methods based on sequence alignment.
Publisher
Springer Science and Business Media LLC
Subject
Microbiology (medical), Microbiology