PC-mer: An Ultra-fast memory-efficient tool for metagenomics profiling and classification

Author:

Akbari Rokn Abadi Saeedeh, Mohammadi Amirhossein, Koohi Somayyeh

Abstract

Feature extraction methods, such as k-mer-based methods, have recently come to play a significant role in the classification and analysis of metagenomics data. However, they face several bottlenecks, including performance limitations, high memory consumption, and computational overhead. To address these challenges, we developed a new feature extraction and sequence profiling method for DNA/RNA sequences, called PC-mer, which takes advantage of the physicochemical properties of nucleotides. Compared with k-mer profiling methods, PC-mer reduces memory usage by a factor of 2k while improving metagenomics classification performance at various taxonomic levels, for both machine learning-based and computational-based methods; it also achieves a speedup of more than 1000x in the training phase. Evaluating ML-based PC-mer on various datasets confirms that it can achieve 100% accuracy in classifying samples at the class, order, and family levels. Compared with k-mer-based classification methods, it also improves genus-level classification accuracy by more than 14% on the shotgun dataset (reaching an accuracy of 97.5%) and by more than 5% on the amplicon dataset (reaching an accuracy of 98.6%). Building on these improvements, we provide two PC-mer-based tools that can replace the popular k-mer-based tools: one for classifying and another for comparing metagenomics data.
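
For context on the memory bottleneck mentioned above, the following is a minimal sketch of the baseline k-mer profiling approach, not of PC-mer itself, whose physicochemical encoding is not specified in this abstract. The function name `kmer_profile` and the example sequence are illustrative assumptions. The sketch shows that a dense k-mer profile over the DNA alphabet has 4^k entries, so its size grows exponentially with k.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Return a dense count vector over all 4**k DNA k-mers (hypothetical helper).

    Entries follow the lexicographic order of k-mers over the alphabet A, C, G, T.
    """
    seq = seq.upper()
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other characters (e.g. N) are never looked up,
    # so they do not contribute to the profile.
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


profile = kmer_profile("ACGTACGTAC", 3)
print(len(profile))  # 64 entries, i.e. 4**3
print(sum(profile))  # 8 overlapping 3-mers in a 10-base sequence
```

Under this baseline, moving from k = 3 to k = 6 grows the profile from 64 to 4,096 entries per sequence, which is the kind of growth a more compact profiling method aims to avoid.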

Publisher

Public Library of Science (PLoS)
