Persistent memory as an effective alternative to random access memory in metagenome assembly

Published: 2022-11-30; Volume: 23; Issue: 1
ISSN: 1471-2105
Journal: BMC Bioinformatics
Language: English

Authors:

Sun Jingchao, Qiu Zhining, Egan Rob, Ho Harrison, Li Yue, Wang Zhong

Abstract

Background: The assembly of metagenomes decomposes members of complex microbial communities and allows these genomes to be characterized without laborious cultivation or single-cell metagenomics. Metagenome assembly is memory intensive and time consuming. Multi-terabyte sequence datasets can be too large to assemble on a single computer node. Because memory consumption patterns are data-specific, there is also no reliable method to predict the memory requirement. Currently, out-of-memory (OOM) errors are among the most prevalent causes of metagenome assembly failures.

Results: In this study, we explored the possibility of using Persistent Memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM). The goal was to reduce OOM failures and increase the scalability of metagenome assemblers. We evaluated the execution time and memory usage of three popular metagenome assemblers (MetaSPAdes, MEGAHIT, and MetaHipMer2) on datasets of up to one terabase. We found that PMem can enable metagenome assemblers to run on terabyte-sized datasets by partially or fully substituting for DRAM. Depending on the configured DRAM/PMem ratio, metagenome assemblies running with PMem can achieve a speed similar to DRAM. In the worst case, they showed a roughly two-fold slowdown. In addition, different assemblers displayed distinct memory/speed trade-offs in the same hardware/software environment.

Conclusions: We demonstrated that PMem can expand the capacity of DRAM to allow larger metagenome assemblies, with a potential trade-off in speed. Because PMem can be used directly without any application-specific code modification, these findings are likely to generalize to other memory-intensive bioinformatics applications.

Funders

Biological and Environmental Research

National Science Foundation Research Training Program

Publisher

Springer Science and Business Media LLC

Subjects

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-022-05052-8.pdf

Cited by 2 articles

1. The complete chloroplast genome sequence of Amorphophallus konjac (Araceae) from Yunnan, China and its phylogenetic analysis in the family Araceae. Mitochondrial DNA Part B. 2024-01-02.

2. Novel bacteriophage-mediated β-lactamase-encoding genes and their risk assessment in environmental communities. Process Safety and Environmental Protection. 2023-05.