The parallelism motifs of genomic data analysis

Author:

Yelick Katherine (1, 2), Buluç Aydın (1, 2), Awan Muaaz (1), Azad Ariful (3), Brock Benjamin (1, 2), Egan Rob (4), Ekanayake Saliya (1), Ellis Marquita (1, 2), Georganas Evangelos (5), Guidi Giulia (1, 2), Hofmeyr Steven (1), Selvitopi Oguz (1), Teodoropol Cristina (1, 2), Oliker Leonid (1)

Affiliation:

1. Lawrence Berkeley National Laboratory, Berkeley, CA, USA

2. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA

3. School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA

4. DOE Joint Genome Institute, Walnut Creek, CA, USA

5. Intel Labs, Santa Clara, CA, USA

Abstract

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries and parallel architecture design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists of motifs, arguing that at least two key patterns, sorting and hashing, are missing from them. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
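To make the sorting and hashing motifs concrete, the following is a minimal sketch (not the authors' code) built around k-mer counting, a common step in genome assembly. It assumes that each k-mer is assigned to an owning "rank" by a deterministic hash (CRC32 here, an arbitrary choice), simulates all ranks inside one Python process, and checks the result against a sort-based count. The function names (`kmers`, `owner`, `count_by_hashing`, `count_by_sorting`) are hypothetical and chosen for illustration only.

```python
"""Toy illustration of the hashing and sorting motifs via k-mer counting.

Hypothetical sketch: the "ranks" are simulated in a single process; no
message passing or real distributed hash table is involved.
"""
from collections import Counter
from itertools import groupby
import zlib


def kmers(read: str, k: int):
    """Yield every length-k substring (k-mer) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(kmer: str, nranks: int) -> int:
    """Map a k-mer to an owning rank with a deterministic hash (CRC32)."""
    return zlib.crc32(kmer.encode()) % nranks


def count_by_hashing(reads, k, nranks):
    """Hashing motif: route each k-mer to its owner, which updates a local table."""
    tables = [Counter() for _ in range(nranks)]  # one hash table per simulated rank
    for read in reads:
        for km in kmers(read, k):
            # In a distributed code this would be an (often asynchronous) remote update.
            tables[owner(km, nranks)][km] += 1
    return tables


def count_by_sorting(reads, k):
    """Sorting motif: sort all k-mers so duplicates become adjacent, then count runs."""
    all_kmers = sorted(km for read in reads for km in kmers(read, k))
    return {km: sum(1 for _ in run) for km, run in groupby(all_kmers)}


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTT", "TTACGTAC"]
    k, nranks = 3, 4
    tables = count_by_hashing(reads, k, nranks)
    merged = Counter()
    for t in tables:
        merged.update(t)  # each k-mer lives in exactly one rank's table
    assert dict(merged) == count_by_sorting(reads, k)
    for rank, t in enumerate(tables):
        print(rank, dict(sorted(t.items())))
```

In a distributed setting, the hashing variant turns each increment into an update sent to the owner's table, which is the kind of irregular, asynchronous communication the abstract describes, while the sorting variant maps to a parallel sort followed by local run counting. Both are shown here only in serial form.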

Funder

Department of Energy Office of Science

National Science Foundation

Publisher

The Royal Society

Subject

General Physics and Astronomy, General Engineering, General Mathematics

Cited by 15 articles.

1. DNA Pattern Matching Algorithms within Sorghum bicolor Genome: A Comparative Study; 2024 7th International Conference on Informatics and Computational Sciences (ICICoS); 2024-07-17

2. Toward a Predictive Understanding of Cyanobacterial Harmful Algal Blooms through AI Integration of Physical, Chemical, and Biological Data; ACS ES&T Water; 2023-11-30

3. Opportunities and challenges of 5G network technology toward precision medicine; Clinical and Translational Science; 2023-09-25

4. A general approach for supporting nonblocking data structures on distributed-memory systems; Journal of Parallel and Distributed Computing; 2023-03

5. Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way; SC22: International Conference for High Performance Computing, Networking, Storage and Analysis; 2022-11