Streamlining remote nanopore data access with slow5curl

Author:

Wong Bonson (1, 2, 3), Ferguson James M (1, 2), Do Jessica Y (1, 2, 3), Gamaarachchi Hasindu (1, 2, 3), Deveson Ira W (1, 2, 4)

Affiliation:

1. Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia

2. Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney, NSW 2010, Australia

3. School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia

4. St Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Sydney, NSW 2052, Australia

Abstract

Background: As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduce slow5curl, a software package designed to streamline nanopore data sharing, accessibility, and reanalysis.

Results: Slow5curl allows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file. Slow5curl uses an index to quickly locate specific reads in a large dataset in SLOW5/BLOW5 format, and it issues highly parallelized data access requests to maximize download speeds. Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate how slow5curl can be used to quickly fetch and reanalyze raw signal reads corresponding to a set of target genes from each individual in a large cohort dataset (n = 91), minimizing the time, egress costs, and local storage requirements for their reanalysis.

Conclusions: We provide slow5curl as a free, open-source package that will reduce friction in data sharing for the nanopore community: https://github.com/BonsonW/slow5curl.
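The general idea of index-guided, parallel partial fetching described in the Results can be sketched in Python. This is an illustrative sketch only, not slow5curl's implementation: the toy index layout (read ID mapped to byte offset and length), the function names, and the in-memory stand-in for a remote server are all assumptions made for the example. The real SLOW5/BLOW5 index format and the tool's networking code are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def build_toy_file(records):
    """Concatenate records into one blob and build a toy index.

    Hypothetical layout: index maps read_id -> (byte offset, byte length).
    """
    blob = bytearray()
    index = {}
    for read_id, payload in records.items():
        index[read_id] = (len(blob), len(payload))
        blob += payload
    return bytes(blob), index


def range_header(offset, length):
    """Build an HTTP Range header value; byte ranges are inclusive at both ends."""
    return f"bytes={offset}-{offset + length - 1}"


def make_fake_fetcher(blob):
    """Stand-in for a server that honours Range requests (assumption: no real network)."""
    def fetch(header_value):
        start, end = header_value.split("=")[1].split("-")
        return blob[int(start):int(end) + 1]
    return fetch


def fetch_reads(read_ids, index, fetch_range, max_workers=8):
    """Fetch only the requested records, issuing the range requests in parallel."""
    def fetch_one(read_id):
        offset, length = index[read_id]
        return read_id, fetch_range(range_header(offset, length))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_one, read_ids))


# Toy usage: three records in the "remote" file, two of them requested.
blob, index = build_toy_file(
    {"read_a": b"AAAA", "read_b": b"BBBBBB", "read_c": b"CC"}
)
wanted = fetch_reads(["read_c", "read_a"], index, make_fake_fetcher(blob))
```

In this sketch only the bytes belonging to the requested records are transferred, which is the property that keeps download time, egress costs, and local storage low when a few reads are needed from a very large file. A real client would replace the fake fetcher with HTTP requests that carry the same Range header.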

Funder

Australian Medical Research Futures Fund

Australian Research Council

Publisher

Oxford University Press (OUP)
