Abstract
The volume of public nucleotide sequence data has grown rapidly over the past two decades, enabling novel discoveries through re-analyses, meta-analyses, and comparative studies that uncover general biological trends. However, reproducible re-use and management of sequence datasets remain a challenge. We created the software plugin q2-fondue to enable user-friendly acquisition, re-use, and management of public nucleotide sequence (meta)data while adhering to open data principles. The software provides fully provenance-tracked programmatic access to, and management of, data from the Sequence Read Archive (SRA). Sequence data and accompanying metadata retrieved with q2-fondue follow a validated format that is interoperable with the QIIME 2 ecosystem and its multiple user interfaces. To illustrate the capabilities of q2-fondue, we present several demonstration analyses using amplicon, whole-genome, and shotgun metagenome datasets. These use cases show how q2-fondue increases analysis reproducibility and transparency, from data download to final visualizations, by including source details in the integrated provenance graph. We believe q2-fondue will lower existing barriers to comparative analyses of nucleotide sequence data, enabling more transparent, open, and reproducible meta-analyses. q2-fondue is a Python 3 package released under the BSD 3-clause license at https://github.com/bokulich-lab/q2-fondue.
Publisher
Cold Spring Harbor Laboratory