Abstract
Genomic data repositories such as The Cancer Genome Atlas (TCGA), the Encyclopedia of DNA Elements (ENCODE), and Bioconductor’s AnnotationHub and ExperimentHub provide public access to large amounts of genomic data as flat files. Researchers often download a subset of files from these repositories to perform their analyses. As these repositories grow, researchers increasingly face bottlenecks in exploratory data analysis. Based on the concepts of the NoDB paradigm, we developed epivizFileServer, a Python library that implements an in-situ data query system for local or remotely hosted indexed genomic files, supporting not only visualization but also data manipulation. The library decouples data from analysis workflows and provides an abstract interface for defining computations independent of the location, format, or structure of the file.
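To illustrate the general idea of an in-situ query over an indexed flat file, the following is a minimal sketch using only the Python standard library. It is not the epivizFileServer API: the function names `build_index` and `query`, the four-column tab-separated layout, and the sample data are all assumptions made for illustration. The sketch further assumes the file is sorted by chromosome and start position and that intervals do not overlap, as in a bedGraph-style track.

```python
import bisect
import io


def build_index(fh):
    """Scan the file once, recording each record's start coordinate and
    byte offset per chromosome. (Hypothetical helper, for illustration.)"""
    index = {}
    while True:
        offset = fh.tell()
        line = fh.readline()
        if not line:
            break
        chrom, start, _end, _value = line.rstrip(b"\n").split(b"\t")
        starts, offsets = index.setdefault(chrom.decode(), ([], []))
        starts.append(int(start))
        offsets.append(offset)
    return index


def query(fh, index, chrom, qstart, qend):
    """Return records on `chrom` overlapping [qstart, qend), seeking
    directly to the relevant bytes instead of loading the whole file."""
    if chrom not in index:
        return []
    starts, offsets = index[chrom]
    # Step back one record: with sorted, non-overlapping intervals, only the
    # record just before the first start >= qstart can still overlap qstart.
    i = max(bisect.bisect_left(starts, qstart) - 1, 0)
    fh.seek(offsets[i])
    hits = []
    for line in iter(fh.readline, b""):
        c, s, e, v = line.rstrip(b"\n").split(b"\t")
        s, e = int(s), int(e)
        if c.decode() != chrom or s >= qend:
            break  # past the requested region
        if e > qstart:
            hits.append((chrom, s, e, float(v)))
    return hits


# Made-up sample data standing in for a raw genomic flat file.
demo = io.BytesIO(
    b"chr1\t0\t100\t1.0\n"
    b"chr1\t100\t200\t2.0\n"
    b"chr1\t200\t300\t3.0\n"
    b"chr2\t0\t50\t4.0\n"
)
demo_index = build_index(demo)
demo_hits = query(demo, demo_index, "chr1", 150, 250)
```

The data stay in the raw file, and the analysis code sees only a `query` call. Real indexed formats keep a comparable index alongside or inside the file, so that a range request touches only a small part of it.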
Publisher
Cold Spring Harbor Laboratory