A statistical reference-free algorithm subsumes and generalizes common genomic sequence analysis and uncovers novel biological regulation

Published: 2022-06-27

Authors:

Kaitlin Chaung, Tavor Z. Baharav, George Henderson, Peter Wang, Ivan N. Zheludev, Julia Salzman

Abstract

Summary: We show that myriad, disparate mechanisms that diversify genomes and transcriptomes can be captured by a unifying principle: sample-dependent sequence variation. This variation occurs in both RNA and DNA and functions to regulate transcript expression and adaptation. Using this insight, we develop a novel, highly efficient algorithm, NOMAD, that performs inference on raw reads without any genomic reference or sample metadata. NOMAD unifies data-scientifically driven discovery with previously unattainable speed and generality. Examples include SARS-CoV-2, humans, and non-model animals and plants, with both bulk and single-cell RNA-sequencing data. A snapshot of its novel discoveries includes missing variants in SARS-CoV-2; gene regulation in diatoms epiphytic to eelgrass, an oceanic plant critical to the carbon cycle and significantly impacted by climate change; and isoform regulation in octopus genes missing from the reference. NOMAD is a new unifying approach to sequence analysis that enables expansive discovery.

One-sentence summary: We present a unifying, reference-free formulation of disparate genomic problems that bypasses reference genomes.
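The central notion of sample-dependent sequence variation can be illustrated with a small sketch. It assumes, as illustrative choices not stated in the abstract, that variation is measured by fixing an "anchor" k-mer, recording which "target" k-mer follows it in each sample's reads, and scoring the resulting targets-by-samples contingency table with Pearson's chi-square statistic. The helper names `anchor_target_table` and `pearson_chi2`, the `lag` and `k_target` parameters, and the input format are hypothetical; the abstract does not specify NOMAD's actual statistic. Only raw read strings grouped by sample are used, with no reference genome.

```python
from collections import Counter, defaultdict


def anchor_target_table(samples, anchor, lag=0, k_target=4):
    """Build a targets-by-samples count table for one anchor k-mer.

    Hypothetical helper for illustration. `samples` maps a sample name to a
    list of raw read strings; no reference genome is used. For every
    occurrence of `anchor` in a read, the `k_target` bases starting `lag`
    bases after the anchor's end are recorded as the "target".
    """
    table = defaultdict(Counter)  # target -> Counter(sample -> count)
    for name, reads in samples.items():
        for read in reads:
            start = read.find(anchor)
            while start != -1:
                t0 = start + len(anchor) + lag
                target = read[t0:t0 + k_target]
                if len(target) == k_target:  # skip targets truncated by read end
                    table[target][name] += 1
                start = read.find(anchor, start + 1)  # allow overlapping anchors
    return table


def pearson_chi2(table, sample_names):
    """Pearson's chi-square statistic for the targets-by-samples table.

    Used here only as a familiar stand-in for a test of sample dependence.
    """
    targets = sorted(table)
    row = {t: sum(table[t][s] for s in sample_names) for t in targets}
    col = {s: sum(table[t][s] for t in targets) for s in sample_names}
    total = sum(row.values())
    if total == 0:
        return 0.0
    stat = 0.0
    for t in targets:
        for s in sample_names:
            expected = row[t] * col[s] / total
            if expected > 0:
                stat += (table[t][s] - expected) ** 2 / expected
    return stat


# Two samples whose reads share the anchor but differ downstream of it.
samples = {
    "A": ["ACGTACGGATT", "CCACGTACGGATT"],
    "B": ["ACGTACGCCAA", "TTACGTACGCCAA"],
}
table = anchor_target_table(samples, "ACGTACG")
print(pearson_chi2(table, list(samples)))  # 4.0
```

A large statistic flags an anchor whose downstream sequence depends on the sample, while identical target distributions across samples give 0. Pearson's chi-square is chosen only because it is familiar; its usual p-value approximation is unreliable at small counts like these.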

Publisher

Cold Spring Harbor Laboratory


Cited by 4 articles

1. A reference-free algorithm discovers regulation in the plant transcriptome (2024-05-25)

2. SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads (2023-03-21)

3. Unsupervised reference-free inference reveals unrecognized regulated transcriptomic complexity in human single cells (2022-12-07)

4. Statistical analysis supports pervasive RNA subcellular localization and alternative 3’ UTR regulation (2022-10-27)