Novel functional sequences uncovered through a bovine multiassembly graph

Published:2021-05-10 Issue:20 Volume:118 Page:e2101056118
ISSN:0027-8424
Container-title:Proceedings of the National Academy of Sciences
language:en
Short-container-title:Proc Natl Acad Sci USA

Author:

Crysnanto Danang, Leonard Alexander S., Fang Zih-Hua, Pausch Hubert

Abstract

Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect: they lack millions of bases of unknown relevance and are unable to reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, differentially expressed between Mycobacterium bovis–infected and noninfected cattle that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads that were previously misaligned against the Bos taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so-far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome.

Funder

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Publisher

Proceedings of the National Academy of Sciences

Subject

Multidisciplinary

Reference: 74 articles.

1. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution

2. De novo assembly of the cattle reference genome with single-molecule sequencing;Rosen;Gigascience,2020

3. De novo assembly of haplotype-resolved genomes with trio binning;Koren;Nat. Biotechnol.,2018

4. Haplotype-resolved genomes provide insights into structural variation and gene content in Angus and Brahman cattle;Low;Nat. Commun.,2020

5. Continuous chromosome-scale haplotypes assembled from a single interspecies F1 hybrid of yak and cattle;Rice;Gigascience,2020

Cited by 50 articles.

1. Constructing a Draft Indian Cattle Pangenome Using Short-Read Sequencing;2024-09-03

2. An overview of recent technological developments in bovine genomics;Veterinary and Animal Science;2024-09

3. RNA-DNA differences in variant calls from cattle tissues result in erroneous eQTLs;BMC Genomics;2024-08-01

4. Advancing the Indian Cattle Pangenome: Characterizing Non-Reference Sequences in Bos indicus;2024-07-22

5. A Pilot Detection and Associate Study of Gene Presence-Absence Variation in Holstein Cattle;Animals;2024-06-28