Automated annotation of human centromeres with HORmon

Published:2022-05-11 Issue:6 Volume:32 Page:1137-1151
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Kunyavskaya Olga, Dvorkina Tatiana, Bzikadze Andrey V., Alexandrov Ivan A., Pevzner Pavel A.

Abstract

Recent advances in long-read sequencing opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. They also emphasized the need for centromere annotation (partitioning human centromeres into monomers and higher-order repeats [HORs]). Although there was a half-century-long series of semi-manual studies of centromere architecture, a rigorous centromere annotation algorithm is still lacking. Moreover, an automated centromere annotation is a prerequisite for studies of genetic diseases associated with centromeres and evolutionary studies of centromeres across multiple species. Although the monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and the HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems, we show that they should be integrated into a single framework in such a way that HOR (monomer) inference affects monomer (HOR) inference. We thus developed the HORmon algorithm that integrates the monomer/HOR inference and automatically generates the human monomers/HORs that are largely consistent with the previous semi-manual inference.
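The two decompositions named in the abstract can be illustrated with a toy sketch. This is not the HORmon algorithm: the function names, the four-letter "monomers", the Hamming-distance assignment, and the greedy HOR matching are all assumptions made only to show what "writing a centromere in the monomer alphabet" and "writing a monocentromere in the HOR alphabet" mean.

```python
from itertools import groupby


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def monomer_decomposition(centromere: str, monomers: dict[str, str]) -> str:
    """Label consecutive fixed-length blocks with the closest monomer.

    Toy assumptions: every monomer has the same length, blocks are in frame,
    and monomer names are single characters. Real data needs alignment.
    """
    size = len(next(iter(monomers.values())))
    labels = []
    for start in range(0, len(centromere) - size + 1, size):
        block = centromere[start:start + size]
        # Pick the monomer name whose sequence has the fewest mismatches.
        labels.append(min(monomers, key=lambda name: hamming(block, monomers[name])))
    return "".join(labels)


def hor_decomposition(monocentromere: str, hor: str) -> list[tuple[str, int]]:
    """Greedily split a monomer string into full HOR copies and leftover monomers."""
    units = []
    i = 0
    while i < len(monocentromere):
        if monocentromere.startswith(hor, i):
            units.append(hor)
            i += len(hor)
        else:
            units.append(monocentromere[i])
            i += 1
    # Collapse consecutive identical units into (unit, copy_count) pairs.
    return [(unit, len(list(group))) for unit, group in groupby(units)]


# Hypothetical monomers and a short "centromere" with one mismatch (ACGA vs ACGT).
monomers = {"A": "ACGT", "B": "GGCC", "C": "TTAA"}
centromere = "ACGTGGCCTTAA" + "ACGAGGCCTTAA" + "ACGTGGCC"

monocentromere = monomer_decomposition(centromere, monomers)
hor_units = hor_decomposition(monocentromere, "ABC")
print(monocentromere)
print(hor_units)
```

In this sketch the monomer set and the HOR are fixed inputs and the two steps run one after the other. The abstract argues for the opposite: monomer inference and HOR inference should inform each other inside a single framework.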

Funder

National Science Foundation EAGER

Saint Petersburg State University, Russia

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical),Genetics

References: 33 articles.

1. Ahuja RK, Magnanti TL, Orlin JB. 1993. Network flows: theory, algorithms, and applications. Prentice-Hall, Upper Saddle River, NJ.

2. Alpha-satellite DNA of primates: old and new families

3. Organization and Evolution of Primate Centromeric DNA from Whole-Genome Shotgun Sequence Data

4. Complete genomic and epigenetic maps of human centromeres

5. Repetitive Fragile Sites: Centromere Satellite DNA As a Source of Genome Instability in Human Diseases

Cited by 9 articles.

1. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes;The American Journal of Human Genetics;2023-11

2. De novo reconstruction of satellite repeat units from sequence data;Genome Research;2023-11

3. UniAligner: a parameter-free framework for fast sequence alignment;Nature Methods;2023-08-14

4. Characterization of large-scale genomic differences in the first complete human genome;Genome Biology;2023-07-04

5. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv;Nucleic Acids Research;2023-06-20