CentIER: accurate centromere identification for plant genomes with sequence specificity information
Author:
Xu Dong, Wen Huaming, Feng Wenle, Zhang Xiaohui, Hui Xingqi, Xu Yun, Chen Fei, Pan Weihua
Abstract
Centromere identification is an important problem in genomics, providing a foundation for studying centromere composition, function, evolution, inheritance, and variation. Existing wet-lab methods are costly and time-consuming, while bioinformatic methods detect only tandem repeats and therefore miss the non-repetitive sequence regions within the centromere. To address these shortcomings, we introduce CentIER, a new pipeline for the automatic and accurate identification and annotation of centromere regions that takes advantage of sequence specificity information. CentIER requires only the genomic sequence as input; it then delimits the centromeric region of a chromosome, identifies tandem repeat monomers, annotates retrotransposons, and outputs visualized results. When benchmarked against experimentally determined centromere regions, CentIER's predictive accuracy for centromere recognition exceeded 90%. After this evaluation, CentIER was applied to investigate the sequence and distribution characteristics of centromeric retrotransposons and tandem repeat sequences in different species, providing insights into these traits in monocotyledonous and dicotyledonous plants.
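The abstract does not define how agreement with experimentally determined centromere regions was scored. As a rough illustration only, one plausible measure is the fraction of the reference interval covered by the predicted interval. The function name `overlap_fraction` and the coordinates below are hypothetical and are not taken from CentIER or the paper.

```python
def overlap_fraction(predicted, reference):
    """Return the fraction of the reference interval covered by the prediction.

    Both arguments are (start, end) coordinates in base pairs on the same
    chromosome. This is an assumed metric, not CentIER's documented one.
    """
    p_start, p_end = predicted
    r_start, r_end = reference
    if r_end <= r_start:
        raise ValueError("reference interval must have positive length")
    # Length of the intersection; zero when the intervals do not overlap.
    overlap = max(0, min(p_end, r_end) - max(p_start, r_start))
    return overlap / (r_end - r_start)


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    predicted = (15_200_000, 17_500_000)
    reference = (15_000_000, 17_000_000)
    print(f"covered: {overlap_fraction(predicted, reference):.1%}")
```

A coverage-style score like this rewards predictions that span the reference region but does not penalize over-long predictions; a symmetric measure such as the Jaccard index would be needed for that.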
Publisher
Cold Spring Harbor Laboratory