CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure

Published: 2022-12-22

Author:

Varabyou Ales, Sommer Markus J., Erdogdu Beril, Shinder Ida, Minkin Ilia, Chao Kuan-Hao, Park Sukhwan, Heinz Jakob, Pockrandt Christopher, Shumate Alaina, Rincon Natalia, Puiu Daniela, Steinegger Martin, Salzberg Steven L., Pertea Mihaela

Abstract

The original CHESS database of human genes was assembled from nearly 10,000 RNA sequencing experiments in 53 human body sites produced by the Genotype-Tissue Expression (GTEx) project, and then augmented with genes from other databases to yield a comprehensive collection of protein-coding and noncoding transcripts. The construction of the new CHESS 3 database employed improved transcript assembly algorithms, a new machine learning classifier, and protein structure predictions to identify genes and transcripts likely to be functional and to eliminate those that appeared more likely to represent noise. The new catalog contains 41,356 genes on the GRCh38 reference human genome, of which 19,839 are protein-coding, and a total of 158,377 transcripts. These include 14,863 novel protein-coding transcripts. The total number of transcripts is substantially smaller than in earlier versions, owing to improved transcriptome assembly methods and a stricter protocol for filtering out noisy transcripts. Notably, CHESS 3 contains all of the transcripts in the MANE database, and at least one transcript corresponding to the vast majority of protein-coding genes in the RefSeq and GENCODE databases. CHESS 3 has also been mapped onto the complete CHM13 human genome, which gives a more complete gene count of 43,773 genes and 19,968 protein-coding genes. The CHESS database is available at http://ccb.jhu.edu/chess.

Publisher

Cold Spring Harbor Laboratory

Cited by 8 articles.

1. Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity; 2024-03-18

2. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure; Genome Biology; 2023-10-30

3. Investigating open reading frames in known and novel transcripts using ORFanage; Nature Computational Science; 2023-07-31

4. Splam: a deep-learning-based splice site predictor that improves spliced alignments; 2023-07-29

5. Detecting differential transcript usage in complex diseases with SPIT; 2023-07-10