Targeted discovery of novel human exons by comparative genomics

Published:2007-11-07 Issue:12 Volume:17 Page:1763-1773
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Siepel Adam, Diekhans Mark, Brejová Broňa, Langton Laura, Stevens Michael, Comstock Charles L.G., Davis Colleen, Ewing Brent, Oommen Shelly, Lau Christopher, Yu Hung-Chun, Li Jianfeng, Roe Bruce A., Green Phil, Gerhard Daniela S., Temple Gary, Haussler David, Brent Michael R.

Abstract

A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT–PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds—not thousands—of protein-coding genes are completely missing from the current gene catalogs.

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 63 articles.

1. 3,400 new expressed sequence tags identify diversity of transcripts in human brain

2. Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library

3. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence; Adams; Nature, 1995

4. Pairagon+N-scan EST: A model-based gene annotation pipeline; Arumugam; Genome Biol., 2006

5. Gene Ontology: tool for the unification of biology

Cited by 39 articles.

1. A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers;GigaScience;2020-01-01

2. Re-annotation of 191 developmental and epileptic encephalopathy-associated genes unmasks de novo variants in SCN1A;npj Genomic Medicine;2019-12

3. Confirmation of Transcriptional Read-Through Events by RT-PCR;Methods in Molecular Biology;2019-11-15

4. Systematic re-annotation of 191 genes associated with early-onset epilepsy unmasks de novo variants linked to Dravet syndrome in novel SCN1A exons;2019-05-30

5. Intergenically Spliced Chimeric RNAs in Cancer;Trends in Cancer;2016-09