Targeted discovery of novel human exons by comparative genomics

Author:

Siepel Adam, Diekhans Mark, Brejová Broňa, Langton Laura, Stevens Michael, Comstock Charles L.G., Davis Colleen, Ewing Brent, Oommen Shelly, Lau Christopher, Yu Hung-Chun, Li Jianfeng, Roe Bruce A., Green Phil, Gerhard Daniela S., Temple Gary, Haussler David, Brent Michael R.

Abstract

A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT–PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds—not thousands—of protein-coding genes are completely missing from the current gene catalogs.

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics
