The Multiassembly Problem: Reconstructing Multiple Transcript Isoforms From EST Fragment Mixtures

Published: 2004-02-12 Issue: 3 Volume: 14 Page: 426-441
ISSN: 1088-9051
Container-title: Genome Research
Language: en
Short-container-title: Genome Res.

Author:

Xing Yi, Resch Alissa, Lee Christopher

Abstract

Recent evidence of abundant transcript variation (e.g., alternative splicing, alternative initiation, alternative polyadenylation) in complex genomes indicates that cataloging the complete set of transcripts from an organism is an important project. One challenge is the fact that most high-throughput experimental methods for characterizing transcripts (such as EST sequencing) give highly detailed information about short fragments of transcripts or protein products, instead of a complete characterization of a full-length form. We analyze this “multiassembly problem”—reconstructing the most likely set of full-length isoform sequences from a mixture of EST fragment data—and present a graph-based algorithm for solving it. In a variety of tests, we demonstrate that this algorithm deals appropriately with coupling of distinct alternative splicing events, increasing fragmentation of the input data and different types of transcript variation (such as alternative splicing, initiation, polyadenylation, and intron retention). To test the method's performance on pure fragment (EST) data, we removed all mRNA sequences, and found it produced no errors in 40 cases tested. Using this algorithm, we have constructed an Alternatively Spliced Proteins database (ASP) from analysis of human expressed and genomic sequences, consisting of 13,384 protein isoforms of 4422 genes, yielding an average of 3.0 protein isoforms per gene.
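To make the problem setup in the abstract concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes each fragment has already been reduced to an ordered list of exon labels (the labels `E1` to `E4` and the function names are hypothetical), builds a directed graph from consecutive exon pairs, and enumerates every source-to-sink path as a candidate isoform. It also assumes the graph is acyclic, as it would be if exons are ordered by genomic position.

```python
from collections import defaultdict


def build_splice_graph(fragments):
    """Collect exon nodes and directed edges from consecutive exons in each fragment."""
    graph = defaultdict(set)
    nodes = set()
    for frag in fragments:
        nodes.update(frag)
        for upstream, downstream in zip(frag, frag[1:]):
            graph[upstream].add(downstream)
    return nodes, graph


def enumerate_isoforms(fragments):
    """Enumerate all paths from exons with no incoming edge to exons with no outgoing edge.

    Assumption: the graph is acyclic. This naive enumeration ignores fragment
    support and coupling between splicing events, which the paper's method
    is reported to handle.
    """
    nodes, graph = build_splice_graph(fragments)
    has_incoming = {b for targets in graph.values() for b in targets}
    sources = sorted(nodes - has_incoming)
    paths = []

    def walk(node, path):
        successors = sorted(graph.get(node, ()))
        if not successors:
            paths.append(tuple(path))
            return
        for nxt in successors:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


# Hypothetical fragment mixture: one fragment includes exon E2, one skips it.
fragments = [
    ["E1", "E2", "E3"],
    ["E1", "E3"],
    ["E3", "E4"],
]
isoforms = enumerate_isoforms(fragments)
for iso in isoforms:
    print("-".join(iso))
```

Exhaustive path enumeration can produce many combinations that no fragment supports once several alternative events are present, which is one reason a method that selects the most likely set of isoforms is needed rather than listing all paths.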

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 78 articles.

1. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1998

2. Leptin Receptor Long-form Splice-variant Protein Expression in Neuron Cell Bodies of the Brain and Co-localization with Neuropeptide Y mRNA in the Arcuate Nucleus

3. Two distinct forms of active transcription factor CREB (cAMP response element binding protein).

4. A Family of ADP-Ribosylation Factor Effectors That Can Alter Membrane Transport through the trans-Golgi

5. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms

Cited by 75 articles.

1. rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data;Nature Protocols;2024-02-23

2. Accurate Flow Decomposition via Robust Integer Linear Programming;2023-03-23

3. Efficient Minimum Flow Decomposition via Integer Linear Programming;Journal of Computational Biology;2022-11-01

4. Fast, Flexible, and Exact Minimum Flow Decompositions via ILP;Lecture Notes in Computer Science;2022

5. Population-scale detection of non-reference sequence variants using colored de Bruijn graphs;Bioinformatics;2021-11-02