The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data

Published: 2017-01-01 Issue: Volume: 16 Page: 117693511771053
ISSN: 1176-9351
Container-title: Cancer Informatics
language: en
Short-container-title: Cancer Inform

Author:

Kim Eunji¹, Ivanov Ivan², Hua Jianping³, Lampe Johanna W⁴, Hullar Meredith AJ⁴, Chapkin Robert S⁵, Dougherty Edward R¹,³

Affiliation:

1. Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA

2. Department of Veterinary Physiology & Pharmacology, Texas A&M University, College Station, TX, USA

3. Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA

4. Public Health Sciences Division, Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

5. Program in Integrative Nutrition & Complex Diseases, Texas A&M University, College Station, TX, USA

Abstract

Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This model-based study examines the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.
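
The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code. It assumes a toy setup: independent Gaussian log-concentrations (a diagonal special case of a multivariate normal), Poisson read counts with mean `depth * exp(log-concentration)` as a stand-in for the sequencing transformation, a nearest-class-mean classifier, and the resubstitution error estimator. All function names and parameter values are hypothetical. It ranks every 2-gene feature set once on the concentrations and once on the simulated counts, so the two rankings can be compared.

```python
import itertools
import math
import random


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the moderate means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n_per_class, n_genes, n_markers, shift, depth, rng):
    """Return (log-concentrations, read counts, labels).

    Assumed toy model: independent Gaussian log-concentrations, with the
    first n_markers genes shifted in class 1, and Poisson read counts with
    mean depth * exp(log-concentration).
    """
    conc, counts, labels = [], [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(1.0 + (shift if label and g < n_markers else 0.0), 0.5)
                 for g in range(n_genes)]
            conc.append(x)
            counts.append([poisson(depth * math.exp(v), rng) for v in x])
            labels.append(label)
    return conc, counts, labels


def nearest_mean_error(data, labels, features):
    """Resubstitution error of a nearest-class-mean classifier on `features`."""
    means = {}
    for c in (0, 1):
        rows = [r for r, y in zip(data, labels) if y == c]
        means[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    wrong = 0
    for r, y in zip(data, labels):
        dist = {c: sum((r[f] - m) ** 2 for f, m in zip(features, means[c]))
                for c in (0, 1)}
        wrong += min(dist, key=dist.get) != y
    return wrong / len(data)


def rank_feature_sets(data, labels, n_genes, size=2):
    """All feature sets of the given size, sorted by estimated error."""
    scored = [(nearest_mean_error(data, labels, fs), fs)
              for fs in itertools.combinations(range(n_genes), size)]
    return sorted(scored)


rng = random.Random(0)  # fixed seed so the run is repeatable
conc, counts, labels = simulate(n_per_class=15, n_genes=8, n_markers=2,
                                shift=0.8, depth=5.0, rng=rng)
# Log-transform the counts before classification (an assumed preprocessing step).
log_counts = [[math.log(c + 1) for c in row] for row in counts]

rank_conc = rank_feature_sets(conc, labels, n_genes=8)
rank_seq = rank_feature_sets(log_counts, labels, n_genes=8)
print("best set on concentrations:", rank_conc[0])
print("best set on simulated counts:", rank_seq[0])
```

In this toy setup the truly discriminating set is `(0, 1)`. Repeating the run over many seeds and recording how often each ranking places that set near the top gives a rough feel for how a sequencing-like transformation might change the ability to find a good feature set at small sample sizes.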

Publisher

SAGE Publications

Subject

Cancer Research, Oncology

Link

http://journals.sagepub.com/doi/pdf/10.1177/1176935117710530

References: 37 articles.

1. Superior feature-set ranking for small samples using bolstered error estimation

2. Characterization of the Effectiveness of Reporting Lists of Small Feature Sets Relative to the Accuracy of the Prior Biological Knowledge

3. Modeling the next generation sequencing sample processing pipeline for the purposes of classification

4. Mapping and quantifying mammalian transcriptomes by RNA-Seq

5. Differential expression analysis for sequence count data

Cited by 1 article.

1. Gut-host Crosstalk: Methodological and Computational Challenges; Digestive Diseases and Sciences; 2020-02-03