Identifying marker genes in transcription profiling data using a mixture of feature relevance experts

Author:

Chow M. L. (1, 2), Moler E. J. (1, 3), Mian I. S. (1)

Affiliation:

1. Radiation Biology and Environmental Toxicology Group, Department of Cell and Molecular Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720

2. Gene Logic Incorporated, Berkeley, California 94704

3. Chiron Corporation, Emeryville, California 94608

Abstract

Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and for developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The three feature relevance experts examined implement two existing feature relevance measures and one novel measure. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
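
The workflow described in the abstract (score each gene, reorder genes by score, evaluate nested top-k subsets, then combine the top lists of several experts by union and intersection) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation. Everything named here is an assumption for the sake of the example: the two relevance scores (`snr_score`, `gap_score`) are stand-ins, since the abstract does not name the measures used; the toy data generator is hypothetical; and a nearest-centroid classifier with leave-one-out validation is swapped in for the SVMs so that the sketch needs only the Python standard library.

```python
import random
from statistics import mean, pstdev


def snr_score(a, b):
    """Signal-to-noise style score: class mean gap over summed spreads."""
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def gap_score(a, b):
    """Absolute difference between the two class means."""
    return abs(mean(a) - mean(b))


def rank_genes(X, y, score):
    """Return gene indices ordered from most to least relevant."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = [
        score([r[g] for r in pos], [r[g] for r in neg])
        for g in range(len(X[0]))
    ]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    Nearest centroid is a stand-in here; the paper uses SVMs.
    """
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without the held-out sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]

        def sq_dist(label):
            return sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[label]))

        correct += min(centroids, key=sq_dist) == y[i]
    return correct / len(X)


def make_toy_data(n_per_class=12, n_genes=30, informative=(0, 1, 2), seed=0):
    """Hypothetical data: Gaussian noise, with a few genes shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 6.0 * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()

# One expert: rank genes, then score nested subsets of the ranking.
ranking = rank_genes(X, y, snr_score)
nested_accuracy = {k: loo_accuracy(X, y, ranking[:k]) for k in (1, 3, 10, 30)}

# Mixture of experts: combine each expert's top-k list.
top_k = 3
tops = [set(rank_genes(X, y, s)[:top_k]) for s in (snr_score, gap_score)]
union = set.union(*tops)
intersection = set.intersection(*tops)
```

In this sketch the genes are ranked once on all samples before the leave-one-out loop, which keeps the code short but lets the held-out sample influence the ranking. Ranking inside each fold would avoid that selection bias.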

Publisher

American Physiological Society

Subject

Genetics, Physiology


Cited by 77 articles.

