Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Published:2011-12 Issue:S5 Volume:12 Page:
ISSN:1471-2164
Container-title:BMC Genomics
language:en

Author:

Liu Qingzhong,Sung Andrew H,Chen Zhongxue,Liu Jianzhong,Chen Lei,Qiao Mengyu,Wang Zhaohui,Huang Xudong,Deng Youping

Abstract

Background: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money.

Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others.

Conclusions: On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/1471-2164-12-S5-S1.pdf

References: 38 articles.

1. Chen Z, McGee M, Liu Q, Scheuermann RH: A distribution free summarization method for Affymetrix GeneChip Arrays. Bioinformatics. 2007, 23 (3): 321-327. 10.1093/bioinformatics/btl609.

2. Quackenbush J: Computational analysis of microarray data. Nat Rev Genet. 2001, 2 (6): 418-427. 10.1038/35076576.

3. Hand DJ, Heard NA: Finding groups in gene expression data. J Biomed Biotechnol. 2005, 2005 (2): 215-225.

4. Segal E, Friedman N, Kaminski N, Regev A, Koller D: From signatures to models: understanding cancer using microarrays. Nat Genet. 2005, 37 (Suppl): S38-45.

5. Torrente A, Kapushesky M, Brazma A: A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings. Bioinformatics. 2005, 21 (21): 3993-3999. 10.1093/bioinformatics/bti644.

Cited by 73 articles.

1. Transcriptome mining and expression analysis of ABC transporter genes in a monophagous herbivore, Leucinodes orbonalis (Crambidae: Lepidoptera);Comparative Biochemistry and Physiology Part D: Genomics and Proteomics;2024-12

2. Extramedullary hematopoiesis in cancer;Experimental & Molecular Medicine;2024-03-05

3. Genome-wide analysis of ATP-binding cassette (ABC) transporter in Penaeus vannamei and identification of two ABC genes involved in immune defense against Vibrio parahaemolyticus by affecting NF-κB signaling pathway;International Journal of Biological Macromolecules;2024-03

4. Evolutionary trajectory of organelle-derived nuclear DNAs in the Triticum/Aegilops complex species;Plant Physiology;2023-10-17

5. A novel artificial intelligence approach to detect the breast cancer using KNNet technique with EPM gene profiling;Functional & Integrative Genomics;2023-09-18