A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

Published:2008-07-22 Issue:1 Volume:9 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Statnikov Alexander, Wang Lily, Aliferis Constantin F

Abstract

Background: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

Results: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms.

Conclusion: We found that, both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines, both when no gene selection is performed and when several popular gene selection methods are used.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-9-319.pdf

Reference: 30 articles.

1. Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 2005, 21: 631–643.

2. Breiman L: Random forests. Machine Learning 2001, 45: 5–32.

3. Wu B, Abbott T, Fishman D, McMurray W, Mor G, Stone K, Ward D, Williams K, Zhao H: Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics 2003, 19: 1636–1643.

4. Lee JW, Lee JB, Park M, Song SH: An extensive comparison of recent classification tools applied to microarray data. Computational Statistics & Data Analysis 2005, 48: 869–885.

5. Diaz-Uriarte R, Alvarez de Andres S: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006, 7: 3.

Cited by 509 articles.

1. Wavelet feature extraction and bio-inspired feature selection for the prognosis of lung cancer − A statistical framework analysis;Measurement;2024-10

2. Developing a disaster risk index for coastal communities in southwest Bangladesh: Shifting from data-driven models to holistic approaches;Ecological Indicators;2024-09

3. Amount, distribution and controls of the soil organic carbon storage loss in the degraded China's grasslands;Science of The Total Environment;2024-09

4. Mental issues, internet addiction and quality of life predict burnout among Hungarian teachers: a machine learning analysis;BMC Public Health;2024-08-27

5. Exosome-Machine Learning Integration in Biomedicine: Advancing Diagnosis and Biomarker Discovery;Current Medicinal Chemistry;2024-08-20