
Nested and Repeated Cross Validation for Classification Model With High-Dimensional Data

Published: 2020-01-01, Volume: 43, Issue: 1, Pages: 103-125
ISSN: 2389-8976
Container title: Revista Colombiana de Estadística
Short container title: Rev. colomb. estad.

Author:

Zhong Yi, He Jianghua, Chalise Prabhakar

Abstract

With the advent of high-throughput technologies, high-dimensional datasets are increasingly available. This has not only opened up new insights into biological systems but also posed analytical challenges. One important problem is the selection of an informative feature subset and the prediction of future outcomes. It is crucial that models are not overfitted and give accurate results with new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interest in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed comparatively better predictive accuracy for new cases than the standard cross-validation method.
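The abstract names nested and repeated cross-validation but does not spell out the procedure, so the following is only a minimal sketch of the general technique, not the authors' two-step framework. Everything in it is an assumption made for illustration: the synthetic data generator, the mean-difference feature score, the nearest-centroid classifier, the candidate feature counts, and all function names (`make_data`, `nested_cv`, `repeated_nested_cv`, and so on). It uses only the Python standard library.

```python
import random

def mean(values):
    return sum(values) / len(values)

def make_data(n=60, p=100, informative=5, shift=1.5, seed=0):
    """Synthetic two-class data (assumption): only the first few features differ by class."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * label if j < informative else 0.0, 1.0)
          for j in range(p)] for label in y]
    return X, y

def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rest(folds, held_out):
    """All indices except those in the held-out fold."""
    return [i for g, fold in enumerate(folds) if g != held_out for i in fold]

def rank_features(X, y, rows):
    """Rank features by |class mean difference| / spread, using the given rows only."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        ma, mb = mean(a), mean(b)
        spread = (mean([(v - ma) ** 2 for v in a]) + mean([(v - mb) ** 2 for v in b])) ** 0.5
        scores.append((abs(ma - mb) / (spread + 1e-9), j))
    return [j for _, j in sorted(scores, reverse=True)]

def accuracy(X, y, train, test, n_feats):
    """Select features on `train`, fit a nearest-centroid rule, score it on `test`."""
    feats = rank_features(X, y, train)[:n_feats]
    centroids = {c: [mean([X[i][j] for i in train if y[i] == c]) for j in feats]
                 for c in (0, 1)}
    hits = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)}
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(test)

def nested_cv(X, y, candidates=(2, 5, 20), outer_k=5, inner_k=4, seed=0):
    """Inner loop picks the number of features; outer loop estimates accuracy."""
    rng = random.Random(seed)
    outer = k_folds(range(len(y)), outer_k, rng)
    outer_acc = []
    for f, test in enumerate(outer):
        train = rest(outer, f)
        inner = k_folds(train, inner_k, rng)

        def inner_score(n_feats):
            return mean([accuracy(X, y, rest(inner, h), val, n_feats)
                         for h, val in enumerate(inner)])

        best = max(candidates, key=inner_score)  # tuned without touching `test`
        outer_acc.append(accuracy(X, y, train, test, best))
    return mean(outer_acc)

def repeated_nested_cv(X, y, repeats=3, **kwargs):
    """Repeat nested CV with different fold partitions and average the estimates."""
    return mean([nested_cv(X, y, seed=r, **kwargs) for r in range(repeats)])

if __name__ == "__main__":
    X, y = make_data()
    print(f"estimated accuracy: {repeated_nested_cv(X, y):.3f}")
```

The point of the sketch is that feature ranking and tuning happen only inside the training portion of each fold, so the outer test fold never influences which features are selected. Repeating the whole procedure over several random partitions reduces the dependence of the estimate on any single split.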

Publisher

Universidad Nacional de Colombia

Subject

Statistics and Probability

References: 27 articles.

1. Braga-Neto, U. M. & Dougherty, E. R. (2004), ‘Is cross-validation valid for small sample microarray classification?’, Bioinformatics 20(3), 374–380.

2. Breiman, L. (2001), ‘Random Forests’, Machine Learning 45(1), 5–32.

3. Cortes, C. & Vapnik, V. (1995), ‘Support-Vector Networks’, Machine Learning 20(3), 273–297.

4. Dash, M. & Liu, H. (1997), ‘Feature Selection for Classification’, Intelligent Data Analysis 1(3), 131–156.

5. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. & Lander, E. S. (1999), ‘Molecular classification of cancer: class discovery and class prediction by gene expression monitoring’, Science 286(5439), 531–537.

Cited by 15 articles.

1. Enhancing Alfalfa Biomass Prediction: An Innovative Framework Using Remote Sensing Data;Remote Sensing;2024-09-11

2. Virtual Metrology of Critical Dimensions in Plasma Etch Processes Using Entire Optical Emission Spectrum;IEEE Transactions on Semiconductor Manufacturing;2024-08

3. Utilizing a Pathomics Biomarker to Predict the Effectiveness of Bevacizumab in Ovarian Cancer Treatment;Bioengineering;2024-07-03

4. Machine Learning Prediction of Treatment Response to Biological Disease-Modifying Antirheumatic Drugs in Rheumatoid Arthritis;Journal of Clinical Medicine;2024-07-02

5. Appliance Ownership Prediction With Smart Meter Data;The 15th ACM International Conference on Future and Sustainable Energy Systems;2024-05-31