Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R

Published:2023 Page:310-332
ISSN:1680-743X
Container-title:Journal of Data Science
language:en
Author:

Wu Zhaoxing, Zhang Chunming

Abstract

Analyzing “large p small n” data is becoming increasingly important in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis (PDA) index, built upon the Linear Discriminant Analysis (LDA) index, was devised by Lee and Cook (2010) to classify high-dimensional data, with promising results. Yet little is known about its performance relative to the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies comparing the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of the individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.

Publisher

School of Statistics, Renmin University of China

Subject

Industrial and Manufacturing Engineering

Reference: 17 articles.

1. Molecular classification of Crohn’s disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells;The Journal of Molecular Diagnostics,2006

2. Support-vector networks;Machine Learning,1995

3. A projection pursuit algorithm for exploratory data analysis;IEEE Transactions on Computers,1974

4. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring;Science,1999

Cited by 1 article.

1. Editorial: Symposium Data Science and Statistics 2022;Journal of Data Science,2023