So you think you can PLS-DA?

Published: 2020-12 Issue: S1 Volume: 21 Page:
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Author:

Ruiz-Perez Daniel, Guan Haibin, Madhivanan Purnima, Mathee Kalai, Narasimhan Giri

Abstract

Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to its close relative from which it was initially invented, namely Principal Component Analysis (PCA).

Results: We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsda

Conclusions: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

http://link.springer.com/content/pdf/10.1186/s12859-019-3310-7.pdf

References: 30 articles.

1. Ståhle L, Wold S. Partial least squares analysis with cross-validation for the two-class problem: A monte carlo study. J Chemometrics. 1987; 1(3):185–96.

2. Barker M, Rayens W. Partial least squares for discrimination. J Chemometrics. 2003; 17(3):166–73.

3. Gottfries J, Blennow K, Wallin A, Gottfries C. Diagnosis of dementias using partial least squares discriminant analysis. Dementia Geriatric Cognit Disorders. 1995; 6(2):83–8.

4. Worley B, Powers R. Multivariate analysis in metabolomics. Curr Metabol. 2013; 1(1):92–107.

5. Worley B, Halouska S, Powers R. Utilities for quantifying separation in PCA/PLS-DA scores plots. Anal Biochem. 2013; 433(2):102–4.

Cited by 210 articles.

1. Machine learning supported single-stranded DNA sensor array for multiple foodborne pathogenic and spoilage bacteria identification in milk;Food Chemistry;2025-01

2. The development of machine learning approaches in two-dimensional NMR data interpretation for metabolomics applications;Analytical Biochemistry;2024-12

3. Methods in DNA methylation array dataset analysis: A review;Computational and Structural Biotechnology Journal;2024-12

4. Chemical profiling of paper recycling grades using GC-MS and LC-MS: An exploration of contaminants and their possible sources;Waste Management;2024-12

5. Discovery of plasma biomarkers for Parkinson's disease diagnoses based on metabolomics and lipidomics;Chinese Chemical Letters;2024-11