Affiliation:
1. Quality and Authentication of Agricultural Products Unit, Knowledge and Valorisation of Agricultural Products Department, Walloon Agricultural Research Centre (CRA‐W), Gembloux, Belgium
2. Mining4Quality, Murcia, Spain
Abstract
The t‐distributed stochastic neighbour embedding algorithm, or t‐SNE, is a non‐linear dimension reduction method used to visualise multivariate data. It enables a high‐dimensional dataset, such as a set of infrared spectra, to be represented on a single, typically two‐dimensional graph that reveals its global and local structure. t‐SNE is very popular in the machine learning community and has been applied in many fields, generally with the aim of visualising large datasets. In vibrational spectroscopy, t‐SNE is gaining recognition, but principal component analysis (PCA) remains by far the reference method for exploratory analysis and dimension reduction. However, t‐SNE can be a real aid in the analysis of vibrational spectroscopic datasets. It provides an at‐a‐glance global view of the dataset, making it possible to distinguish the main factors influencing the spectral signal and the hierarchy between these factors, and it gives an indication of whether predictive modelling is feasible. It can also support the choice of pre‐processing by allowing different general pre‐processing approaches to be compared rapidly according to their effect on the variable of interest. Here we illustrate these advantages using different datasets. We also propose an approach based on a synergy between the t‐SNE and PCA methods, allowing the respective advantages of each to be exploited.
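As a rough illustration of how PCA and t‐SNE can be chained, the sketch below compresses a set of spectra with PCA and then embeds the PCA scores in two dimensions with t‐SNE. This is a common generic pipeline and not necessarily the synergy approach proposed in the article. The synthetic spectra, the number of components, and the perplexity are all assumptions chosen for illustration, and scikit‐learn is assumed to be installed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic "spectra" (an assumption for illustration only): two groups of
# 60 samples, each a Gaussian band on a 200-point axis, with the band
# position differing between the groups, plus a little random noise.
axis = np.linspace(0.0, 1.0, 200)
centres = np.repeat([0.35, 0.65], 60)
spectra = np.exp(-((axis[None, :] - centres[:, None]) ** 2) / 0.005)
spectra += 0.05 * rng.standard_normal(spectra.shape)

# Step 1: PCA compresses each 200-point spectrum to 10 scores.
scores = PCA(n_components=10, random_state=0).fit_transform(spectra)

# Step 2: t-SNE maps the PCA scores to two dimensions for plotting.
# The perplexity (30 here, an arbitrary choice) must be smaller than
# the number of samples.
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(scores)

print(embedding.shape)  # (120, 2)
```

Each row of `embedding` gives the two plotting coordinates of one spectrum, so a scatter plot coloured by a known factor (here, the group) shows whether that factor structures the data. Rerunning the last two steps on differently pre‐processed copies of `spectra` is one simple way to compare pre‐processing options visually.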