Author:
Salem Danny, Surendra Anuradha, McDowell Graeme SV, Čuperlović-Culf Miroslava
Abstract
Motivation: Unsupervised data projection is a major step in data mining, used to determine trends in the data, to visualize multidimensional data in a reduced-dimension space, or to reduce the feature space through combination of data. Methods such as Principal Component Analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) are regularly used as one of the first steps in computational biology or omics investigations. However, the significance of the separation of sample groups by these methods generally relies on visual assessment. User-friendly applications for different projection methods, each focusing on distinct data properties, are needed, as well as a rigorous method for statistically determining the significance of the separation of groups of interest in each dataset.
Results: We present Projection STatistics (ProST), a user-friendly solution for data projection analysis providing three unsupervised approaches (PCA, t-SNE and UMAP) and one supervised approach (LDA). For each method we include a novel statistical investigation of the significance of group separation using the Mann-Whitney U rank test or t-test, as well as the necessary preprocessing steps. ProST provides an unbiased, objective determination of the significance of the separation of measurement groups through either linear or manifold projection analysis, with methods ranging from those that separate points based on major variances to those that rely on point proximities based on distance.
Availability: The ProST software application is freely available at https://complimet.ca/shiny/ProST/ with source code provided on https://github.com/complimet/prost.
Contact: danny.salem@nrc-cnrc.gc.ca or Miroslava.cuperlovic-culf@nrc-cnrc.gc.ca
Supplementary information: Supplementary help pages are provided at https://complimet.ca/shiny/ProST/.
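The general idea of testing group separation in a projection, rather than judging it by eye, can be illustrated with a minimal sketch. This is not the ProST implementation: the abstract does not say how the test is applied to the projected data, so the sketch assumes the simplest case, a two-sided Mann-Whitney U test on the first principal component scores of two groups. The function names `first_pc_scores` and `mann_whitney_u`, the synthetic data, and the normal approximation for the p-value (without tie correction) are all assumptions made for illustration.

```python
import math
import random


def first_pc_scores(rows):
    """Project samples onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(200):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive average ranks; the variance is not tie-corrected.
    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(a), len(b)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Synthetic example: two groups of 20 samples with 5 features each;
# group B is shifted in the first two features.
random.seed(0)
group_a = [[random.gauss(0, 1) for _ in range(5)] for _ in range(20)]
group_b = [[random.gauss(3, 1) if j < 2 else random.gauss(0, 1)
            for j in range(5)] for _ in range(20)]

scores = first_pc_scores(group_a + group_b)
u_stat, p_value = mann_whitney_u(scores[:20], scores[20:])
print(f"U = {u_stat}, p = {p_value:.3g}")
```

A small p-value here indicates that the two groups differ along the first principal component. In practice a tested library routine (for example a statistics package's Mann-Whitney U implementation) would be preferable to this hand-written approximation, and the same test could be applied to other components or to coordinates from a nonlinear embedding.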
Publisher
Cold Spring Harbor Laboratory