Author:
Beaton Derek, Sunderland Kelly M., Levine Brian, Mandzia Jennifer, Masellis Mario, Swartz Richard H., Troyer Angela K., Binns Malcolm A., Abdi Hervé, Strother Stephen C.
Abstract
The minimum covariance determinant (MCD) algorithm is one of the most common techniques to detect anomalous or outlying observations. The MCD algorithm depends on two features of multivariate data: the determinant of a matrix (i.e., geometric mean of the eigenvalues) and Mahalanobis distances (MD). While the MCD algorithm is commonly used and has many extensions, it is limited to analyses of quantitative data, and more specifically to data assumed to be continuous. One reason why the MCD does not extend to other data types, such as categorical or ordinal data, is that there is no well-defined MD for data types other than continuous data. To address the lack of MCD-like techniques for categorical or mixed data, we present a generalization of the MCD. To do so, we rely on a multivariate technique called correspondence analysis (CA). Through CA we can define MD via singular vectors and also compute the determinant from CA’s eigenvalues. Here we define and illustrate a generalized MCD on categorical data and then show how our generalized MCD extends beyond categorical data to accommodate mixed data types (e.g., categorical, ordinal, and continuous). We illustrate this generalized MCD on data from two large-scale projects: the Ontario Neurodegenerative Disease Research Initiative (ONDRI) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI), with genetics (categorical), clinical instruments and surveys (categorical or ordinal), and neuroimaging (continuous) data. We also make R code and toy data available in order to illustrate our generalized MCD.
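The two ingredients named in the abstract, Mahalanobis distances obtained from singular vectors and a determinant obtained from eigenvalues, can be sketched for the ordinary continuous case as follows. This is a minimal illustration in Python with NumPy, not the authors' R code and not the CA-based generalized MCD itself; the function name `mahalanobis_via_svd` and the simulated data are assumptions made for the example, and the data are assumed to have full column rank.

```python
import numpy as np

def mahalanobis_via_svd(X):
    """Squared Mahalanobis distances of the rows of X from the column means,
    computed from the left singular vectors of the centered data.
    Hypothetical helper for illustration; assumes X has full column rank."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample covariance S = Xc'Xc / (n - 1) has eigenvalues s**2 / (n - 1).
    eigvals = s**2 / (n - 1)
    # Xc S^{-1} Xc' = (n - 1) U U', so each squared distance is
    # (n - 1) times the sum of squared entries in the matching row of U.
    md2 = (n - 1) * np.sum(U**2, axis=1)
    return md2, eigvals

# Simulated continuous data (an assumption; not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

md2, eigvals = mahalanobis_via_svd(X)
S = np.cov(X, rowvar=False)                      # sample covariance matrix
det_from_eigs = np.prod(eigvals)                 # determinant as a product of eigenvalues
```

Here `det_from_eigs` agrees with `np.linalg.det(S)` up to floating-point error, and `md2` agrees with the usual quadratic form using the inverse of `S`. The generalized MCD described in the abstract applies the same idea to the singular vectors and eigenvalues produced by correspondence analysis rather than to a covariance matrix of continuous variables.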
Publisher
Cold Spring Harbor Laboratory
Cited by
10 articles.