Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality

Published: 2020-02 Issue: 2 Volume: 7 Page: 190714
ISSN: 2054-5703
Container-title: Royal Society Open Science
Language: en
Short-container-title: R. Soc. open sci.

Author:

Shetta Omar¹, Niranjan Mahesan¹

Affiliation:

1. Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK

Abstract

The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and by unsupervised learning problems of clustering and variants of low-dimensional projection for visualization. A class of problems that has not gained much attention is the detection of outliers in datasets, which arise from causes such as gross experimental, reporting or labelling errors. Outliers could also be small parts of a dataset that are functionally distinct from the majority of a population. Outlier data are often identified by considering the probability density of normal data and comparing data likelihoods against some threshold. This classical approach suffers from the curse of dimensionality, a serious problem with omics data, which are often very high-dimensional. We develop an outlier detection method based on structured low-rank approximation. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers, whereas a density-based method fails even at moderate dimensions. Moreover, we show that our method gives better clustering and visualization performance on the recovered low-dimensional projection than popular dimensionality reduction techniques.
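
For illustration only, the classical density-based approach mentioned in the abstract can be sketched as follows. This is not the authors' method or code: it assumes a single Gaussian density fitted with NumPy, and the function names, the ridge term, the quantile threshold and the synthetic data are all hypothetical choices made for this sketch. Scoring by squared Mahalanobis distance is equivalent to thresholding the Gaussian log-likelihood, since the two differ only by a constant and a sign.

```python
import numpy as np


def mahalanobis_scores(X, ridge=1e-6):
    """Squared Mahalanobis distance of each row of X from the sample mean.

    Under a fitted Gaussian, a larger distance means a lower likelihood.
    The ridge term (an assumption of this sketch) keeps the covariance
    invertible; it becomes essential once the number of features
    approaches the number of samples.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    precision = np.linalg.inv(cov)
    # Row-wise quadratic form: Xc[i] @ precision @ Xc[i]
    return np.einsum("ij,jk,ik->i", Xc, precision, Xc)


def flag_outliers(X, quantile=0.975):
    """Flag samples whose score exceeds an empirical quantile threshold."""
    scores = mahalanobis_scores(X)
    return scores > np.quantile(scores, quantile)


# Synthetic low-dimensional data: 200 inliers plus 5 planted outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 5))
outliers = rng.normal(loc=8.0, size=(5, 5))
X = np.vstack([inliers, outliers])

flags = flag_outliers(X)
print("flagged sample indices:", np.flatnonzero(flags))
```

With only five features and 205 samples the covariance estimate is well conditioned. As the number of features grows towards or beyond the number of samples, as is typical for omics data, the sample covariance becomes singular or nearly so and scores of this kind stop being informative, which is the limitation the abstract attributes to density-based methods.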

Funder

Engineering and Physical Sciences Research Council

Publisher

The Royal Society

Subject

Multidisciplinary

Link

https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.190714

References: 31 articles.

1. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring

2. Gene expression profiling of colon cancer by DNA microarrays and correlation with histoclinical parameters

3. Gene expression in colorectal cancer; Birkenkamp-Demtroder K.; Cancer Res., 2002

4. Robust Detection of Outlier Samples and Genes in Expression Datasets

5. GiniClust: detecting rare cell types from single-cell gene expression data with Gini index

Cited by 12 articles.

1. Spectral type subspace clustering methods: multi-perspective analysis; Multimedia Tools and Applications; 2023-10-27

2. Emerging applications of machine learning in genomic medicine and healthcare; Critical Reviews in Clinical Laboratory Sciences; 2023-10-10

3. Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score; Journal of Translational Medicine; 2023-02-07

4. Artificial Intelligence and Machine Learning in Clinical Research and Patient Remediation; Artificial Intelligence and Machine Learning in Healthcare; 2023

5. Stability of sensorimotor network sculpts the dynamic repertoire of resting state over lifespan; Cerebral Cortex; 2022-04-04