Abstract
High-dimensional learning is a perennial problem due to the challenges posed by the “curse of dimensionality”: learning typically demands more computing resources as well as more training data. In differentially private (DP) settings, this is further exacerbated by the noise that must be added to each dimension to achieve the required privacy. In this paper, we present a surprisingly simple approach that addresses all of these concerns at once, based on histograms constructed on a low-dimensional random projection (RP) of the data. Our approach exploits RP to take advantage of hidden low-dimensional structures in the data, yielding both computational efficiency and improved error convergence with respect to the sample size, so that less training data suffices for learning. We also propose a variant for efficient DP classification that further exploits the data-oblivious nature of both the histogram construction and the RP-based dimensionality reduction, resulting in efficient management of the privacy budget. We present a detailed and rigorous theoretical analysis of the generalisation of our algorithms in several settings, showing that our approach exploits low-dimensional structure in the data, ameliorates the ill effects of the noise required for privacy, and generalises well under minimal conditions. We also corroborate our findings experimentally and demonstrate that our algorithms achieve competitive classification accuracy in both non-private and private settings.
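To make the construction concrete, the following Python sketch builds a histogram classifier on a Gaussian random projection and, optionally, adds Laplace noise to the per-class bin counts as a DP variant. It is a minimal illustration under stated assumptions: the projection distribution, the fixed bin grid on [-bound, bound], the count-based prediction rule, and the noise scale 1/epsilon are all illustrative choices, not the paper's exact algorithm or calibration.

```python
import numpy as np

class RPHistogramClassifier:
    """Sketch of a random-projection histogram classifier (illustrative).

    Assumptions not taken from the paper: a Gaussian RP matrix, a fixed
    data-independent bin grid on [-bound, bound]^k in the projected
    space, majority vote over per-class bin counts, and Laplace noise
    of scale 1/epsilon on those counts for the DP variant.
    """

    def __init__(self, k=2, n_bins=16, bound=3.0, epsilon=None, seed=0):
        self.k, self.n_bins, self.bound, self.epsilon = k, n_bins, bound, epsilon
        self.rng = np.random.default_rng(seed)

    def _bin_index(self, Z):
        # Map each projected point to a flat index on the k-dim bin grid.
        edges = np.linspace(-self.bound, self.bound, self.n_bins + 1)
        idx = np.clip(np.searchsorted(edges, Z, side="right") - 1,
                      0, self.n_bins - 1)
        return np.ravel_multi_index(idx.T, (self.n_bins,) * self.k)

    def fit(self, X, y):
        d = X.shape[1]
        self.classes_ = np.unique(y)
        # Data-oblivious: the projection is drawn without looking at X.
        self.R = self.rng.normal(size=(d, self.k)) / np.sqrt(self.k)
        flat = self._bin_index(X @ self.R)
        n_cells = self.n_bins ** self.k
        counts = np.zeros((len(self.classes_), n_cells))
        for i, c in enumerate(self.classes_):
            counts[i] = np.bincount(flat[y == c], minlength=n_cells)
        if self.epsilon is not None:
            # Adding or removing one record changes one count by 1, so
            # the count table has sensitivity 1; Laplace(1/epsilon) is
            # the textbook histogram mechanism, used here as a stand-in.
            counts += self.rng.laplace(scale=1.0 / self.epsilon,
                                       size=counts.shape)
        self.counts_ = counts
        return self

    def predict(self, X):
        flat = self._bin_index(X @ self.R)
        return self.classes_[np.argmax(self.counts_[:, flat], axis=0)]
```

Because both the projection and the bin grid are fixed independently of the data, the only data-dependent quantities are the bin counts, which is what allows a DP variant to spend the privacy budget once on noising the count table rather than perturbing every dimension of the raw data.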
Funder
Engineering and Physical Sciences Research Council
Publisher
Springer Science and Business Media LLC