Author:
John Hartley, Pedro P. Sanchez, Fasih Haider, Sotirios A. Tsaftaris
Abstract
Deep neural networks (DNNs) have achieved high accuracy in diagnosing multiple diseases and conditions at large scale. However, a number of concerns have been raised about safeguarding data privacy and the algorithmic bias of neural network models. We demonstrate that unique features (UFs), such as names, IDs, or other patient information, can be memorised (and eventually leaked) by neural networks even when they occur in only a single training sample within the dataset. We explain this memorisation phenomenon by showing that it is more likely to occur when a UF is an instance of a rare concept. We propose methods to identify whether a given model does or does not memorise a given (known) feature. Importantly, our method does not require access to the training data and can therefore be deployed by an external entity. We conclude that memorisation has implications for model robustness, but it can also pose a risk to the privacy of patients who consent to the use of their data for training models.
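The detection idea described above (probing a trained model for sensitivity to a known unique feature, without any access to the training data) can be illustrated with a toy sketch. This is not the paper's actual method; the two stand-in "models", the pixel-patch UF, and the confidence-shift statistic below are all hypothetical constructions chosen only to show the black-box probing principle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unique feature (UF), e.g. a burnt-in ID patch in an image corner.
UF_PATCH = rng.normal(size=(4, 4))

def logits_plain(img):
    # Stand-in for a non-memorising classifier: responds only to a
    # global image statistic, not to the UF specifically.
    return np.array([img.mean(), -img.mean()])

def logits_memorising(img):
    # Stand-in for a memorising classifier: its output additionally
    # spikes when the known UF pattern appears in the probed corner.
    bump = 5.0 * max(0.0, float(np.sum(img[:4, :4] * UF_PATCH)))
    return logits_plain(img) + np.array([bump, -bump])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def uf_sensitivity(model, img):
    """Black-box probe: how much does the model's confidence shift when
    the known UF is inserted into an image, versus the same image
    without it? No training data is needed, only query access."""
    with_uf = img.copy()
    with_uf[:4, :4] += UF_PATCH
    return abs(softmax(model(with_uf))[0] - softmax(model(img))[0])

probe = rng.normal(size=(8, 8))
print(uf_sensitivity(logits_plain, probe))       # small shift
print(uf_sensitivity(logits_memorising, probe))  # large shift
```

A large confidence shift on UF insertion flags the model as sensitive to (and possibly having memorised) that feature, which is the kind of external audit the abstract alludes to.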
Funder
Prof. Tsaftaris also acknowledges support by a Canon Medical / Royal Academy of Engineering Research Chair
iCAIRD, Innovate UK on behalf of UK Research and Innovation
Publisher
Springer Science and Business Media LLC