BACKGROUND
Machine learning models are increasingly being used in healthcare settings. If the integrity of the data used to build these models is compromised by cybersecurity attacks, the outputs of these predictive models become questionable.
OBJECTIVE
To assess the risks associated with false data injection into provider progress notes, and to evaluate the potential of exploiting the variance in the predictions of text mining methods to detect such data integrity issues.
METHODS
A simulation of false data injection scenarios was conducted on a set of provider notes. Common statistical text mining (STM) methods were used to assess the mental health severity of the patients described in the falsified notes. The simulation experiment focused on (1) assessing the overall classification stability across the different types of false data injection, (2) identifying the classification algorithms that are robust against these attacks, and (3) evaluating the potential of STM methods for signaling data integrity issues.
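As a concrete illustration of the setup, the sketch below simulates a template-injection attack against a single STM classifier. The toy corpus, the screening template text, and the model settings are illustrative placeholders, not the study's actual data or attack payloads.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Toy stand-in corpus; the study used 96 severe and 337 non-severe notes.
notes = [
    "patient reports active suicidal ideation and severe hopelessness",
    "acute psychosis with command hallucinations, hospitalization required",
    "severe manic episode, not sleeping, grandiose and agitated",
    "mild situational stress, coping well, no safety concerns",
    "routine follow-up, mood euthymic, sleeping and eating normally",
    "patient doing well on current regimen, denies depressive symptoms",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = severe, 0 = non-severe

# Hypothetical screening template standing in for the injected payload.
SCREENING_TEMPLATE = (
    "PHQ-9 screening administered. Patient denies suicidal ideation. "
    "Mood stable. Follow-up in 4 weeks."
)

def inject_template(note: str, copies: int = 1) -> str:
    """Falsify a note by appending one or more copies of a screening template."""
    return note + (" " + SCREENING_TEMPLATE) * copies

# Fit a single STM model on the clean notes, then score it on falsified copies.
vectorizer = TfidfVectorizer(stop_words="english")
X_clean = vectorizer.fit_transform(notes)
X_attacked = vectorizer.transform([inject_template(n) for n in notes])

svm = SVC(kernel="linear").fit(X_clean, labels)
print("clean accuracy:   ", accuracy_score(labels, svm.predict(X_clean)))
print("attacked accuracy:", accuracy_score(labels, svm.predict(X_attacked)))
```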
RESULTS
A simulation experiment using a training dataset of 96 severe and 337 non-severe psychiatric provider notes revealed that the performance of classification models drops under false data injection attacks. The accuracy of single models such as support vector machines and decision trees dropped significantly (an average drop of 16.41%) with the injection of a single screening template. Ensemble models such as bagging and boosting were robust against single-template injections, with an average accuracy drop of about 0.513%. The performance of all models dropped significantly when the injected false data exceeded 50% of the size of the note.
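The sketch below extends the one above (reusing notes, labels, vectorizer, X_clean, and X_attacked) to contrast single models with bagging and boosting ensembles under the same injection; the hyperparameters are assumptions, not the study's settings.

```python
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# Single models alongside the ensemble methods named in the abstract.
models = {
    "svm": SVC(kernel="linear"),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    "boosting": GradientBoostingClassifier(random_state=0),
}

# Train each model on clean notes; measure how far accuracy falls
# when the same notes arrive with an injected template.
for name, model in models.items():
    model.fit(X_clean, labels)
    clean = accuracy_score(labels, model.predict(X_clean))
    attacked = accuracy_score(labels, model.predict(X_attacked))
    print(f"{name}: accuracy drop = {clean - attacked:.3f}")
```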
CONCLUSIONS
While STM methods can be useful for assessing the severity of the mental health conditions expressed in provider notes, the performance of such models can drop significantly under false data injection. Traditionally, such non-robust behavior of STM models is undesirable. Counterintuitively, we show here that this lack of robustness can be leveraged to generate signals of malicious false data injection into electronic health record (EHR) systems. Hence, the prediction variance of these models can potentially be used to signal data integrity issues.
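One possible operationalization of this variance-as-signal idea, continuing the sketches above: if independently trained models disagree sharply on a note, flag it for an integrity review. The disagreement measure and the threshold below are illustrative assumptions, not the study's detection rule.

```python
import numpy as np

def integrity_flags(fitted_models, X, threshold=0.2):
    """Flag samples where predictions vary noticeably across models."""
    preds = np.array([m.predict(X) for m in fitted_models.values()])
    variance = preds.var(axis=0)  # 0 when every model agrees
    return variance > threshold

# Notes whose falsified versions split the models are candidates for review.
print("flagged notes:", np.where(integrity_flags(models, X_attacked))[0])
```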