Hiding under the ROC Curve: Detection of Malicious Cyberattacks in EHR Data from Model Prediction Variance (Preprint)

Authors:

Lina Bouayad, Balaji Padmanabhan, Susan Schultz

Abstract

BACKGROUND

Machine learning models are increasingly being used in healthcare settings. If the integrity of the data used to build these models is compromised by cybersecurity attacks, the results of these predictive models become questionable.

OBJECTIVE

To assess the risks associated with false data injections in provider progress notes, and to evaluate the potential of exploiting variance in the predictions of text mining methods to detect such data integrity issues.

METHODS

A simulation of false data injection scenarios was conducted on a set of provider notes. Common statistical text mining (STM) methods were used to assess the mental health severity of the patients described in the falsified notes. The simulation experiment focused on (1) assessing the overall classification stability across the different types of false data injections, (2) identifying the classification algorithms that are robust against these attacks, and (3) evaluating the potential of STM methods for signaling data integrity issues.
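As an illustration of the kind of simulation described above (a minimal sketch, not the authors' code), the snippet below appends a hypothetical screening template to test notes and compares the accuracy of a single model (SVM) and an ensemble (bagging) on clean versus falsified notes. The note texts, the template, and the model choices are illustrative assumptions only.

```python
# Minimal sketch of a false data injection experiment on synthetic "notes".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy corpus standing in for provider progress notes (severe = 1, non-severe = 0).
notes = ["patient reports suicidal ideation and severe depression"] * 30 + \
        ["routine follow up, mood stable, no acute distress"] * 30
labels = [1] * 30 + [0] * 30

X_train, X_test, y_train, y_test = train_test_split(
    notes, labels, test_size=0.3, random_state=0, stratify=labels)

# Hypothetical screening template injected into every test note.
template = ("PHQ-9 screening template: little interest or pleasure; "
            "feeling down, depressed, or hopeless; trouble sleeping")
X_test_falsified = [note + " " + template for note in X_test]

models = {
    "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "bagging": make_pipeline(TfidfVectorizer(),
                             BaggingClassifier(n_estimators=25, random_state=0)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc_clean = accuracy_score(y_test, model.predict(X_test))
    acc_attacked = accuracy_score(y_test, model.predict(X_test_falsified))
    print(f"{name}: clean={acc_clean:.3f}, injected={acc_attacked:.3f}, "
          f"drop={acc_clean - acc_attacked:.3f}")
```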

RESULTS

A simulation experiment using a training dataset of 96 severe and 337 non-severe psychiatric provider notes revealed that the performance of classification models drops under false data injection attacks. The accuracy of single models such as support vector machines and decision trees dropped significantly (by an average of 16.41%) with the injection of a single screening template. Ensemble models such as bagging and boosting were robust against single screening template injections, with an average accuracy drop of about 0.513%. The performance of all models dropped significantly when the false data exceeded 50% of the size of the note.

CONCLUSIONS

While STM methods can be useful for assessing the severity of the mental health conditions expressed in provider notes, the performance of such models can drop significantly with false data injections. Traditionally, such non-robust behavior of STM models is undesirable. Counter-intuitively, we show here that such a lack of robustness can be leveraged to generate signals of malicious false data injections into EHR systems. Hence, the prediction variance of these models can potentially be used to signal data integrity issues.
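One possible reading of this signalling idea (a hypothetical sketch, not the authors' method) is to flag notes where a sensitive single model and a robust ensemble disagree, treating divergence as a potential integrity issue. The function and variable names below are assumptions; the commented usage refers to the fitted pipelines from the previous sketch.

```python
# Sketch: flag notes where two fitted classifiers disagree on the predicted label.
import numpy as np

def flag_suspect_notes(sensitive_model, robust_model, notes):
    """Return a boolean mask marking notes where the two models disagree."""
    return np.asarray(sensitive_model.predict(notes)) != \
           np.asarray(robust_model.predict(notes))

# Example usage with the fitted pipelines from the previous sketch:
# flags = flag_suspect_notes(models["svm"], models["bagging"], X_test_falsified)
# print(f"{flags.sum()} of {len(flags)} notes flagged for manual review")
```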

Publisher

JMIR Publications Inc.
