Affiliation:
1. Center for Devices and Radiological Health, U.S. Food and Drug Administration 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
Abstract
Data drift refers to differences between the data used in training a machine learning (ML) model and that applied to the model in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and used in clinical operation, differences between medical practices or context of use between training and clinical use, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in ML literature related to data drift, define distinct types of drift, and discuss in detail potential causes within the context of medical applications with an emphasis on medical imaging. We then review the recent literature regarding the effects of data drift on medical ML systems, which overwhelmingly show that data drift can be a major cause for performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects with an emphasis on pre- and post-deployment techniques. Some of the potential methods for drift detection and issues around model retraining when drift is detected are included. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies and resist performance decay.
Publisher
Oxford University Press (OUP)
Subject
Radiology, Nuclear Medicine and imaging,General Medicine
Cited by
33 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献