BACKGROUND
The ubiquity of clinical artificial intelligence (AI) and machine learning (ML) models necessitates measures to ensure the reliability of model output over time. Previous reviews have highlighted that most clinical models lack external validation, but no comprehensive review has assessed the priority currently given to clinical model updating.
OBJECTIVE
The objective of this study was to determine the extent to which model updating is prioritized in current research on clinical AI/ML models. As part of this analysis, we developed a simple checklist/score system to screen the quality of published AI/ML models.
METHODS
We conducted a systematic analysis of studies on clinical AI models, adhering to PRISMA guidelines. A new checklist/score was applied to assess the quality of the models, and the demographic composition of study populations, by ethnicity or race, was also recorded.
RESULTS
Of the 390 AI/ML studies reviewed, only 9% stated an intention or method to update their models in the future. In total, 98% of the models were in the research phase and only 2% were in the production phase. Furthermore, only 12% reported following best practice standards for model development, and 84% did not report the demographic composition of their study populations by ethnicity or race.
CONCLUSIONS
In conclusion, a significant portion of studies currently lack commitment to future model updates. The low adherence to best practice standards for model development also highlights areas for improvement in the field. Furthermore, the absence of demographic information in a substantial number of studies raises concerns about the generalizability and equitable application of these models. The prevalence of research-phase models built on proprietary data, which limits independent verification and validation of model output, is also a major concern for patient safety.
Addressing these issues is crucial for advancing the reliability and inclusivity of clinical AI and ML applications.
CLINICALTRIAL
NA
INTERNATIONAL REGISTERED REPORT
RR2-10.2196/37685