BACKGROUND
Adherence to evidence-based practice is indispensable in healthcare. Recently, the utility of artificial intelligence (AI)-based models in healthcare has been evaluated extensively. However, the lack of consensus guidelines for the design and reporting of these studies poses challenges to the interpretation and synthesis of evidence.
OBJECTIVE
To propose a preliminary framework forming the basis of comprehensive guidelines to standardize reporting of AI-based studies in healthcare education and practice.
METHODS
A systematic literature review was conducted on Scopus, PubMed, and Google Scholar. Published records with “ChatGPT”, “Bing”, or “Bard” in the title were retrieved. The methodologies employed in the included records were examined carefully to identify common pertinent themes and gaps in reporting. A panel discussion followed to establish a unified and thorough reporting checklist. Two independent raters then tested the finalized checklist on the included records, with Cohen’s κ used to evaluate inter-rater reliability.
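For reference, Cohen’s κ adjusts the observed agreement between the two raters for the agreement expected by chance alone; with observed agreement $p_o$ and chance-expected agreement $p_e$, it is defined as

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

so that κ = 1 indicates perfect agreement and κ = 0 indicates agreement no better than chance.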
RESULTS
The final dataset that formed the basis for theme identification and analysis comprised a total of 34 records. The finalized checklist included nine pertinent themes collectively referred to as “METRICS”: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and inter-rater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Inter-rater reliability was acceptable, with Cohen’s κ ranging from 0.558 to 0.962 (P<.001 for all nine items). Per item, the highest average score was recorded for the “Model” item, followed by the “Specificity of the prompts and language used” item, while the lowest scores were recorded for the “Randomization of selecting the queries” item (classified as suboptimal) and the “Individual factors in selecting the queries and inter-rater reliability” item (classified as satisfactory).
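As a minimal sketch of how summary statistics and agreement coefficients of this kind can be computed (all values below are hypothetical placeholders, not the study’s data; scikit-learn’s cohen_kappa_score is assumed to be available):

# Minimal sketch with hypothetical values, not the study's actual data.
import statistics
from sklearn.metrics import cohen_kappa_score

# Hypothetical overall METRICS scores for a few records
overall_scores = [3.2, 2.8, 3.5, 2.9, 3.1]
print(f"mean = {statistics.mean(overall_scores):.2f}, "
      f"SD = {statistics.stdev(overall_scores):.2f}")

# Hypothetical per-record scores on one checklist item from two raters
rater_a = [5, 4, 4, 3, 5, 2, 4, 3]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4]
print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.3f}")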
CONCLUSIONS
The variability observed in methodologies and reporting highlights the need for standardized reporting algorithms for AI-based studies in healthcare. The proposed METRICS checklist could be a helpful preliminary step toward establishing a universally accepted approach to standardize reporting in AI-based healthcare studies, a swiftly evolving research topic.