BACKGROUND
Retinitis pigmentosa (RP) is a genetic disease that causes progressive vision loss, which almost invariably leads to blindness. Because RP is rare, diagnosis is costly and slow. Patients may turn to ChatGPT or similar large language models (LLMs) to answer some of their questions; however, given the amount of incorrect information on the internet and the limited training data available for RP, the answers may not be correct.
OBJECTIVE
We sought to investigate how accurately four LLMs (ChatGPT 3.5, ChatGPT 4.0, Claude, and Copilot) answered questions about RP by comparing their answers semantically, via cosine score, to the American Academy of Ophthalmology’s (AAO) webpage on RP.
METHODS
Embeddings for the LLMs' outputs were computed using the MiniLM-v2 model from HuggingFace, and the cosine similarity between the AAO's sentence and the LLM's sentence was taken. To summarize the data, New Dale-Chall readability scores were calculated.
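The two measures named above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function names are hypothetical, the embedding vectors are assumed to have been produced already by a sentence-embedding model, and the percentage of difficult words is assumed to have been counted against the Dale-Chall familiar-word list beforehand. The readability function uses the standard published New Dale-Chall raw-score formula.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def new_dale_chall(pct_difficult_words, avg_sentence_length):
    """New Dale-Chall raw score.

    pct_difficult_words: percentage (0-100) of words not on the familiar-word
        list; assumed to be computed elsewhere.
    avg_sentence_length: mean number of words per sentence.
    """
    score = 0.1579 * pct_difficult_words + 0.0496 * avg_sentence_length
    # The formula adds a fixed adjustment when more than 5% of words are difficult.
    if pct_difficult_words > 5.0:
        score += 3.6365
    return score
```

A cosine score near 1 indicates that two sentences point in nearly the same direction in embedding space, and a higher New Dale-Chall score indicates text that is harder to read.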
RESULTS
We found that the LLMs answered the questions reasonably similarly to the AAO. There was no significant difference among ChatGPT 3.5, ChatGPT 4.0, and Claude; Copilot had lower cosine scores. However, every LLM's output had a significantly more difficult readability level, indicating that LLM outputs may be difficult for the lay public to comprehend.
CONCLUSIONS
This study demonstrates that AI models can produce conversational text that is highly accurate. The high semantic similarity of their outputs to the AAO's official website indicates that they provide good background information to curious patients. As AI technology is further adopted, the future use of LLMs in patient education cannot be ignored.