BACKGROUND
Motivational Interviewing (MI) is a therapeutic technique that has been successful in helping smokers reduce smoking, but has limited accessibility due to the high cost and low availability of clinicians. To address this, the MIBot project has sought to develop a chatbot that emulates an MI session with a client with the specific goal of moving an ambivalent smoker towards the direction of quitting. One key element of an MI conversation is reflective listening, where a therapist expresses their understanding of what the client has said by uttering a reflection that encourages the client to continue their thought process. Complex reflections link the client’s responses to relevant ideas and facts to enhance this contemplation. Backward-looking complex reflections (BLCRs) link the client’s most recent response to a relevant selection of the client’s previous statements. Our current chatbot can generate complex reflections - but not BLCRs - using large language models (LLMs) such as GPT-2, which allows the generation of unique, human-like messages customized to client responses. Recent advances in these models, such as the introduction of GPT-4, provide a novel way to generate complex text by feeding the models instructions and conversational history directly, making this a promising approach to generate BLCRs.
OBJECTIVE
To develop a method to generate BLCRs for an MI-based smoking cessation chatbot, and to measure the method's effectiveness.
METHODS
Large Language Models such as GPT-4 can be stimulated to produce specific types of responses to their inputs by “asking” them with an English-based description of the desired output. These descriptions are called prompts, and the challenge of writing a description that allows LLMs to generate the optimal output is termed prompt engineering. We evolved an instruction to prompt GPT-4 to generate a BLCR given the prior transcript of the conversation up to the point where the reflection was needed. The approach was tested on 50 previously collected MIBot transcripts of conversations with smokers, and was used to generate a total of 150 reflections. The quality of the reflections was rated on a 4-point scale by three independent raters to determine if they met specific criteria for acceptability.
RESULTS
Of the 150 generated reflections, 132 (88%) of the reflections met the level of acceptability. The remaining 18 (12%) had one or more flaws that made them inappropriate BLCRs. The three raters had pairwise agreement on 80% to 88% of these scores.
CONCLUSIONS
The method presented to generate BLCRs is good enough to be used as one source of reflections in an MI-style conversation, but would need an automatic checker to eliminate the unacceptable ones. This work illustrates the power of the new LLMs to generate therapeutic client-specific responses under the command of a language-based specification.