Abstract
Background
The study of doctor-patient-computer interactions is a key research area for examining doctor-patient relationships; however, studying these interactions is costly and obtrusive because researchers usually set up complex mechanisms or intrude on consultations to collect the data, which they then analyze manually.
Objective
We aimed to facilitate human-computer and human-human interaction research in clinics by providing a computational ethnography tool: an unobtrusive automatic classifier of screen gaze and dialogue combinations in doctor-patient-computer interactions.
Methods
The classifier's input is video recorded by doctors using their computers' internal camera and microphone. From estimates of the key points of the doctor's face and the presence of voice activity, the classifier infers the type of interaction taking place. Each video segment is assigned to 1 of 4 interaction classes: (1) screen gaze and dialogue, wherein the doctor is gazing at the computer screen while conversing with the patient; (2) dialogue, wherein the doctor is gazing away from the computer screen while conversing with the patient; (3) screen gaze, wherein the doctor is gazing at the computer screen without conversing with the patient; and (4) other, wherein neither screen gaze nor dialogue is detected. We evaluated the classifier using 30 minutes of video provided by 5 doctors simulating consultations in their clinics in both semi-inclusive and fully inclusive layouts.
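The mapping from the two per-segment signals to the 4 interaction classes can be illustrated with a minimal Python sketch. It assumes that screen gaze and voice activity have already been reduced to one Boolean each per video segment; the face key point estimation and voice activity detection steps are not shown, and the names `Interaction` and `classify_segment` are hypothetical, chosen for illustration rather than taken from the classifier itself.

```python
from enum import Enum


class Interaction(Enum):
    """The 4 interaction classes listed in Methods (names are illustrative)."""
    SCREEN_GAZE_AND_DIALOGUE = 1
    DIALOGUE = 2
    SCREEN_GAZE = 3
    OTHER = 4


def classify_segment(screen_gaze: bool, voice_activity: bool) -> Interaction:
    """Combine two per-segment signals into one interaction class.

    Assumption: voice activity in a segment is treated as dialogue, and
    screen_gaze is True when the doctor is judged to be gazing at the screen.
    """
    if screen_gaze and voice_activity:
        return Interaction.SCREEN_GAZE_AND_DIALOGUE
    if voice_activity:
        return Interaction.DIALOGUE
    if screen_gaze:
        return Interaction.SCREEN_GAZE
    return Interaction.OTHER


# Hypothetical per-segment signals: (screen_gaze, voice_activity)
segments = [(True, True), (False, True), (True, False), (False, False)]
labels = [classify_segment(gaze, voice).name for gaze, voice in segments]
```

In this sketch, `labels` holds one class name per segment, which is the form of output that would then be compared against a human coder's labels.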
Results
The classifier achieved an overall accuracy of 0.83, a performance similar to that of a human coder. Like the human coder, the classifier was more accurate in fully inclusive layouts than in semi-inclusive layouts.
Conclusions
The proposed classifier can be used by researchers, care providers, designers, medical educators, and others who are interested in exploring and answering questions related to screen gaze and dialogue in doctor-patient-computer interactions.