BACKGROUND
Autism Spectrum Disorder (ASD) is a prevalent neurodevelopmental disorder affecting 1 in 44 children in the United States. Individuals with autism face difficulty communicating effectively with peers, articulating feelings and emotions, and controlling behaviors. Rapid and early diagnosis leads to improved treatment outcomes; however, current medical techniques often take years to reach a diagnosis. These difficult, time-consuming diagnoses, together with recent advances in computer vision, have led to a surge of research interest in developing models to streamline the autism diagnosis process.
OBJECTIVE
We aim to assess the viability of at-home autism diagnosis by using video data collected from a mobile game app to train a computer vision model that draws on changes in emotion expression features over time.
METHODS
With our GuessWhat game-based mobile app, we collect a video dataset of 74 ASD and neurotypical (NT) children actively playing in a natural home environment. To investigate and quantify the significance of facial emotion features for ASD detection, we develop a deep learning-based autism classifier with two components: a Convolutional Neural Network (CNN) attached to a Long Short-Term Memory (LSTM) network. We pre-train our CNN backbone in two ways, (1) on ImageNet or (2) on a compilation of Facial Expression Recognition (FER) datasets, to analyze how much emotion features improve autism detection performance. The per-frame output of each CNN is fed into an LSTM model to diagnose ASD from video data.
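The two-component design described above can be illustrated with a minimal PyTorch sketch. It is not the study's implementation: the small `FrameEncoder` is a hypothetical, untrained stand-in for the ImageNet- or FER-pre-trained backbone, and the layer sizes, clip length, and the names `FrameEncoder` and `CnnLstmClassifier` are assumptions chosen only to show how per-frame CNN features flow into an LSTM that emits one prediction per video.

```python
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Hypothetical stand-in CNN: maps one RGB frame to a feature vector.

    In practice this would be replaced by a backbone pre-trained on
    ImageNet or on FER data; the layers here are illustrative only.
    """

    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, 32, 1, 1)
        )
        self.proj = nn.Linear(32, feature_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (N, 3, H, W) -> (N, feature_dim)
        return self.proj(self.conv(frames).flatten(1))


class CnnLstmClassifier(nn.Module):
    """Encodes each frame with a CNN, then summarizes the sequence with an LSTM."""

    def __init__(self, encoder: nn.Module, feature_dim: int = 64, hidden_dim: int = 32):
        super().__init__()
        self.encoder = encoder
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # single logit: ASD vs. NT

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1))  # (B*T, feature_dim)
        feats = feats.view(b, t, -1)               # (B, T, feature_dim)
        _, (h_n, _) = self.lstm(feats)             # h_n: (1, B, hidden_dim)
        return self.head(h_n[-1]).squeeze(-1)      # (B,) logits


torch.manual_seed(0)
model = CnnLstmClassifier(FrameEncoder())
clip = torch.randn(2, 8, 3, 64, 64)  # 2 random "videos" of 8 frames each
logits = model(clip)
probs = torch.sigmoid(logits)        # per-video probability of the positive class
```

Using the LSTM's final hidden state as the video summary is one common choice; pooling over all time steps would be an equally reasonable alternative under the same assumptions.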
RESULTS
Our top-performing architecture with a CNN backbone pre-trained on ImageNet obtained an accuracy of 45.8% and an F1 score of 62.8%, while the corresponding top architecture with a CNN backbone pre-trained on the FER datasets achieved an accuracy of 91.2% and an F1 score of 90.6%.
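For readers less familiar with the metrics, the following sketch shows how accuracy and F1 are computed from confusion-matrix counts, and how the gap between the two reported F1 scores can be expressed. The confusion counts are hypothetical and are not taken from the study; only the two F1 values (62.8% and 90.6%) come from the results above.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the study's data).
example_accuracy = accuracy(tp=8, fp=2, fn=2, tn=8)
example_f1 = f1_score(tp=8, fp=2, fn=2)

# Reported F1 scores (in percent) for the two pre-training schemes.
f1_imagenet, f1_fer = 62.8, 90.6
point_gain = round(f1_fer - f1_imagenet, 1)                         # percentage points
relative_gain = round(100 * (f1_fer - f1_imagenet) / f1_imagenet, 1)  # percent of baseline
```

Here `point_gain` is the absolute difference in percentage points, while `relative_gain` expresses the same difference relative to the ImageNet baseline; the two should not be confused when comparing models.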
CONCLUSIONS
We found that the change in an individual's facial expression features over time is a relevant marker for autism detection. Extracting emotional features from each frame improved the F1 score by 27.8 percentage points (from 62.8% to 90.6%) compared with non-emotion (ImageNet) weights. Our study demonstrates the capability of mobile applications to collect a natural, diverse dataset for improved autism diagnosis. These results demonstrate that deep learning and computer vision-based methods are instrumental for automated autism diagnosis from videos recorded at home with unspecialized equipment.