BACKGROUND
For critically ill children who cannot communicate or express themselves adequately, facial expressions are important indicators of pain level. The quality of the data used for training and testing is a crucial factor in the performance of facial expression analysis algorithms. Establishing a high-quality, standardized dataset therefore requires in-depth research.
OBJECTIVE
This study aims to propose a standard for constructing a facial expression-based pain assessment dataset for critically ill children by establishing a large-scale, high-quality sample dataset and validating the dataset using deep learning models.
METHODS
Based on the principles of standardization, diversity, and authority that characterize high-quality datasets, we established standards for constructing a facial expression-based pain assessment dataset for critically ill children. Facial expression data were collected in two typical settings, the Pediatric Intensive Care Unit (PICU) and the Cardiac Intensive Care Unit (CICU) at Children's Hospital of Fudan University. Each sample was then annotated by three clinical experts, who classified the facial expression into one of five pain levels. Finally, deep learning algorithms were used to verify the feasibility and applicability of the dataset.
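The annotation step can be illustrated with a minimal sketch of how three expert ratings might be merged into a single label. The abstract does not state how disagreements between experts were resolved, so the median rule below is an assumption, as are the function name `aggregate_pain_level` and the coding of the five pain levels as integers 0 to 4.

```python
from statistics import median


def aggregate_pain_level(ratings):
    """Merge three expert ratings into one label by taking the median.

    Assumption: the five pain levels are coded as integers 0-4, and the
    median is used as the consensus rule. Neither detail comes from the study.
    """
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings")
    if any(r not in range(5) for r in ratings):
        raise ValueError("ratings must be integers from 0 to 4")
    # With three ordinal ratings the median is the middle value, so a single
    # outlying rating cannot shift the consensus label.
    return int(median(ratings))


# Hypothetical ratings for one image from three experts.
consensus = aggregate_pain_level([1, 2, 4])
```

The median suits ordinal labels because it respects the ordering of the levels and always returns one of the submitted ratings, whereas a simple majority vote is undefined when all three experts disagree.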
RESULTS
The Pain Expression of Critically Ill Children (PECIC) dataset established in this study is the most extensive facial expression-based pain assessment dataset for critically ill children to date. It comprises 53 children, 119 pain expression videos, and 6,951 pain expression images collected in the PICU and CICU at Children's Hospital of Fudan University from December 2022 to January 2023. Data collection was balanced for the children's age, weight, sex, and mechanical ventilation status. Each image was annotated by three clinical experts. A Swin Transformer model trained and tested on the PECIC dataset achieved an accuracy of 88.3%, a precision of 88.3%, a recall of 88.7%, an F1-score of 88.5%, and a false-positive rate of 3.0%. Prediction errors were evenly distributed among adjacent pain levels. Comparison with the Classification of Pain Expressions (COPE) dataset demonstrated the usefulness, accuracy, validity, and comprehensiveness of the PECIC dataset.
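For readers who want to see how metrics of this kind are typically derived, the following sketch computes them from a multi-class confusion matrix. The abstract does not say how per-class values were averaged across the five pain levels, so macro-averaging is an assumption here, and the function name `macro_metrics` and the toy matrix are purely illustrative and are not data from the study.

```python
def macro_metrics(cm):
    """Compute macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro-averaging is an assumption of this sketch.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, fprs = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true c, predicted other
        fp = sum(cm[r][c] for r in range(k)) - tp     # true other, predicted c
        tn = total - tp - fn - fp
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        fprs.append(fp / (fp + tn) if fp + tn else 0.0)
    precision = sum(precisions) / k
    recall = sum(recalls) / k
    # F1 taken as the harmonic mean of macro precision and macro recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": sum(fprs) / k,
    }


# Toy two-class matrix with made-up counts, for illustration only.
metrics = macro_metrics([[8, 2], [1, 9]])
```

As a consistency check on the reported figures, the harmonic mean of a precision of 88.3% and a recall of 88.7% is 2 × 88.3 × 88.7 / (88.3 + 88.7) ≈ 88.5%, which agrees with the reported F1-score.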
CONCLUSIONS
Compared with the COPE dataset, the PECIC dataset yields higher accuracy in the trained model, demonstrating better usability and comprehensiveness for training algorithmic models. The PECIC dataset is therefore more feasible and applicable for deep learning-based analysis and evaluation of pain expressions in critically ill children.