BACKGROUND
This work is part of a study of the feasibility of setting up a national trauma observatory in France.
OBJECTIVE
We compared the performance of several natural language processing methods on a multi-class classification task applied to unstructured clinical notes.
METHODS
A total of 69,110 free-text clinical notes from visits to the emergency departments of the University Hospital of Bordeaux, France, between 2012 and 2019 were manually annotated. Of these notes, 22,481 concerned trauma. We trained four transformer models (deep learning models built on an attention mechanism) and compared them with a baseline combining TF-IDF (Term Frequency-Inverse Document Frequency) features with an SVM (Support Vector Machine) classifier.
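As an illustration of the TF-IDF weighting used by the baseline, the following minimal sketch computes term weights for a few toy notes. It assumes raw term frequency, whitespace tokenization, and idf = ln(N / df); the weighting variant and preprocessing actually used in the study are not stated here, and the function name `tfidf` and the example notes are invented for illustration. In a full baseline, vectors of this kind would be passed to an SVM classifier.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: raw term frequency and idf = ln(N / df), one common
    variant; the study's exact weighting scheme is not specified.
    """
    # Naive lowercase, whitespace tokenization (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of notes that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Invented toy notes, not taken from the study's data.
notes = [
    "fall from ladder wrist pain",
    "chest pain since morning",
    "fall on stairs ankle sprain",
]
weights = tfidf(notes)
```

Terms that appear in a single note, such as "ladder", receive a higher weight than terms shared across notes, such as "pain", which is what lets a linear classifier focus on discriminative vocabulary.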
RESULTS
The transformer models consistently outperformed TF-IDF/SVM. Among the transformers, the GPTanam model, pre-trained on a French corpus and given an additional self-supervised learning step on 306,368 unlabeled clinical notes, performed best, with a micro F1-score of 0.969.
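For readers unfamiliar with the metric, the micro F1-score pools true positives, false positives, and false negatives over all classes before computing precision and recall. The sketch below is a minimal illustration under that standard definition; the function name `micro_f1` and the class labels are hypothetical and do not come from the study.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then combine."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this class
            elif pred == label:
                fp += 1  # predicted this class, truth was another
            elif truth == label:
                fn += 1  # missed an instance of this class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only.
y_true = ["fall", "fall", "traffic", "assault", "traffic"]
y_pred = ["fall", "traffic", "traffic", "assault", "traffic"]
score = micro_f1(y_true, y_pred)
```

When each note receives exactly one label, every misclassification counts once as a false positive and once as a false negative, so the micro F1-score equals overall accuracy (here, 4 correct out of 5).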
CONCLUSIONS
The transformers proved effective for multi-class classification of narrative medical data. Further improvements should focus on abbreviation expansion and multi-output multi-class classification.