BACKGROUND
In evidence-based medicine, randomized controlled trials (RCTs) are critical for writing clinical guidelines and guiding practicing physicians. Currently, evidence from RCTs is extracted largely by hand. This approach limits the breadth of data that can be covered and is inefficient.
OBJECTIVE
To expand the breadth of data and improve the efficiency of obtaining clinical evidence, we introduce an automated information extraction model for extracting evidence from traditional Chinese medicine (TCM) RCTs.
METHODS
We adopted Evidence-Bidirectional Encoder Representations from Transformers (Evi-BERT), combined with rule-based extraction, for automated information extraction. We selected 48,523 research articles covering eleven disease types from the China National Knowledge Infrastructure (CNKI) database as the data source. We then constructed a manually annotated dataset of TCM clinical literature, covering ten evidence elements and 24,244 data points, to train the model. We chose two models, BERT-CRF and BiLSTM-CRF, as baselines and compared their performance with that of Evi-BERT.
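The abstract does not specify how the rules are combined with Evi-BERT. As one possible arrangement, the minimal sketch below assumes the rules act as a post-filter on the model's predicted spans: a span is kept only if its text matches a regular expression registered for its evidence element, and elements without a rule pass through unchanged. The label names, patterns, and the `apply_rules` helper are hypothetical illustrations, not the authors' rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str  # evidence element type (hypothetical label names)
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


# Hypothetical validation rules, one regex per evidence element.
RULES = {
    "sample_size": re.compile(r"\d+\s*例"),         # e.g. "120例" (120 cases)
    "duration": re.compile(r"\d+\s*(天|周|个月)"),   # e.g. "4周" (4 weeks)
}


def apply_rules(spans):
    """Keep a model-predicted span only if it matches the rule for its label."""
    kept = []
    for span in spans:
        rule = RULES.get(span.label)
        if rule is None or rule.search(span.text):
            kept.append(span)
    return kept


# Toy sentence and made-up model predictions.
text = "共纳入120例患者，疗程4周。"
preds = [
    Span("sample_size", 3, 7, "120例"),
    Span("sample_size", 7, 9, "患者"),  # false positive removed by the rule
    Span("duration", 12, 14, "4周"),
]
print([s.text for s in apply_rules(preds)])  # ['120例', '4周']
```

A filter of this kind can only remove predictions, so it tends to raise precision at the possible cost of recall if a rule is too strict.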
RESULTS
Evi-BERT achieved the highest F1 score (0.62) and was the most robust of the three models. Combining Evi-BERT with rule expressions for information extraction further increased precision.
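F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The abstract does not state which scoring scheme produced the reported 0.62, so the sketch below assumes exact-match, micro-averaged scoring over (label, start, end) spans, a common convention for CRF-style extraction. The spans are made up, and `span_prf` is a hypothetical helper. The example also shows how removing a false positive raises precision while recall stays the same.

```python
from typing import Iterable, Tuple

# (label, start, end); a prediction is correct only if all three match exactly.
SpanKey = Tuple[str, int, int]


def span_prf(gold: Iterable[SpanKey], pred: Iterable[SpanKey]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [("sample_size", 3, 7), ("duration", 12, 14), ("intervention", 15, 20)]
pred = [("sample_size", 3, 7), ("sample_size", 7, 9), ("duration", 12, 14)]

# "model + rules" drops the false positive ("sample_size", 7, 9).
for name, p_spans in [("model only", pred), ("model + rules", pred[:1] + pred[2:])]:
    p, r, f = span_prf(gold, p_spans)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# model only: P=0.67 R=0.67 F1=0.67
# model + rules: P=1.00 R=0.67 F1=0.80
```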
CONCLUSIONS
Our model substantially expands the amount of data that can be searched and improves efficiency without sacrificing accuracy. This work is expected to provide an intelligent tool for extracting clinical evidence during TCM RCT data collection.