BACKGROUND
Diagnosis of venous thromboembolism (VTE) is often delayed, and earlier diagnosis may reduce associated morbidity and mortality. Clinical notes contain information not found elsewhere in the medical record that could support timely VTE diagnosis and accurate quality measurement. However, extracting relevant information from unstructured clinical notes is complex. Today there are relatively few electronic clinical quality measures (eCQMs) in our national payment program, and none use natural language processing (NLP) for data extraction. NLP holds great promise for making quality measurement more accurate and more efficient.
OBJECTIVE
We developed a rule-based NLP tool, VTExt, that extracts VTE symptoms from clinical note text, for use within an eCQM to quantify the rate of delayed diagnosis of VTE in primary care settings.
METHODS
We iteratively developed VTExt on an internal dataset, using a rule-based approach to extract VTE symptoms from primary care clinical note text. The VTE symptom lexicon was derived and refined with physician guidance, and VTExt was externally validated using data from two external healthcare organizations.
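The abstract does not describe VTExt's individual rules. As an illustration only, the following is a minimal Python sketch of lexicon-based symptom extraction, assuming a dictionary of symptom terms matched with case-insensitive regular expressions plus a simplified NegEx-style negation window. The lexicon entries, negation cues, window size, and the names `SYMPTOM_LEXICON`, `extract_symptoms` and `is_negated` are hypothetical and are not VTExt's actual lexicon or code.

```python
import re
from dataclasses import dataclass

# Hypothetical symptom lexicon (concept -> surface forms); NOT VTExt's lexicon.
SYMPTOM_LEXICON = {
    "leg_swelling": ["leg swelling", "swollen leg", "calf swelling", "edema of the leg"],
    "leg_pain": ["leg pain", "calf pain", "calf tenderness"],
    "dyspnea": ["shortness of breath", "dyspnea", "sob"],
    "chest_pain": ["chest pain", "pleuritic pain"],
}

# Hypothetical negation cues and window size (simplified NegEx-style assumption).
NEGATION_CUES = ["no", "denies", "without", "negative for"]
NEGATION_WINDOW = 5  # number of tokens before a match to scan for a cue


@dataclass
class Mention:
    concept: str
    text: str
    start: int
    end: int
    negated: bool


# Precompile one word-bounded, case-insensitive pattern per lexicon term.
PATTERNS = [
    (concept, re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE))
    for concept, terms in SYMPTOM_LEXICON.items()
    for term in terms
]
CUE_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b", re.IGNORECASE
)


def is_negated(note: str, start: int) -> bool:
    """Return True if a negation cue occurs shortly before `start` in the same sentence."""
    # Keep only the text after the last sentence-like boundary preceding the match.
    prefix = re.split(r"[.;\n]", note[:start])[-1]
    window = " ".join(prefix.split()[-NEGATION_WINDOW:])
    return bool(CUE_RE.search(window))


def extract_symptoms(note: str) -> list[Mention]:
    """Find every lexicon term in the note and flag whether it appears negated."""
    mentions = [
        Mention(concept, m.group(0), m.start(), m.end(), is_negated(note, m.start()))
        for concept, pattern in PATTERNS
        for m in pattern.finditer(note)
    ]
    return sorted(mentions, key=lambda x: x.start)


note = "Pt reports left calf pain and leg swelling x3 days. Denies chest pain or shortness of breath."
print([(m.concept, m.negated) for m in extract_symptoms(note)])
# [('leg_pain', False), ('leg_swelling', False), ('chest_pain', True), ('dyspnea', True)]
```

This sketch has no NegEx termination terms, so a note such as "no fever, reports calf pain" would wrongly mark the calf pain as negated; any real rule set would need refinements like these and validation against annotated notes.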
RESULTS
VTExt achieved near-perfect performance in extracting VTE symptoms from primary care notes sampled from the records of patients diagnosed with and without VTE. In external validation, VTExt achieved promising performance at two additional, geographically distant organizations that use different electronic health record systems. Compared with a deep learning-based model, VTExt showed similar or better performance on all metrics.
CONCLUSIONS
This study demonstrates a data-driven, NLP-based approach to information extraction from clinical notes that can be generalized to different electronic health record (EHR) systems across institutions. VTExt is the first NLP application to be used in a nationally endorsed eCQM.