Affiliation:
1. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University , Heidelberglaan 100, Utrecht 3584 CX , The Netherlands
2. Dutch Heart Foundation , The Hague , The Netherlands
3. Central Diagnostic Laboratory, University Medical Center Utrecht , Utrecht , The Netherlands
Abstract
Abstract
Aims
Optimize and assess the performance of an existing data mining algorithm for smoking status from hospital electronic health records (EHRs) in general practice EHRs.
Methods and results
We optimized an existing algorithm in a training set containing all clinical notes from 498 individuals (75 712 contact moments) from the Julius General Practitioners’ Network (JGPN). Each moment was classified as either ‘current smoker’, ‘former smoker’, ‘never smoker’, or ‘no information’. As a reference, we manually reviewed EHRs. Algorithm performance was assessed in an independent test set (n = 494, 78 129 moments) using precision, recall, and F1-score. Test set algorithm performance for ‘current smoker’ was precision 79.7%, recall 78.3%, and F1-score 0.79. For former smoker, it was precision 73.8%, recall 64.0%, and F1-score 0.69. For never smoker, it was precision 92.0%, recall 74.9%, and F1-score 0.83. On a patient level, performance for ever smoker (current and former smoker combined) was precision 87.9%, recall 94.7%, and F1-score 0.91. For never smoker, it was 98.0, 82.0, and 0.89%, respectively. We found a more narrative writing style in general practice than in hospital EHRs.
Conclusion
Data mining can successfully retrieve smoking status information from general practice clinical notes with a good performance for classifying ever and never smokers. Differences between general practice and hospital EHRs call for optimization of data mining algorithms when applied beyond a primary development setting.
Publisher
Oxford University Press (OUP)
Subject
Energy Engineering and Power Technology,Fuel Technology
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献