Author:
Jonnagaddala Jitendra, Chen Aipeng, Batongbacal Sean, Nekkantti Chandini
Abstract
For research purposes, protected health information is often redacted from unstructured electronic health records to preserve patient privacy and confidentiality. The OpenDeID corpus is designed to assist the development of automatic methods for redacting sensitive information from unstructured electronic health records. We retrieved 4,548 unstructured surgical pathology reports from four urban Australian hospitals. The corpus was developed by two annotators under three experimental settings: serial annotations, parallel annotations, and pre-annotations. The quality of the annotations was evaluated for each setting. Our results suggest that the pre-annotation approach is not reliable in terms of quality when compared to serial annotations, but it can drastically reduce annotation time. The OpenDeID corpus comprises 2,100 pathology reports from 1,833 cancer patients, with an average of 737.49 tokens and 7.35 protected health information entities annotated per report. The overall inter-annotator agreement and deviation scores are 0.9464 and 0.9726, respectively. Realistic surrogates are also generated to make the corpus suitable for distribution to other researchers.
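The abstract reports an inter-annotator agreement score but does not say how it was computed. As an illustration only, the sketch below assumes one common choice for de-identification corpora: entity-level F1 between two annotators, counting an entity as agreed only when its span and label match exactly. The `Entity` type, the `pairwise_f1` function, and the sample annotations are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from typing import NamedTuple


class Entity(NamedTuple):
    """A hypothetical annotated entity: character offsets plus a PHI label."""
    start: int
    end: int
    label: str


def pairwise_f1(a: set[Entity], b: set[Entity]) -> float:
    """Entity-level F1 between two annotators (exact span and label match).

    Annotator `a` is treated as the reference; F1 is symmetric, so the
    choice of reference does not change the result.
    """
    if not a and not b:
        return 1.0  # both annotators found nothing: full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one report, for illustration only.
annotator_1 = {Entity(0, 10, "NAME"), Entity(25, 35, "DATE"), Entity(50, 58, "ID")}
annotator_2 = {Entity(0, 10, "NAME"), Entity(25, 35, "DATE"), Entity(60, 70, "LOCATION")}

agreement = pairwise_f1(annotator_1, annotator_2)
print(f"{agreement:.4f}")
```

Exact matching is the strictest variant; a relaxed variant that counts overlapping spans as agreement would generally give higher scores.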
Publisher
Springer Science and Business Media LLC
Cited by
7 articles.