Complete Extraction of Pregnancy and Gestation Information from Electronic Medical Records and Effective Privacy Protection Strategies: Experience from a National Healthcare Data Network in China (Preprint)

Authors:

Jiao Yuanshi, Su Licong, Liu Wenna, Nie Sheng, Gong Mengchun

Abstract

BACKGROUND

Pregnancy and gestation information is routinely recorded in various datasets within electronic medical record (EMR) systems in China. Taken together, the two data items, i.e., the number of pregnancies and the number of gestations, imply the occurrence of abortion and other pregnancy-related events, which matters both for clinical decision-making and for personal privacy protection. Where this information appears within an EMR varies, because EMR systems have inconsistent IT structures, and the potential exposure of this sensitive information has never been evaluated quantitatively at a large scale.
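To illustrate why the pair of counts is more sensitive than either count alone, the sketch below derives the number of pregnancies that did not end in delivery. It assumes, as a labeled interpretation rather than the authors' definition, that "number of gestations" means pregnancies carried to delivery; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObstetricHistory:
    """Hypothetical container for the two counts discussed above."""
    pregnancies: int  # number of pregnancies
    gestations: int   # ASSUMPTION: pregnancies carried to delivery


def implied_non_delivered(history: ObstetricHistory) -> int:
    """Return how many pregnancies did not end in delivery.

    A positive value may reflect abortion, miscarriage, or an ongoing
    pregnancy. Neither count reveals this on its own, which is why the
    combination is treated as sensitive.
    """
    if history.pregnancies < 0 or history.gestations < 0:
        raise ValueError("counts must be non-negative")
    if history.gestations > history.pregnancies:
        # Inconsistent under the assumption stated above.
        raise ValueError("gestations cannot exceed pregnancies")
    return history.pregnancies - history.gestations
```

For example, a record with 3 pregnancies and 1 gestation implies 2 pregnancies that did not reach delivery, a fact that is disclosed only when both fields are present.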

OBJECTIVE

We aim to perform the first nationwide quantitative analysis of where sensitive pregnancy and gestation information is identified within EMRs and how frequently it is exposed, and to propose strategies for effective information extraction and privacy protection related to women’s health.

METHODS

The data extraction study was performed in a national healthcare data network. A committee of experts developed rule-based protocols for extracting pregnancy and gestation information. Six different sub-datasets of EMRs were used as the schema for data analysis and strategy proposal. The identification sites and the frequency of identification in each sub-dataset were calculated. Two independent groups of reviewers then manually inspected the quality of extraction on 1000 randomly selected records. Based on these statistics, strategies for effective information extraction and privacy protection were proposed.
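The expert-defined extraction rules are not given in the abstract. As a minimal sketch of what a rule-based extractor could look like, the two regular expressions below are hypothetical examples that match the common notations "G2P1" and the Chinese "孕2产1"; they are not the study's protocol.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules, for illustration only: the committee's actual
# protocol is not published in the abstract.
_PATTERNS = [
    re.compile(r"G\s*(\d{1,2})\s*P\s*(\d{1,2})", re.IGNORECASE),  # e.g. "G2P1"
    re.compile(r"孕\s*(\d{1,2})\s*产\s*(\d{1,2})"),                 # e.g. "孕2产1"
]


def extract_pregnancy_gestation(text: str) -> Optional[Tuple[int, int]]:
    """Return (pregnancies, gestations) from the first rule that matches, else None."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None
```

Applying the same rules to each of the six sub-datasets (admission records, nursing records, and so on) yields the per-sub-dataset identification sites that the frequency analysis is built on.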

RESULTS

The data network covers hospitalized patients from 19 hospitals in 9 provinces of China, with a total of 7,084,339 patients over a 10-year span (2010–2020). A total of 688,268 female patients with sensitive reproductive information (SRI) were identified. Identification frequencies varied across sub-datasets, with the marriage history in admission medical records being the highest at 62.74%. Surprisingly, more than 50% of female patients were identified with pregnancy and gestation history in nursing records, which are not generally considered a sub-dataset rich in reproductive information. In the manual curation and review process, 500 cases were selected randomly. Both the precision and the recall of the information extraction method exceeded 99.5%. The privacy-protection strategies were designed with clear technical directions.
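The two kinds of figures reported above, per-sub-dataset identification frequency and extraction precision/recall, reduce to simple ratios. The sketch below computes both; the function names, the choice of denominator (passed in as `total_patients`, since the abstract does not state it), and the counts in the usage note are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def identification_frequency(
    hits: Iterable[Tuple[str, str]], total_patients: int
) -> Dict[str, float]:
    """Share of patients identified in each sub-dataset.

    hits: (patient_id, sub_dataset) pairs. Duplicates are collapsed so
    that each patient counts at most once per sub-dataset.
    """
    unique_hits = set(hits)
    counts = Counter(sub_dataset for _, sub_dataset in unique_hits)
    return {sub: n / total_patients for sub, n in counts.items()}


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With toy counts of 8 true positives, 1 false positive, and 1 false negative from a manual review, `precision_recall(8, 1, 1)` gives 8/9 for both measures; a reported value above 99.5% means fewer than 1 in 200 extractions (or true mentions) were in error.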

CONCLUSIONS

Critical information related to women’s health is recorded in large volumes in routine Chinese EMR systems and is distributed across different parts of the records with different frequencies. Extracting and protecting it therefore requires a thorough protocol, which this study has shown to be technically feasible. Implementing a data-based strategy will help enforce the protection of women’s privacy and improve the accessibility of healthcare services.
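The abstract does not detail the privacy-protection strategies. One common technique, swapped in here purely as an illustration and not presented as the authors' method, is to mask matched reproductive-history notations before records leave the clinical system. The pattern and placeholder below are hypothetical.

```python
import re

# Hypothetical masking rule covering "G2P1"-style and "孕2产1"-style notations.
_SENSITIVE = re.compile(
    r"(?:G\s*\d{1,2}\s*P\s*\d{1,2}|孕\s*\d{1,2}\s*产\s*\d{1,2})",
    re.IGNORECASE,
)


def mask_reproductive_info(text: str, placeholder: str = "[REDACTED]") -> str:
    """Return text with every matched notation replaced by the placeholder."""
    # A function replacement avoids re.sub interpreting backslashes in the placeholder.
    return _SENSITIVE.sub(lambda _match: placeholder, text)
```

Because the findings show that such notations also appear in sub-datasets like nursing records, a masking step of this kind would need to run over every sub-dataset rather than only the admission history fields.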

Publisher

JMIR Publications Inc.
