Abstract
Background
The secondary use of electronic health records (EHRs) promises to facilitate medical research. We reviewed general data requirements in observational studies and analyzed the feasibility of conducting observational studies with structured EHR data, in particular diagnosis and procedure codes.
Methods
After reviewing published observational studies from the University Hospital of Erlangen for general data requirements, we identified three study populations for the feasibility analysis, using the eligibility criteria of three exemplary observational studies. For each study population, we evaluated the availability of relevant patient characteristics in our EHR, including outcome and exposure variables. To assess data quality, we computed distributions of relevant patient characteristics from the available structured EHR data and compared them to those of the original studies. We implemented computed phenotypes for patient characteristics where necessary. In random samples, we evaluated how well structured patient characteristics agreed with a gold standard derived from manually interpreted free texts. We categorized our findings using the four data quality dimensions "completeness", "correctness", "currency" and "granularity".
Results
Reviewing general data requirements, we found that some investigators supplement routine data with questionnaires, interviews and follow-up examinations. We included 847 subjects in the feasibility analysis (Study 1: n = 411; Study 2: n = 423; Study 3: n = 13). All eligibility criteria from two studies were available in structured data, while one study required computed phenotypes for its eligibility criteria. In one study, we found that all necessary patient characteristics were documented at least once in either structured or unstructured data. In another study, all exposure and outcome variables were available in structured data, while in the other, unstructured data had to be consulted. Comparing the distributions of patient characteristics computed from structured data with those from the original studies yielded similar distributions as well as indications of underreporting. We observed violations in all four data quality dimensions.
Conclusions
While we found relevant patient characteristics available in structured EHR data, data quality problems mean that it remains a case-by-case decision whether diagnosis and procedure codes are sufficient to underpin observational studies. Free-text data or subsequently collected supplementary study data may be important to complete a comprehensive patient history.
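The comparison of structured patient characteristics against a manually derived gold standard, as described in the Methods, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the agreement measures used, so sensitivity, positive predictive value and Cohen's kappa are chosen here as common options, and the names `Agreement` and `compare` as well as the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    """2x2 cross-tabulation of structured EHR data versus the gold standard."""
    tp: int  # coded in structured data and confirmed by free text
    fp: int  # coded in structured data but not confirmed by free text
    fn: int  # found in free text but missing from structured data
    tn: int  # absent in both sources

    @property
    def sensitivity(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else float("nan")

    @property
    def ppv(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else float("nan")

    @property
    def kappa(self) -> float:
        """Cohen's kappa: observed agreement corrected for chance agreement."""
        n = self.tp + self.fp + self.fn + self.tn
        if n == 0:
            return float("nan")
        observed = (self.tp + self.tn) / n
        expected = (
            (self.tp + self.fp) * (self.tp + self.fn)
            + (self.fn + self.tn) * (self.fp + self.tn)
        ) / n**2
        return (observed - expected) / (1 - expected) if expected != 1 else float("nan")


def compare(structured: dict[str, bool], gold: dict[str, bool]) -> Agreement:
    """Cross-tabulate one characteristic for patients present in both sources."""
    tp = fp = fn = tn = 0
    for patient_id in structured.keys() & gold.keys():
        coded, confirmed = structured[patient_id], gold[patient_id]
        if coded and confirmed:
            tp += 1
        elif coded:
            fp += 1
        elif confirmed:
            fn += 1
        else:
            tn += 1
    return Agreement(tp, fp, fn, tn)


# Hypothetical sample: True means the characteristic is recorded for the patient.
structured_codes = {"p1": True, "p2": False, "p3": True, "p4": False}
gold_standard = {"p1": True, "p2": True, "p3": False, "p4": False}
result = compare(structured_codes, gold_standard)
print(result.sensitivity, result.ppv, result.kappa)
```

In this framing, low sensitivity would point toward underreporting in structured data (a completeness problem), while low positive predictive value would point toward a correctness problem.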
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics,Health Policy,Computer Science Applications
Cited by
22 articles.