Author:
Ma Xinran, Long Lura, Moon Sharon, Adamson Blythe J.S., Baxi Shrujal S.
Abstract
Background and Objective: The Surveillance, Epidemiology, and End Results (SEER) Program and the National Program of Cancer Registries (NPCR) are authoritative sources for population cancer surveillance and research in the US. An increasing number of recent oncology studies are based on the electronic health record (EHR)-derived de-identified databases created and maintained by Flatiron Health. This report describes the differences in the originating sources and data development processes, and compares baseline demographic characteristics in the cancer-specific databases from Flatiron Health, SEER, and NPCR, to facilitate interpretation of research findings based on these sources.

Methods: Patients with documented care from January 1, 2011 through May 31, 2019 in a series of EHR-derived Flatiron Health de-identified databases covering multiple tumor types were included. SEER incidence data (obtained from the SEER 18 database) and NPCR incidence data (obtained from the US Cancer Statistics public use database) for malignant cases diagnosed from January 1, 2011 to December 31, 2016 were included. Demographic variables were compared across all disease-specific databases, for all patients and for the subset diagnosed with advanced-stage disease.

Results: As of May 2019, a total of 201,570 patients with 19 different cancer types were included in Flatiron Health datasets. In an overall comparison to national cancer registries, patients in the Flatiron Health databases had similar sex and geographic distributions, but appeared to be diagnosed at later stages of disease, and their age distribution differed from that of the other datasets. For variables such as stage and race, Flatiron Health databases had a greater degree of incompleteness. These trends varied by cancer type.

Conclusions: These three databases present general similarities in demographic and geographic distribution, but there are overarching differences across the populations they cover. Differences in data sourcing (medical oncology EHRs vs cancer registries) and disparities in sampling approaches and rules of data acquisition may explain some of these divergences. Furthermore, unlike the steady information flow entered into registries, the availability of medical oncology EHR-derived information reflects the extent of involvement of medical oncology clinics at different points in the specialty management of individual diseases, resulting in inter-disease variability. These differences should be considered when interpreting study results obtained with these databases.
Publisher
Cold Spring Harbor Laboratory
Cited by
160 articles.