Leveraging automated approaches to categorize birth defects from abstracted birth hospitalization data

Author:

Newton Suzanne M.1, Distler Samantha1, Woodworth Kate R.1, Chang Daniel2, Roth Nicole M.1, Board Amy1, Hutcherson Hailee3, Cragan Janet D.1, Gilboa Suzanne M.1, Tong Van T.1

Affiliation:

1. Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA

2. Eagle Global Scientific, LLC, San Antonio, Texas, USA

3. G2S Corporation, San Antonio, Texas, USA

Abstract

Background: The Surveillance for Emerging Threats to Pregnant People and Infants Network (SET‐NET) collects data abstracted from medical records and birth defects registries on pregnant people and their infants to understand outcomes associated with prenatal exposures. We developed an automated process to categorize possible birth defects for prenatal COVID‐19, hepatitis C, and syphilis surveillance. By employing keyword searches, fuzzy matching, natural language processing (NLP), and machine learning (ML), we aimed to decrease the number of cases needing manual clinician review.

Methods: SET‐NET captures International Classification of Diseases, 10th Revision, Clinical Modification (ICD‐10‐CM) codes and free text describing birth defects. For unstructured data, we used keyword searches, and then conducted fuzzy matching with a cut‐off match score of ≥90%. Finally, we employed NLP and ML by testing three predictive models to categorize birth defect data.

Results: As of June 2023, 8326 observations containing data on possible birth defects were submitted to SET‐NET. The majority (n = 6758 [81%]) were matched to an ICD‐10‐CM code and 1568 (19%) were unable to be matched. Through keyword searches and fuzzy matching, we categorized 1387/1568 possible birth defects. Of the remaining 181 unmatched observations, we correctly categorized 144 (80%) using a predictive model.

Conclusions: Using automated approaches allowed for categorization of 99.6% of reported possible birth defects, which helps detect possible patterns requiring further investigation. Without employing these analytic approaches, manual review would have been needed for 1568 observations. These methods can be employed to quickly and accurately sift through data to inform public health responses.
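The keyword-then-fuzzy cascade described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation. The keyword list, the category names, and every function name below are hypothetical, and Python's standard-library `difflib` similarity ratio stands in for whatever fuzzy-matching scorer the study used. Only the cut-off of 90 comes from the abstract.

```python
from difflib import SequenceMatcher

# Hypothetical keyword -> category map; the abstract does not list the
# actual keywords or categories used in the study.
KEYWORDS = {
    "cleft lip": "Orofacial cleft",
    "cleft palate": "Orofacial cleft",
    "gastroschisis": "Abdominal wall defect",
    "microcephaly": "Brain abnormality",
}

CUTOFF = 90  # match-score threshold (percent) stated in the abstract


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def keyword_match(text):
    """Step 1: exact substring search for each keyword."""
    cleaned = normalize(text)
    for keyword, category in KEYWORDS.items():
        if keyword in cleaned:
            return category
    return None


def fuzzy_score(a, b):
    """Similarity on a 0-100 scale (difflib ratio, an assumed stand-in scorer)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def fuzzy_match(text, cutoff=CUTOFF):
    """Step 2: compare word windows against each keyword, keep the best score."""
    words = normalize(text).split()
    best_score, best_category = 0.0, None
    for keyword, category in KEYWORDS.items():
        n = len(keyword.split())  # window size = number of words in the keyword
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            score = fuzzy_score(window, keyword)
            if score > best_score:
                best_score, best_category = score, category
    return best_category if best_score >= cutoff else None


def categorize(text):
    """Run the keyword search first, then fall back to fuzzy matching."""
    return keyword_match(text) or fuzzy_match(text)


if __name__ == "__main__":
    for note in ["Bilateral cleft lip noted", "infant with gastroschsis", "heart murmur"]:
        print(note, "->", categorize(note))
```

In this sketch, a misspelling such as "gastroschsis" misses the exact keyword search but clears the cut-off in the fuzzy step. Text that returns `None` from both steps corresponds to the observations the abstract describes as passed on to a predictive model.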

Funder

Centers for Disease Control and Prevention

Publisher

Wiley

Subject

Health, Toxicology and Mutagenesis; Developmental Biology; Toxicology; Embryology; Pediatrics, Perinatology and Child Health

