Author:
Lemas Dominick J., Du Xinsong, Rouhizadeh Masoud, Lewis Braeden, Frank Simon, Wright Lauren, Spirache Alex, Gonzalez Lisa, Cheves Ryan, Magalhães Marina, Zapata Ruben, Reddy Rahul, Xu Ke, Parker Leslie, Harle Chris, Young Bridget, Louis-Jaques Adetola, Zhang Bouri, Thompson Lindsay, Hogan William R., Modave François
Abstract
The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models that predict infant feeding status from clinical notes in the Epic electronic health record (EHR) system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Notes were annotated in TeamTat, which assigned each clinical note a single infant feeding status. We trained six machine learning models to classify infant feeding status, including logistic regression, random forest, XGBoost (gradient boosting), k-nearest neighbors, and a support vector classifier. Models were compared on overall accuracy, precision, recall, and F1 score. The modeling corpus was balanced, with an equal number of clinical notes in each class. We manually reviewed 999 notes representing 746 mother-infant dyads, with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)] and exclusive feeding of expressed mother’s milk [n = 102 (10.2%)]; mixed feeding was the least frequent [n = 23 (2.3%)]. The final analysis classified clinical notes as breast, formula/bottle, or missing. The machine learning models were trained on these three classes after class balancing by downsampling. The XGBoost model outperformed all others, achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that NLP can be applied to clinical notes stored in the EHR to classify infant feeding status.
Early identification of breastfeeding status by applying NLP to unstructured EHR data can inform precision public health interventions that aim to improve lactation support for postpartum patients.
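The abstract reports training on class-balanced data after downsampling and scoring models with macro-averaged precision, recall, and F1. The sketch below illustrates both ideas using only the Python standard library. It is not the authors' pipeline: the abstract does not say how downsampling was performed, so random per-class downsampling to the smallest class is an assumption, and the function names (`downsample`, `macro_scores`) and toy labels are hypothetical. Macro F1 is computed here as the mean of per-class F1 scores, which is one common convention.

```python
import random
from collections import Counter, defaultdict


def downsample(notes, labels, seed=0):
    """Hypothetical helper: randomly downsample every class to the size of
    the smallest class (assumed strategy; not specified in the abstract)."""
    by_class = defaultdict(list)
    for note, label in zip(notes, labels):
        by_class[label].append(note)
    n_min = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for label in sorted(by_class):
        # Sample n_min notes without replacement from this class.
        for note in rng.sample(by_class[label], n_min):
            balanced.append((note, label))
    rng.shuffle(balanced)
    return balanced


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1
    (unweighted mean of the per-class scores)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Toy, made-up data using the three final classes named in the abstract.
    notes = [f"note{i}" for i in range(10)]
    labels = ["breast"] * 5 + ["formula"] * 3 + ["missing"] * 2
    balanced = downsample(notes, labels)
    print(Counter(label for _, label in balanced))

    y_true = ["breast", "breast", "formula", "formula", "missing", "missing"]
    y_pred = ["breast", "formula", "formula", "formula", "missing", "breast"]
    print(macro_scores(y_true, y_pred))
```

One property worth noting: when every class has the same number of evaluation examples, macro-averaged recall equals accuracy. In the toy run both are 4/6. This is consistent with the identical 90.1% accuracy and macro-averaged recall reported above, assuming a balanced evaluation set.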
Funder
National Institute of Diabetes and Digestive and Kidney Diseases
National Center for Advancing Translational Sciences
Publisher
Springer Science and Business Media LLC