Discovering foodborne illness in online restaurant reviews

Author:

Effland Thomas (1), Lawson Anna (1), Balter Sharon (2), Devinney Katelynn (2), Reddy Vasudha (2), Waechter HaeNa (2), Gravano Luis (1), Hsu Daniel (1)

Affiliation:

1. Computer Science Department, Data Science Institute, Columbia University, New York, NY, USA

2. Bureau of Communicable Disease, New York City Department of Health and Mental Hygiene, Queens, NY, USA

Abstract

Objective: We developed a system for the discovery of foodborne illness mentioned in online Yelp restaurant reviews using text classification. The system is used by the New York City Department of Health and Mental Hygiene (DOHMH) to monitor Yelp for foodborne illness complaints.

Materials and Methods: We built classifiers for 2 tasks: (1) determining if a review indicated a person experiencing foodborne illness and (2) determining if a review indicated multiple people experiencing foodborne illness. We first developed a prototype classifier in 2012 for both tasks using a small labeled dataset. Over years of system deployment, DOHMH epidemiologists labeled 13 526 reviews selected by this classifier. We used these biased data and a sample of complementary reviews in a principled bias-adjusted training scheme to develop significantly improved classifiers. Finally, we performed an error analysis of the best resulting classifiers.

Results: We found that logistic regression trained with bias-adjusted augmented data performed best for both classification tasks, with F1-scores of 87% and 66% for tasks 1 and 2, respectively.

Discussion: Our error analysis revealed that the inability of our models to account for long phrases caused the most errors. Our bias-adjusted training scheme illustrates how to improve a classification system iteratively by exploiting available biased labeled data.

Conclusions: Our system has been instrumental in the identification of 10 outbreaks and 8523 complaints of foodborne illness associated with New York City restaurants since July 2012. Our evaluation has identified strong classifiers for both tasks, whose deployment will allow DOHMH epidemiologists to more effectively monitor Yelp for foodborne illness investigations.
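The abstract reports classifier quality as F1-scores. As a reminder of what that metric measures, the following is a minimal sketch of the standard F1 computation for a binary task such as task 1 (does a review indicate foodborne illness or not). The function name `f1_score` and the example labels are hypothetical illustrations; they are not the authors' code or data.

```python
def f1_score(y_true, y_pred):
    """Return F1 for the positive class (label 1) given parallel lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # With no true positives, precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = review indicates foodborne illness, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, a classifier scores well only when it both avoids flagging irrelevant reviews and avoids missing genuine complaints.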

Funder

National Science Foundation

Alfred P. Sloan Foundation

Centers for Disease Control and Prevention

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Cited by 37 articles.

1. A Survey of the Applications of Text Mining for the Food Domain;Algorithms;2024-04-25

2. UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection;2023 IEEE International Conference on Big Data (BigData);2023-12-15

3. Chinese Food Safety Entity Recognition Based on BERT Model;2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC);2023-11-17

4. A Novel Foodborne Illness Detection and Web Application Tool Based on Social Media;Foods;2023-07-20

5. How Can AI Help Improve Food Safety?;Annual Review of Food Science and Technology;2023-03-27
