Automatic Extraction of Medication Mentions from Tweets—Overview of the BioCreative VII Shared Task 3 Competition

Author:

Weissenbacher Davy (1), O’Connor Karen (2), Rawal Siddharth (2), Zhang Yu (3), Tsai Richard Tzong-Han (3, 4, 5), Miller Timothy (6, 7), Xu Dongfang (6, 7), Anderson Carol (8), Liu Bo (8), Han Qing (9), Zhang Jinfeng (9), Kulev Igor (10), Köprü Berkay (10), Rodriguez-Esteban Raul (11), Ozkirimli Elif (10), Ayach Ammer (12), Roller Roland (12), Piccolo Stephen (13), Han Peijin (14), Vydiswaran V G Vinod (15, 16), Tekumalla Ramya (17), Banda Juan M (17), Bagherzadeh Parsa (18), Bergler Sabine (18), Silva João F (19), Almeida Tiago (19, 20), Martinez Paloma (21), Rivera-Zavala Renzo (21), Wang Chen-Kai (22, 23), Dai Hong-Jie (24), Alberto Robles Hernandez Luis (17), Gonzalez-Hernandez Graciela (1)

Affiliation:

1. Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA

2. DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

3. Department of Computer Science and Information Engineering, National Central University, No. 300, Zhongda Rd, Zhongli District, Taoyuan 320, Taiwan

4. IoX Center, National Taiwan University, Da’an District, Section 4, Roosevelt Rd, No. 1, Barry Lam Hall, Taipei 106, Taiwan

5. Research Center for Humanities and Social Sciences, Academia Sinica, No. 128, Section 2, Academia Rd, Nangang District, Taipei 115, Taiwan

6. Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA

7. Department of Pediatrics, Harvard Medical School, Boston, MA, USA

8. NVIDIA, Santa Clara, CA, USA

9. Department of Statistics, Florida State University, Tallahassee, FL, USA

10. Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland

11. Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Switzerland

12. Speech and Language Technology Lab, DFKI, Berlin, Germany

13. Department of Biology, Brigham Young University, Provo, UT, USA

14. Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA

15. Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, USA

16. School of Information, University of Michigan, Ann Arbor, MI, USA

17. Department of Computer Science, Georgia State University, Atlanta, GA, USA

18. CLaC Labs, Concordia University, Montreal, Canada

19. DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal

20. Department of Computation, University of A Coruña, Spain

21. Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain

22. Big Data Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan

23. Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

24. Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan

Abstract

This study presents the outcomes of the BioCreative VII shared task competition (Task 3), which focused on extracting medication names from a Twitter user’s publicly available tweets (the user’s ‘timeline’). Detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding the tweets in a user’s timeline that mention specific health-related concepts such as medications requires addressing extreme class imbalance. Task 3 called for detecting the tweets in a user’s timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus of 182 049 tweets publicly posted by 212 Twitter users, with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that go beyond simple lexical matching and are robust to class imbalance. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
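To make the task definition concrete, the following is a minimal sketch of the kind of simple lexical-match baseline that the task asks participants to go beyond. It is not the organizers' baseline or evaluation code. The lexicon, the example tweets, the function names and the character-offset convention (start inclusive, end exclusive) are all assumptions made for illustration; only the corpus counts (442 positive tweets out of 182 049) come from the abstract.

```python
import re

# Hypothetical toy lexicon; a real system would load a full drug-name vocabulary.
LEXICON = ["ibuprofen", "tylenol", "metformin"]

# One case-insensitive pattern with word boundaries, longest names first
# so that longer entries are preferred over shorter overlapping ones.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_spans(tweet):
    """Return (start, end, text) for each lexicon match; end is exclusive."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(tweet)]


def positive_rate(n_positive, n_total):
    """Share of tweets that mention a medication, as a percentage."""
    return 100.0 * n_positive / n_total


# Invented timeline: most tweets are unrelated to health.
timeline = [
    "Great game last night!",
    "Took some Tylenol for this headache",
    "Coffee first, then emails",
]

for tweet in timeline:
    print(extract_spans(tweet))

# Class imbalance reported for the corpus: 442 positives in 182 049 tweets.
print(round(positive_rate(442, 182049), 2))
```

A matcher of this kind misses misspellings and flags ambiguous words that happen to coincide with drug names, which is one reason the task emphasizes methods beyond lexical matching.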

Funder

National Library of Medicine

Publisher

Oxford University Press (OUP)

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; Information Systems
