Summary Generation of Dengue Outbreaks from ProMED-mail Database using a Linguistic Pattern-infused Dual-channel BiLSTM (Preprint)

Authors:

Chang Yung-Chun, Chiu Yu-Wen, Chuang Ting-Wu

Abstract

BACKGROUND

Globalization and environmental changes have increased the emergence and re-emergence of infectious diseases worldwide. Collaboration among regional infectious disease surveillance systems is critical but difficult to achieve because the transparency of health information sharing systems differs among countries. ProMED-mail is the most comprehensive expert-curated platform, providing rich information on outbreaks among humans, animals, and plants in different countries. However, because the reports consist of unstructured text, they are difficult to analyze for further applications. We therefore set out to summarize ProMED-mail alert articles automatically. In this research, we propose a text summarization method that uses natural language processing to automatically extract important sentences from ProMED-mail alert articles and generate summaries of dengue outbreaks in Southeast Asia. Our method can be used to capture crucial information quickly and to support decision-making in epidemic surveillance.

OBJECTIVE

To generate automatic summaries of unstructured text content from reports.

METHODS

Our materials come from the ProMED-mail website and span the period from 1994 to 2019. The collected data were annotated by professionals to establish a unique Taiwan dengue corpus, with almost perfect agreement between annotators (Cohen’s kappa statistic of 90%). To generate a ProMED-mail summary, we developed a dual-channel bidirectional long short-term memory (BiLSTM) network with an attention mechanism that infuses latent syntactic features to identify crucial sentences in the alert articles.
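As an illustration of the agreement measure mentioned above, the following minimal sketch computes Cohen's kappa for two annotators who label each sentence as important (1) or not (0). The function name `cohens_kappa` and the toy labels are assumptions made for this example; they are not taken from the study's corpus or code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentence labels (1 = important, 0 = not important).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly reported instead of simple percent agreement when one label is much more frequent than the other.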

RESULTS

Our method outperforms many well-known machine learning and neural network approaches in identifying important sentences, achieving a macro-average F1-score of 93%. Moreover, the method successfully extracts key information about dengue fever outbreaks from ProMED-mail, helping researchers and public health practitioners capture important summaries quickly. In addition to verifying the model, we recruited five professional experts and five students from related fields to complete a satisfaction survey on the generated summaries. The results showed that 83.6% of the summaries received high satisfaction ratings.
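For readers unfamiliar with the reported metric, the sketch below shows how a macro-average F1-score can be computed for sentence-level predictions: F1 is computed for each class separately and the per-class values are averaged with equal weight. The function `macro_f1` and the example labels are hypothetical and do not reproduce the study's data or evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels (1 = important sentence, 0 = other sentence).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Because each class contributes equally, a macro average does not let a frequent class (here, unimportant sentences) mask weak performance on the rarer class.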

CONCLUSIONS

The proposed approach successfully fuses latent syntactic features into a deep neural network to analyze syntactic, semantic, and content information in the text. It then exploits the derived information to identify the crucial sentences in ProMED-mail. The experimental results show that the proposed method is effective and outperforms the compared methods. In addition, our method demonstrates the potential for summary generation from ProMED-mail. When a new alert article arrives, public health decision makers can quickly identify the outbreak information in a lengthy article and respond immediately for disease control and prevention.

CLINICALTRIAL

NA

Publisher

JMIR Publications Inc.
