ALBERT-QM: An ALBERT Based Method for Chinese Health Related Question Matching (Preprint)

Author:

Yang Feihong, Li Jiao

Abstract

BACKGROUND

Question answering (QA) systems are widely used in web-based health care applications. Because health consumers often lack medical knowledge, they tend to ask similar questions in varied natural language expressions, which makes it challenging to match a new question to similar questions that have already been answered. In health QA system development, question matching (QM) is the task of judging whether a pair of questions express the same meaning; the answer to a matched question can then be retrieved from a given question-answer database. BERT (Bidirectional Encoder Representations from Transformers) has proved to be a state-of-the-art model for natural language processing (NLP) tasks such as binary classification and sentence matching. ALBERT was proposed as a lighter variant of BERT to address BERT's large number of parameters and slow training. Both BERT and ALBERT can be applied to the QM problem.

OBJECTIVE

In this study, we aim to develop an ALBERT-based method for Chinese health-related question matching.

METHODS

Our proposed method, named ALBERT-QM, consists of three components. (1) Data augmentation: similar health question pairs were augmented to prepare the training data. (2) ALBERT model training: given the augmented training pairs, three ALBERT models were trained and fine-tuned. (3) Similarity combination: health question similarity scores were calculated by combining the ALBERT model outputs with text similarity. To evaluate the performance of ALBERT-QM on similar question identification, we used an open dataset of 20,000 labeled Chinese health question pairs.
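
The similarity-combining step can be illustrated with a minimal sketch. The abstract does not state how the model outputs and the text similarity are combined, so everything below is an assumption for illustration only: the three model outputs are taken to be match probabilities and are averaged, the text similarity is a character-level ratio from Python's standard `difflib`, and the two are mixed with a hypothetical weight `text_weight` and compared against a hypothetical `threshold`. None of these names or values come from the paper or its repository.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Sequence


def text_similarity(q1: str, q2: str) -> float:
    """Character-level similarity in [0, 1] (works on Chinese text as-is)."""
    return SequenceMatcher(None, q1, q2).ratio()


def combined_score(
    model_probs: Sequence[float],
    q1: str,
    q2: str,
    text_weight: float = 0.2,  # assumed mixing weight, not from the paper
) -> float:
    """Weighted mix of the averaged model probability and text similarity."""
    if not model_probs:
        raise ValueError("model_probs must contain at least one probability")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    model_score = mean(model_probs)  # simple average over the models
    return (1.0 - text_weight) * model_score + text_weight * text_similarity(q1, q2)


def is_match(
    model_probs: Sequence[float],
    q1: str,
    q2: str,
    text_weight: float = 0.2,
    threshold: float = 0.5,  # assumed decision threshold
) -> bool:
    """Label the pair as matching when the combined score reaches the threshold."""
    return combined_score(model_probs, q1, q2, text_weight) >= threshold


if __name__ == "__main__":
    # Hypothetical probabilities from three fine-tuned models for one pair.
    probs = [0.91, 0.84, 0.88]
    print(is_match(probs, "感冒了吃什么药", "感冒应该吃什么药"))
```

In practice the weight and threshold would be tuned on held-out data; the paper's own combination rule may differ from this weighted average.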

RESULTS

ALBERT-QM was able to identify similar Chinese health questions, achieving a precision of 86.69%, a recall of 86.70%, and an F1-score of 86.69%. Compared with the baseline method (a text similarity algorithm), ALBERT-QM improved the F1-score by 20.73%. Compared with other BERT series models, ALBERT-QM is much lighter, with a file size of 64.8 MB, about one-sixth that of other BERT models. We have made ALBERT-QM openly accessible at https://github.com/trueto/albert_question_match.
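
As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below assumes the standard definition F1 = 2PR / (P + R) applied directly to the reported precision and recall; if the authors used an averaged variant (for example, macro-averaging over classes), their F1 need not equal this value exactly. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Reported precision and recall, in percent.
    print(round(f1_score(86.69, 86.70), 2))
```

Under this assumption, the recomputed value rounds to the reported 86.69%.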

CONCLUSIONS

In this study, we developed an open source algorithm, ALBERT-QM, that contributes to the identification of similar Chinese health questions in a health QA system. ALBERT-QM achieved better question matching performance with lower memory usage, which is beneficial for web-based or mobile-based QA applications.

Publisher

JMIR Publications Inc.
