ALBERT-QM: An ALBERT Based Method for Chinese Health Related Question Matching (Preprint)

Published: 2020-01-06

Author:

Yang Feihong, Li Jiao

Abstract

BACKGROUND

Question answering (QA) systems are widely used in web-based health-care applications. Because health consumers often lack medical knowledge, they tend to ask similar questions using varied natural language expressions, which makes it challenging to match a new question to previously answered similar questions. In health QA system development, question matching (QM) is the task of judging whether a pair of questions expresses the same meaning; it is used to retrieve the answer of the matched question from a given question-answering database. BERT (Bidirectional Encoder Representations from Transformers) has proved to be a state-of-the-art model for natural language processing (NLP) tasks such as binary classification and sentence matching. ALBERT, a lightweight variant of BERT, was proposed to address BERT's large parameter count and slow training speed. Both BERT and ALBERT can be used to address the QM problem.

OBJECTIVE

In this study, we aim to develop an ALBERT-based method for Chinese health-related question matching.

METHODS

Our proposed method, named ALBERT-QM, consists of three components. (1) Data augmentation: similar health question pairs were augmented to prepare the training data. (2) ALBERT model training: given the augmented training pairs, three ALBERT models were trained and fine-tuned. (3) Similarity combining: health question similarity scores were calculated by combining the ALBERT model outputs with text similarity. To evaluate the performance of ALBERT-QM on similar question identification, we used an open dataset of 20,000 labeled Chinese health question pairs.
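The abstract does not specify how the model outputs and the text similarity are combined. As a minimal sketch of the similarity-combining step, assuming the three model probabilities are averaged and then blended with a character-level similarity through a weighted sum, one possibility is the following. The function names, the `weight` and `threshold` values, and the use of `difflib.SequenceMatcher` as the text similarity are illustrative assumptions, not the authors' implementation.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Sequence


def text_similarity(q1: str, q2: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: a stand-in for whatever text similarity the authors used.
    """
    return SequenceMatcher(None, q1, q2).ratio()


def combined_score(model_probs: Sequence[float], q1: str, q2: str,
                   weight: float = 0.8) -> float:
    """Blend the mean model probability with the text similarity.

    `model_probs` holds one "same meaning" probability per trained model.
    `weight` (hypothetical value) is the share given to the model outputs.
    """
    if not model_probs:
        raise ValueError("model_probs must contain at least one probability")
    return weight * mean(model_probs) + (1 - weight) * text_similarity(q1, q2)


def is_match(model_probs: Sequence[float], q1: str, q2: str,
             weight: float = 0.8, threshold: float = 0.5) -> bool:
    """Label the pair as matching when the blended score reaches the threshold."""
    return combined_score(model_probs, q1, q2, weight) >= threshold


if __name__ == "__main__":
    # Hypothetical probabilities from three fine-tuned models for one pair.
    probs = [0.91, 0.87, 0.94]
    print(is_match(probs, "感冒了吃什么药", "感冒应该吃什么药"))
```

A weighted sum keeps the surface-level similarity as a weak correction to the learned score; in practice the weight and threshold would be tuned on a validation split.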

RESULTS

ALBERT-QM is able to identify similar Chinese health questions, achieving a precision of 86.69%, a recall of 86.70%, and an F1-score of 86.69%. Compared with the baseline method (a text similarity algorithm), ALBERT-QM improved the F1-score by 20.73%. Compared with other BERT-series models, ALBERT-QM is much lighter, with a file size of 64.8 MB, about 1/6 that of the other BERT models. We have made ALBERT-QM openly accessible at https://github.com/trueto/albert_question_match.
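For readers unfamiliar with the reported metrics, the sketch below shows how precision, recall, and F1 can be computed for a binary matching task. It assumes labels of 1 (same meaning) and 0 (different meaning) and scores the positive class only; the abstract does not state which averaging scheme the authors used, so this is an illustration of the definitions rather than a reproduction of their evaluation.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for four question pairs (hypothetical data).
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```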

CONCLUSIONS

In this study, we developed an open-source algorithm, ALBERT-QM, that contributes to the identification of similar Chinese health questions in a health QA system. ALBERT-QM achieved better question matching performance with lower memory usage, which is beneficial to web-based or mobile-based QA applications.

Publisher

JMIR Publications Inc.
