Automated fact-checking of online health-related information: a novel approach (Preprint)

Author:

Bayani Azadeh, Ayotte Alexandre, Nikiema Jean Noel

Abstract

BACKGROUND

Many people seek health-related information online. The potential dangers of misinformation have made the reliability of this information particularly important, yet discerning true and reliable information from false information has become increasingly challenging.

OBJECTIVE

In the present study, we introduce a novel approach to automating the fact-checking process, using PubMed resources as the source of truth and Natural Language Processing (NLP) transformer models to enhance the process.

METHODS

A total of 538 health-related webpages, covering seven different diseases, were manually selected by Factually Health Company. The process included the following steps: (i) using a Bidirectional Encoder Representations from Transformers (BERT) model, the contents of the webpages were classified into three thematic categories: semiology, epidemiology, and management; (ii) for each category in a webpage, a PubMed query was automatically generated using a combination of the “WellcomeBertMesh” and “KeyBERT” models; (iii) the top 20 related articles were automatically retrieved from PubMed; and (iv) cosine similarity and Jaccard distance were applied to compare the content of the retrieved literature with that of the webpages.
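The two measures named in step (iv) can be sketched as follows. This is a minimal illustration, not the study's implementation: the study vectorized sentences with BERT, whereas this sketch substitutes simple word counts so that it runs with the standard library alone. The function names and the whitespace tokenizer are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between word-count vectors of two sentences.

    Assumption: word counts replace the BERT embeddings used in the study.
    """
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B| over the sets of tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sentences are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)
```

Note that cosine similarity grows with resemblance while Jaccard distance shrinks, so the two scores have to be read in opposite directions when they are compared.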

RESULTS

The BERT model for categorizing webpage content performed well, with F1-score and recall of 93% and 94% for semiology and epidemiology, respectively, and 96% for both recall and F1-score for management. For each of the three categories in a webpage, one PubMed query was generated, and for each query, the 20 most related articles that were open access and classified as systematic reviews or meta-analyses were retrieved. Fewer than 10% of the retrieved articles were irrelevant, and these were removed. For each webpage, an average of 23% of the sentences were found to be very similar to the literature. During the evaluation, cosine similarity outperformed Jaccard distance when comparing sentences from webpages and academic papers vectorized by BERT. However, false positives were a significant issue among the retrieved sentences: some sentence pairs had a similarity score exceeding 80% but could not be considered similar.
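The two quantities discussed above, the share of a webpage's sentences that match the literature and the effect of false positives, can be illustrated with a short sketch. The function names are hypothetical, and the 0.80 cutoff is an assumption borrowed from the 80% figure mentioned above; the abstract does not state the threshold the study used to call sentences "very similar".

```python
def matched_share(best_scores, threshold=0.80):
    """Fraction of webpage sentences whose best similarity score exceeds
    the threshold. `best_scores` holds one score per webpage sentence."""
    if not best_scores:
        return 0.0
    matched = sum(1 for score in best_scores if score > threshold)
    return matched / len(best_scores)


def precision_at_threshold(reviewed_pairs, threshold=0.80):
    """Fraction of flagged pairs that a human reviewer judged truly similar.

    `reviewed_pairs` is a list of (score, is_truly_similar) tuples; the
    manual labels are hypothetical inputs, not data from the study.
    """
    flagged = [ok for score, ok in reviewed_pairs if score > threshold]
    if not flagged:
        return 0.0
    return sum(flagged) / len(flagged)
```

A low value from `precision_at_threshold` corresponds to the problem reported above: pairs that score high but are not actually similar.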

CONCLUSIONS

In the present research, we propose an approach to automating the fact-checking of online health-related information. Incorporating content from PubMed or other scientific article databases as trustworthy resources can automate the discovery of similarly credible information in the health domain.

Publisher

JMIR Publications Inc.
