Automated Credibility Assessment of Web-Based Health Information Considering Health on the Net Foundation Code of Conduct (HONcode): Model Development and Validation Study (Preprint)

Author:

Bayani Azadeh, Ayotte Alexandre, Nikiema Jean Noel

Abstract

BACKGROUND

An increasing number of users are turning to web-based sources for health care guidance. Thus, trustworthy sources of information should be automatically identifiable using objective criteria.

OBJECTIVE

The purpose of this study was to automate the assessment of the Health on the Net Foundation Code of Conduct (HONcode) criteria, enhancing our ability to pinpoint trustworthy health information sources.

METHODS

A data set of 538 web pages displaying health content was collected from 43 health-related websites. HONcode criteria were considered at 2 levels: web page and website. For the website-level criteria (confidentiality, transparency, financial disclosure, and advertising policy), a bag of keywords was identified to assess the criteria using a rule-based model. For the web page–level criteria (authority, complementarity, justifiability, and attribution), several machine learning (ML) approaches were used. In total, 200 web pages were manually annotated until a balanced representation in terms of frequency was achieved. In total, 3 ML models were trained on the initial annotated data: random forest, support vector machines (SVM), and Bidirectional Encoder Representations from Transformers (BERT). A second training step was implemented for the complementarity criterion, using the BERT model for multiclass classification of the complementarity sentences obtained by annotation and data augmentation (positive, negative, and noncommittal sentences). Finally, the remaining web pages were classified using the selected model, and 100 sentences were randomly selected for manual review.
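The rule-based assessment of website-level criteria described above can be illustrated with a minimal sketch. The keyword lists, the function name `assess_website`, and the sample text are all hypothetical; the bag of keywords actually used in the study is not given in this abstract, and the study's matching rules may differ.

```python
import re

# Hypothetical keyword lists, for illustration only. The study's actual
# bag of keywords is not listed in the abstract.
CRITERION_KEYWORDS = {
    "confidentiality": ["privacy policy", "personal data", "confidentiality"],
    "transparency": ["contact us", "about us"],
    "financial_disclosure": ["funded by", "funding", "sponsor"],
    "advertising_policy": ["advertising policy", "advertisement"],
}


def assess_website(text, keywords=CRITERION_KEYWORDS):
    """Return {criterion: bool}: True if any keyword for that criterion
    occurs in the text as a whole word or phrase (case-insensitive)."""
    lowered = text.lower()
    results = {}
    for criterion, terms in keywords.items():
        results[criterion] = any(
            re.search(r"\b" + re.escape(term) + r"\b", lowered)
            for term in terms
        )
    return results


sample = (
    "Read our Privacy Policy. This site is funded by a public grant. "
    "Contact us by email."
)
result = assess_website(sample)
print(result)
```

In this sketch a criterion counts as met as soon as one keyword matches, which is the simplest possible rule; a real system would likely need site-specific page selection (for example, checking only policy or "about" pages) to limit false positives.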

RESULTS

For the web page–level criteria, the random forest model performed well for the attribution criterion but poorly for the others. BERT and SVM performed consistently across all criteria. BERT had the better area under the curve (AUC) for neutral sentences (0.96), justifiability (0.98), and attribution (1.00). SVM had the better overall performance for the classification of complementarity, with an AUC of 0.98. Finally, SVM and BERT had an equal AUC of 0.98 for the authority criterion. For the website-level criteria, the rule-based model was able to retrieve web pages with an accuracy of 0.97 for confidentiality, 0.82 for transparency, and 0.51 for both financial disclosure and advertising policy. In the final evaluation of the sentences, precision was 0.88 and the agreement level of the reviewers was 0.82.
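For readers unfamiliar with the metrics reported above, the following sketch shows how AUC and precision can be computed for a binary classifier. It uses the pairwise-ranking definition of AUC (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one). The function names and toy data are illustrative assumptions, not the study's evaluation code or results.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision(labels, predictions):
    """Precision = true positives / all predicted positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


# Toy data, invented for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc_value = roc_auc(labels, scores)
precision_value = precision(labels, predictions)
print(auc_value, precision_value)
```

The pairwise loop is quadratic in the number of examples, which is fine for small review sets; library implementations typically use a sort-based computation instead.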

CONCLUSIONS

Our results showed the potential of automating HONcode criteria assessment using ML approaches. This approach could be used with different types of pretrained models to accelerate text annotation and classification and to improve performance in low-resource cases. Further work is needed to determine how to assign different weights to the criteria and to identify additional characteristics that should be considered when consolidating these criteria into a comprehensive reliability score.

Publisher

JMIR Publications Inc.
