
Readability and topics of the German Health Web: Exploratory study and text analysis

Published: 2023-02-10; Volume 18, Issue 2, e0281582
ISSN: 1932-6203
Journal: PLOS ONE (short title: PLoS ONE)
Language: English

Authors:

Richard Zowalla, Daniel Pfeifer, Thomas Wetter

Abstract

Background: The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily match the user’s health literacy level. It is therefore vital to (1) identify prominent information providers, (2) quantify the readability of written health information, and (3) analyze how well different types of information sources suit people with differing health literacy levels.

Objective: In previous work, we showed how a focused crawler can be used to “capture” and describe a large sample of the “German Health Web”, which we call the “Sampled German Health Web” (sGHW). It includes health-related web content from the three predominantly German-speaking countries Germany, Austria, and Switzerland, i.e., the country-code top-level domains (ccTLDs) “.de”, “.at”, and “.ch”. Based on the crawled data, we now provide a fully automated readability and vocabulary analysis of a subsample of the sGHW, as well as an analysis of the sGHW’s graph structure covering its size, its content providers, and the ratio of public to private stakeholders. In addition, we apply Latent Dirichlet Allocation (LDA) to identify topics and themes within the sGHW.

Methods: Prominent web sites were identified by applying PageRank to the sGHW’s graph representation. LDA was used to discover topics within the top-ranked web sites. Next, a computer-based readability and vocabulary analysis was performed on each health-related web page. Readability was assessed with the Flesch Reading Ease (FRE) and the 4th Vienna formula (Wiener Sachtextformel, WSTF); vocabulary was assessed by a specifically trained Support Vector Machine (SVM) classifier.

Results: In total, n = 14,193,743 health-related web pages were collected during the study period of 370 days. The resulting host-aggregated web graph comprises 231,733 nodes connected via 429,530 edges (network diameter = 25; average path length = 6.804; average degree = 1.854; modularity = 0.723). Among the 3000 top-ranked pages (1000 per ccTLD according to PageRank), 18.50% (555/3000) belong to web sites of governmental or public institutions, 18.03% (541/3000) to nonprofit organizations, 54.03% (1621/3000) to private organizations, 4.07% (122/3000) to news agencies, 3.87% (116/3000) to pharmaceutical companies, 0.90% (27/3000) to private bloggers, and 0.60% (18/3000) to others. LDA identified 50 topics, which we grouped into 11 themes: “Research & Science”, “Illness & Injury”, “The State”, “Healthcare structures”, “Diet & Food”, “Medical Specialities”, “Economy”, “Food production”, “Health communication”, “Family”, and “Other”. The most prevalent themes were “Research & Science” and “Illness & Injury”, accounting for 21.04% and 17.92%, respectively, of all topics across all ccTLDs and provider types. Our readability analysis reveals that the majority of the collected web sites are structurally difficult or very difficult to read: 84.63% (2539/3000) scored a WSTF ≥ 12 and 89.70% (2691/3000) scored an FRE ≤ 49. Moreover, our vocabulary analysis shows that 44.00% (1320/3000) of the web sites use vocabulary that is well suited for a lay audience.

Conclusions: We were able to identify major information hubs as well as topics and themes within the sGHW. The results indicate that readability within the sGHW is low. As a consequence, patients may face barriers, even though the vocabulary used seems appropriate from a medical perspective. In future work, we intend to extend these analyses to identify trustworthy health information web sites.
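The PageRank step on the host-aggregated graph can be illustrated with a minimal, dependency-free sketch. It assumes that crawled links arrive as hypothetical (source_url, target_url) pairs, collapses them into host-to-host edges (dropping links within the same host), reports the average degree as edges per node, and ranks hosts by power iteration with an assumed damping factor of 0.85. Defined this way, the reported graph gives 429,530 / 231,733 ≈ 1.854, matching the stated average degree. The function names, damping factor, and convergence settings are illustrative assumptions, not the authors’ implementation.

```python
# Illustrative sketch; input format, names and parameters are assumptions.
from collections import defaultdict
from urllib.parse import urlsplit


def host_graph(page_links):
    """Collapse hypothetical (source_url, target_url) page links into host-to-host edges."""
    nodes, edges = set(), set()
    for src, dst in page_links:
        s, d = urlsplit(src).hostname, urlsplit(dst).hostname
        if s is None or d is None:
            continue  # skip malformed URLs without a host
        nodes.update((s, d))
        if s != d:  # links between pages of the same host are dropped
            edges.add((s, d))
    return nodes, edges


def average_degree(nodes, edges):
    """Mean out-degree of a directed graph, i.e. |E| / |V|."""
    return len(edges) / len(nodes)


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration; hosts without out-links spread their rank uniformly."""
    out_links = defaultdict(list)
    for s, d in edges:
        out_links[s].append(d)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for s, targets in out_links.items():
            share = damping * rank[s] / len(targets)
            for d in targets:
                new[d] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    links = [
        ("https://a.de/x", "https://b.de/y"),
        ("https://a.de/x", "https://a.de/z"),  # intra-host link, dropped
        ("https://c.at/", "https://b.de/"),
        ("https://b.de/y", "https://a.de/"),
    ]
    nodes, edges = host_graph(links)
    print(average_degree(nodes, edges))  # 1.0
    for host, score in sorted(pagerank(nodes, edges).items(), key=lambda kv: -kv[1]):
        print(host, round(score, 4))
```

In the toy graph, b.de receives links from both other hosts and therefore ranks first, while c.at, which has no incoming links, keeps only the teleportation share (1 − 0.85) / 3 = 0.05.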
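LDA can be sketched with collapsed Gibbs sampling, one standard inference method for the model. The abstract does not say which LDA implementation or inference algorithm the authors used, so the sampler below, its hyperparameters (alpha, beta, number of iterations), and the toy German token lists are illustrative assumptions. It returns topic-word distributions (phi) and document-topic proportions (theta), from which topics such as the 50 reported ones would be read off; grouping topics into themes is a separate interpretation step not shown here.

```python
# Illustrative sketch of LDA via collapsed Gibbs sampling; hyperparameters are assumptions.
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents.

    Returns (vocab, phi, theta): phi[k][v] = p(word v | topic k),
    theta[d][k] = p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens of document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    nk = [0] * K                       # tokens assigned to topic k overall
    z = []                             # current topic of every token

    for d, doc in enumerate(docs):     # random initial topic assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                ndk[d][k] -= 1         # take the token out of the counts
                nkw[k][v] -= 1
                nk[k] -= 1
                # full conditional p(z_i = t | all other assignments), unnormalised
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_words(vocab, phi, n=3):
    """The n most probable words of every topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


if __name__ == "__main__":
    docs = [["arzt", "klinik", "patient", "arzt"], ["klinik", "patient", "therapie"],
            ["essen", "diaet", "obst", "essen"], ["obst", "gemuese", "diaet"]]
    vocab, phi, theta = lda_gibbs(docs, num_topics=2)
    print(top_words(vocab, phi))
```

A fixed seed makes the sampler reproducible; on a real corpus one would also remove stop words and tune the number of topics before interpreting them.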
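Both readability formulas can be written down directly. Because the abstract does not state which Flesch variant was used, the sketch assumes Amstad’s German adaptation, FRE = 180 − ASL − 58.5 · ASW (ASL: words per sentence, ASW: syllables per word), together with the 4th Vienna formula, WSTF = 0.2656 · SL + 0.2744 · MS − 1.693 (SL: words per sentence, MS: percentage of words with three or more syllables). A higher FRE means easier text, whereas WSTF approximates a school grade, which is why low FRE and high WSTF values both signal difficult text. The tokenisation and the vowel-group syllable counter are crude assumptions (German words such as “Information” are under-counted), not the authors’ text processing.

```python
# Illustrative sketch; the Flesch variant, tokenisation and syllable heuristic are assumptions.
import re

WORD = re.compile(r"[^\W\d_]+")                  # runs of letters, incl. umlauts and ß
SENTENCE_END = re.compile(r"[.!?]+")             # naive: abbreviations like "z. B." split too
VOWEL_GROUP = re.compile(r"[aeiouyäöü]+", re.IGNORECASE)


def count_syllables(word):
    """Crude heuristic: one syllable per run of consecutive vowels (at least one)."""
    return max(1, len(VOWEL_GROUP.findall(word)))


def fre_amstad(asl, asw):
    """German Flesch Reading Ease after Amstad: 180 - ASL - 58.5 * ASW (higher = easier)."""
    return 180.0 - asl - 58.5 * asw


def wstf4(sl, ms):
    """4th Vienna formula: SL = words per sentence, MS = % of words with >= 3 syllables."""
    return 0.2656 * sl + 0.2744 * ms - 1.693


def readability(text):
    """Return (FRE, WSTF) for a plain-text document."""
    words = WORD.findall(text)
    sentences = [s for s in SENTENCE_END.split(text) if WORD.search(s)]
    if not words or not sentences:
        raise ValueError("text contains no words or sentences")
    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    pct_polysyllabic = 100.0 * sum(1 for s in syllables if s >= 3) / len(words)
    return (fre_amstad(words_per_sentence, syllables_per_word),
            wstf4(words_per_sentence, pct_polysyllabic))
```

For example, `readability("Der Arzt hilft. Die Gesundheit ist wichtig.")` yields an FRE of roughly 92.9 and a WSTF of roughly 3.16, i.e. very easy text on both scales.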
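The abstract names a “specifically trained Support Vector Machine classifier” for vocabulary but not its features or training data. As a hedged illustration only, the following trains a linear SVM with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularised hinge loss) over a hypothetical bag-of-words representation, with +1 standing for lay vocabulary and −1 for expert vocabulary. The labels, the German example phrases, the omission of a bias term, and the hyperparameters are all assumptions.

```python
# Illustrative sketch of a linear SVM (Pegasos); features, data and labels are hypothetical.
import random
from collections import Counter


def features(text):
    """Hypothetical feature map: bag of lower-cased words (no bias term, for brevity)."""
    return Counter(text.lower().split())


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(samples, lam=0.01, epochs=50, seed=0):
    """samples: (text, label) pairs with label +1 (lay) or -1 (expert)."""
    rng = random.Random(seed)
    data = [(features(text), label) for text, label in samples]
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y * dot(w, x)         # margin under the current weights
            for k in w:                    # gradient step for the L2 regulariser
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:               # hinge loss active: move towards y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    return 1 if dot(w, features(text)) >= 0.0 else -1


TRAIN = [  # hypothetical examples of lay (+1) and expert (-1) wording
    ("mein kind hat fieber", 1),
    ("bauchweh nach dem essen", 1),
    ("der arzt sagt ich habe zucker", 1),
    ("febrile temperatur bei pädiatrischen patienten", -1),
    ("postprandiale abdominelle schmerzen", -1),
    ("diagnose diabetes mellitus typ zwei", -1),
]

if __name__ == "__main__":
    w = train_pegasos(TRAIN)
    print(predict(w, "das kind hat fieber"))
```

A production classifier would add a bias term, richer features (e.g. character n-grams or medical term lists), and a held-out evaluation; the sketch only shows the training principle.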
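The provider-type percentages in the Results are simple shares of the 3000 top-ranked pages. The snippet below recomputes them from the counts given in the abstract as a quick consistency check that the seven categories add up to 3000 and round to the reported two-decimal percentages; the dictionary layout is only an illustration.

```python
# Counts of top-ranked pages per provider type, as reported in the abstract.
PROVIDER_COUNTS = {
    "governmental or public institutions": 555,
    "nonprofit organizations": 541,
    "private organizations": 1621,
    "news agencies": 122,
    "pharmaceutical companies": 116,
    "private bloggers": 27,
    "others": 18,
}
TOTAL = 3000  # 1000 top-ranked pages per ccTLD (.de, .at, .ch)


def percentages(counts, total):
    """Share of each category in percent, formatted with two decimals."""
    return {name: f"{100 * n / total:.2f}" for name, n in counts.items()}


if __name__ == "__main__":
    assert sum(PROVIDER_COUNTS.values()) == TOTAL
    for name, pct in percentages(PROVIDER_COUNTS, TOTAL).items():
        print(f"{pct:>6}%  {name}")
```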

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary


Cited by 3 articles.

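1. Mangelnde Verständlichkeit durch Fachsprache in Gesundheitsinformationen zu chronischen Erkrankungen – eine qualitative Korpusanalyse [Lack of comprehensibility due to technical language in health information on chronic diseases: a qualitative corpus analysis];Prävention und Gesundheitsförderung;2024-05-23

2. Easy Camelot – ein Weg zu leicht verständlichen PDF-Dateien [Easy Camelot: a path to easy-to-understand PDF files];Verwaltungskommunikation;2024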

3. Document Difficulty Aspects for Medical Practitioners: Enhancing Information Retrieval in Personalized Search Engines;Applied Sciences;2023-09-23