Abstract
AbstractIn this era of information and communication technology, a large population relies on the Internet to gather information. One of the most popular information sources on the Internet is Wikipedia. Wikipedia is a free encyclopedia that provides a wide range of information to its users. However, there have been concerns about the readability of information on Wikipedia time and again. The readability of the text is defined as the ease of understanding the underlying text. Past studies have analyzed the readability of Wikipedia articles with the help of conventional readability metrics, such as the Flesch-Kincaid readability score and the Automatic Readability Index (ARI). Such metrics only consider the surface-level parameters, such as the number of words, sentences, and paragraphs in the text, to quantify the readability. However, the readability of the text must also take into account the quality of the text. In this study, we consider many new NLP-based parameters capturing the quality of the text, such as lexical diversity, semantic diversity, lexical complexity, and semantic complexity and analyze their impact on the readability of Wikipedia articles using artificial neural networks. Besides NLP parameters, the crowdsourced parameters also affect the readability, and therefore, we also analyze the impact of crowdsourced parameters and observe that the crowdsourced parameters not only influence the readability scores but also affect the NLP parameters of the text. Additionally, we investigate the mediating effect of NLP parameters that connect the crowdsourced parameters to the readability of the text. The results show that the impact of crowdsourced parameters on readability is partially due to the profound effect of NLP-based parameters.
Publisher
Springer Science and Business Media LLC
Reference82 articles.
1. Zuffi S, Brambilla C, Beretta G, Scala P (2007) Human computer interaction: Legibility and contrast. In: 14th International conference on image analysis and processing (ICIAP 2007), IEEE, pp 241–e246
2. Alexa (2019) Wikipedia.org Traffic, Demographics and Competitors. https://www.alexa.com/siteinfo/wikipedia.org
3. Swartz A (2006) Who writes wikipedia. Raw thought 4
4. Setia S, Iyengar S, Verma AA, Dubey N (2021) Is wikipedia easy to understand?: A study beyond conventional readability metrics. In: International Conference on Computational Collective Intelligence, Springer, pp 175–e187
5. Gregori-Signes C, Clavel-Arroitia B (2015) Analysing lexical density and lexical diversity in university students’ written discourse. Procedia Soc Behav Sci 198:546-e556