Testing the validity of Wikipedia categories for subject matter labelling of open-domain corpus data

Published:2020-12-03 Page:016555152097743
ISSN:0165-5515
Container-title:Journal of Information Science
language:en
Short-container-title:Journal of Information Science

Author:

Aghaebrahimian Ahmad¹, Stauder Andy¹, Ustaszewski Michael¹

Affiliation:

1. Department of Translation Studies, University of Innsbruck, Austria

Abstract

The Wikipedia category system was designed to enable browsing and navigation of Wikipedia. It is also a useful resource for knowledge organisation and document indexing, especially using automatic approaches. However, it has received little attention as a resource for manual indexing. In this article, a hierarchical taxonomy of three-level depth is extracted from the Wikipedia category system. The resulting taxonomy is explored as a lightweight alternative to expert-created knowledge organisation systems (e.g. library classification systems) for the manual labelling of open-domain text corpora. Combining quantitative and qualitative data from a crowd-based text labelling study, the validity of the taxonomy is tested and the results quantified in terms of interrater agreement. While the usefulness of the Wikipedia category system for automatic document indexing is documented in the pertinent literature, our results suggest that at least the taxonomy we derived from it is not a valid instrument for manual subject matter labelling of open-domain text corpora.

Funder

Austrian Academy of Sciences

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551520977438

Reference 31 articles.

1. Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus

2. EVOLUTION OF WIKIPEDIA'S CATEGORY STRUCTURE

3. Wikipedia categories in research: towards a qualitative review of uses and applications

4. Functionalities for automatic metadata generation applications: a survey of metadata experts' opinions

Cited by 3 articles.

1. Assessing knowledge organization systems from a gender perspective: Wikipedia taxonomy and Wikidata ontologies;Journal of Documentation;2024-04-05

2. Wikinformetrics: Construction and description of an open Wikipedia knowledge graph data set for informetric purposes;Quantitative Science Studies;2022

3. Language-agnostic Topic Classification for Wikipedia;Companion Proceedings of the Web Conference 2021;2021-04-19