Semantics-Based Document Categorization Employing Semi-Supervised Learning

Page: 112-140
ISSN:2372-109X
Container-title:Advances in Linguistics and Communication Studies

Author:

Žižka Jan¹,Dařena František¹

Affiliation:

1. Mendel University in Brno, Czech Republic

Abstract

The automated categorization of unstructured textual documents according to their semantic content plays an important role, particularly given the ever-growing volume of such data originating from the Internet. Given a sufficient number of labeled examples, a suitable supervised machine-learning classifier can be trained. When no labels are available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a method based on semi-supervised learning in which a small set of manually labeled examples improves the categorization process in comparison with clustering, with results comparable to those of supervised learning. For illustration, a real-world dataset from the Internet is used as the input to supervised, unsupervised, and semi-supervised learning. The results are shown for different numbers of starting labeled samples used as “seeds” to automatically label the remaining unlabeled items.
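
The seed-based labeling described in the abstract can be illustrated with a generic self-training loop. The abstract does not specify the chapter's actual algorithm, so everything below is an assumption made for illustration: the bag-of-words representation, the nearest-centroid scoring with cosine similarity, the confidence `threshold`, and the names `vectorize`, `cosine`, `centroids`, and `self_train` are all hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labeled):
    """Sum the vectors of each class into one centroid per label."""
    cents = {}
    for vec, label in labeled:
        cents.setdefault(label, Counter()).update(vec)
    return cents


def self_train(seeds, unlabeled, threshold=0.3):
    """Grow the labeled set from the seeds, one most-confident document per round.

    seeds: list of (text, label) pairs; unlabeled: list of texts.
    Returns a dict mapping each newly labeled text to its assigned label.
    Documents that never reach the (assumed) threshold are left out.
    """
    labeled = [(vectorize(t), y) for t, y in seeds]
    pool = {t: vectorize(t) for t in unlabeled}
    assigned = {}
    while pool:
        cents = centroids(labeled)
        # Score every remaining document against every class centroid
        # and pick the single most confident (document, label) pair.
        score, text, label = max(
            ((cosine(vec, c), text, label)
             for text, vec in pool.items()
             for label, c in cents.items()),
            key=lambda s: s[0],
        )
        if score < threshold:
            break  # nothing left that is confident enough
        assigned[text] = label
        # The newly labeled document joins the training set for the next round.
        labeled.append((pool.pop(text), label))
    return assigned
```

In this sketch, a document that is too dissimilar to the original seeds can still be labeled in a later round, once documents absorbed earlier have widened the vocabulary of a class centroid. Varying the number of seeds passed to `self_train` mirrors, in a very simplified way, the experiment the abstract describes.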

Publisher

IGI Global

Cited by 1 article.

1. Text Mining;Advances in Data Mining and Database Management;2017