
Learning to rank for multi-label text classification: Combining different sources of information

Published: 2020-02-18, Volume: 27, Issue: 1, Pages: 89-111
ISSN: 1351-3249
Journal: Natural Language Engineering
Language: English
Abbreviated title: Nat. Lang. Eng.

Authors:

Hosein Azarbonyad, Mostafa Dehghani, Maarten Marx, Jaap Kamps

Abstract

Efficiently exploiting all available sources of information, such as labeled instances, class representations, and the relations among classes, has a strong impact on the performance of Multi-Label Text Classification (MLTC) systems. Most current approaches use labeled documents as the primary source of information for MLTC. We investigate the effectiveness of different sources of information for MLTC, including the labeled training data, the textual labels of classes, and the taxonomy relations between classes. Specifically, we first extract, for each document–class pair, a set of features from the different sources of information; these features reflect the similarity between the class and the document. We then treat MLTC as a ranking problem and use a learning to rank (LTR) approach to rank the classes for each document and to select its labels. An important characteristic of many MLTC instances is that documents can belong to multiple classes and that there are implicit relations between classes. To exploit this, we apply score propagation on top of LTR to incorporate the co-occurrence patterns of classes in labeled documents. Our main findings are as follows. First, an LTR approach that integrates all features performs significantly better than previous MLTC systems; in particular, we show that simple classification approaches fail when the number of classes is large. Second, an analysis of the feature weights reveals the relative importance of the various sources of evidence and gives insight into the underlying classification problem. Interestingly, the results indicate that document titles are more informative than all other sources of information. Third, a lean-and-mean system using only four features achieves 96% of the performance of the large LTR model proposed in this paper. Fourth, using the co-occurrence information of classes helps to classify documents more accurately; our results show that this information is more helpful when the underlying classifier performs poorly.
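The abstract outlines a pipeline of per-pair similarity features, a learned ranking of classes for each document, score propagation over class co-occurrence, and label selection. The Python sketch below illustrates that shape under loudly stated assumptions: the three cosine features, the fixed linear weights (standing in for a model that a real LTR learner would train), the one-step propagation formula, the top-k cut-off, all function names (`pair_features`, `score_classes`, `cooccurrence`, `propagate`, `select_labels`) and the toy data are hypothetical illustrations, not the paper's actual features, model, or propagation scheme.

```python
import math
from collections import Counter
from itertools import permutations


def bow(text):
    """Bag-of-words term counts (lower-cased, whitespace tokenisation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(doc, cls):
    """Hypothetical similarity features for one document-class pair."""
    title, body, label = bow(doc["title"]), bow(doc["body"]), bow(cls["label"])
    return [
        cosine(title, label),           # document title vs. textual class label
        cosine(body, label),            # document body vs. textual class label
        cosine(body, cls["centroid"]),  # body vs. terms of the class's training documents
    ]


def score_classes(doc, classes, weights):
    """Linear scorer; the weights are assumed, standing in for a trained LTR model."""
    return {
        name: sum(w * f for w, f in zip(weights, pair_features(doc, cls)))
        for name, cls in classes.items()
    }


def cooccurrence(label_sets):
    """Estimate P(c | d) from the label sets of the training documents."""
    single, pair = Counter(), Counter()
    for labels in label_sets:
        single.update(labels)
        pair.update(permutations(labels, 2))  # ordered pairs (c, d) with c != d
    return {(c, d): n / single[d] for (c, d), n in pair.items()}


def propagate(scores, cond, alpha=0.3):
    """Assumed one-step rule: s'(c) = (1 - alpha) * s(c) + alpha * sum_{d != c} P(c | d) * s(d)."""
    return {
        c: (1 - alpha) * s
        + alpha * sum(cond.get((c, d), 0.0) * scores[d] for d in scores if d != c)
        for c, s in scores.items()
    }


def select_labels(scores, k=2):
    """Keep the k highest-scoring classes as the document's labels."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (illustrative only): three classes with tiny term "centroids".
classes = {
    "fisheries": {"label": "fisheries", "centroid": bow("fishing quota fleet fisheries")},
    "environment": {"label": "environment", "centroid": bow("pollution water environment")},
    "trade": {"label": "trade", "centroid": bow("tariff import export trade")},
}
doc = {"title": "fisheries quota for 2020",
       "body": "fishing fleet quota and water pollution limits"}

scores = score_classes(doc, classes, weights=[0.5, 0.2, 0.3])
cond = cooccurrence([{"fisheries", "environment"}, {"fisheries"}, {"trade"}])
labels = select_labels(propagate(scores, cond))
print(labels)  # ['fisheries', 'environment']
```

With `alpha=0` the propagation step returns the ranker's scores unchanged; larger values let classes that frequently co-occur in the training labels raise each other's scores. Whether a top-k cut-off or a score threshold is used to select labels is a design choice this sketch does not take from the paper.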

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software

References: 52 articles.

1. Steinberger, R., Ebrahim, M. and Turchi, M. (2012). JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool. In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC.

2. EuroVoc (2014). Multilingual thesaurus of the European Union. Available at http://eurovoc.europa.eu/

3. Daudaravicius, V. (2012). Automatic multilingual annotation of EU legislation with Eurovoc descriptors. In Proceedings of the Exploring and Exploiting Official Publications Workshop, EEOP2012, pp. 14–20.

4. Hierarchical multi-class text categorization with global margin maximization

5. Dynamic label propagation for semi-supervised multi-class multi-label classification

Cited by 24 articles.

1. Application of an Improved Convolutional Neural Network Algorithm in Text Classification. Journal of Web Engineering, 2024-05-25.

2. A Semantic-Based Framework for Multi-Label Text Classification. 2024 2nd International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA), 2024-05-24.

3. An Efficient Optimized DenseNet Model for Aspect-Based Multi-Label Classification. Algorithms, 2023-11-28.

4. Identification effect of least square fitting method in archives management. Heliyon, 2023-09.

5. An Ontology Driven Machine Learning Applications in Public Policy Analysis: A Systematic Literature Review. 2023-05-22.