Affiliation:
1. Universiti Kebangsaan Malaysia, Malaysia
Abstract
Boosting algorithms have received significant attention over the past several years and are considered to be the state-of-the-art classifiers for multi-label classification tasks. The disadvantage of using boosting algorithms for text categorization (TC) is the vast number of features that are generated using the traditional Bag-of-Words (BOW) text representation, which dramatically increases the computational complexity. In this paper, an alternative text representation method using topic modeling for enhancing and accelerating multi-label boosting algorithms is concerned. An extensive empirical experimental comparison of eight multi-label boosting algorithms using topic-based and BOW representation methods was undertaken. For the evaluation, three well-known multi-label TC datasets were used. Furthermore, to justify boosting algorithms performance, three well-known instance-based multi-label algorithms were involved in the evaluation. For completely credible evaluations, all algorithms were evaluated using their native software tools, except for data formats and user settings. The experimental results demonstrated that the topic-based representation significantly accelerated all algorithms and slightly enhanced the classification performance, especially for near-balanced and balanced datasets. For the imbalanced dataset, BOW representation led to the best performance. The MP-Boost algorithm is the most efficient and effective algorithm for imbalanced datasets using BOW representation. For topic-based representation, AdaBoost.MH with meta base learners, Hamming Tree (AdaMH-Tree) and Product (AdaMH-Product) achieved the best performance; however, with respect to the computational time, these algorithms are the slowest overall. Moreover, the results indicated that topic-based representation is more significant for instance-based algorithms; nevertheless, boosting algorithms, such as MP-Boost, AdaMH-Tree and AdaMH-Product, notably exceed their performance.
Subject
Library and Information Sciences,Information Systems
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献