LDA-AdaBoost.MH: Accelerated AdaBoost.MH based on latent Dirichlet allocation for text categorization

Published:2014-10-03 Issue:1 Volume:41 Page:27-40
ISSN:0165-5515
Container-title:Journal of Information Science
language:en
Short-container-title:Journal of Information Science

Author:

Al-Salemi Bassam¹,Ab Aziz Mohd. Juzaiddin¹,Noah Shahrul Azman¹

Affiliation:

1. National University of Malaysia, Malaysia

Abstract

AdaBoost.MH is a boosting algorithm that is considered to be one of the most accurate algorithms for multilabel classification. It works by iteratively building a committee of weak hypotheses of decision stumps. To build the weak hypotheses, in each iteration, AdaBoost.MH obtains the whole extracted features and examines them one by one to check their ability to characterize the appropriate category. Using Bag-Of-Words for text representation dramatically increases the computational time of AdaBoost.MH learning, especially for large-scale datasets. In this paper we demonstrate how to improve the efficiency and effectiveness of AdaBoost.MH using latent topics, rather than words. A well-known probabilistic topic modelling method, Latent Dirichlet Allocation, is used to estimate the latent topics in the corpus as features for AdaBoost.MH. To evaluate LDA-AdaBoost.MH, the following four datasets have been used: Reuters-21578-ModApte, WebKB, 20-Newsgroups and a collection of Arabic news. The experimental results confirmed that representing the texts as a small number of latent topics, rather than a large number of words, significantly decreased the computational time of AdaBoost.MH learning and improved its performance for text categorization.
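The abstract's cost argument can be made concrete with a small sketch. It is not the authors' implementation. It assumes each document is already represented as a vector of topic proportions (the LDA step is not shown), and it uses one common formulation of AdaBoost.MH with real-valued decision stumps. The names `train_adaboost_mh` and `predict`, the threshold-style stumps, and the toy data are all hypothetical. The loop over `n_feat` is where the feature count enters the training cost: every feature is scanned in every round, so a few topics are cheaper to scan than a large vocabulary.

```python
import math

def train_adaboost_mh(X, Y, rounds=10, eps=1e-8):
    """Illustrative AdaBoost.MH with threshold decision stumps.

    X: list of feature vectors (assumed here: per-document topic proportions).
    Y: list of label vectors with entries +1 / -1, one entry per category.
    Returns a list of stumps (feature index, threshold, per-block label scores).
    """
    m, n_feat, n_lab = len(X), len(X[0]), len(Y[0])
    # Uniform initial weights over all (document, category) pairs.
    D = [[1.0 / (m * n_lab)] * n_lab for _ in range(m)]
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(n_feat):  # every feature is examined in every round
            values = sorted({x[j] for x in X})
            for low, high in zip(values, values[1:]):
                theta = (low + high) / 2  # candidate threshold (an assumption)
                # W[block][label] = [weight of negatives, weight of positives]
                W = [[[0.0, 0.0] for _ in range(n_lab)] for _ in range(2)]
                for i in range(m):
                    b = 1 if X[i][j] > theta else 0
                    for l in range(n_lab):
                        W[b][l][1 if Y[i][l] > 0 else 0] += D[i][l]
                # Normalisation factor Z; a smaller Z means a better stump.
                Z = 2 * sum(math.sqrt(neg * pos) for blk in W for neg, pos in blk)
                if best is None or Z < best[0]:
                    c = [[0.5 * math.log((pos + eps) / (neg + eps))
                          for neg, pos in blk] for blk in W]
                    best = (Z, j, theta, c)
        if best is None:  # no feature has two distinct values to split on
            break
        _, j, theta, c = best
        ensemble.append((j, theta, c))
        # Re-weight: pairs the stump scores correctly lose weight.
        total = 0.0
        for i in range(m):
            b = 1 if X[i][j] > theta else 0
            for l in range(n_lab):
                D[i][l] *= math.exp(-Y[i][l] * c[b][l])
                total += D[i][l]
        for i in range(m):
            for l in range(n_lab):
                D[i][l] /= total
    return ensemble

def predict(ensemble, x, n_lab):
    """Sum the stump scores per category and threshold at zero."""
    scores = [0.0] * n_lab
    for j, theta, c in ensemble:
        b = 1 if x[j] > theta else 0
        for l in range(n_lab):
            scores[l] += c[b][l]
    return [1 if s > 0 else -1 for s in scores]

# Toy data, invented for illustration: 6 documents, 3 topics, 2 categories.
X = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
     [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
Y = [[1, -1], [1, -1], [-1, 1], [-1, 1], [-1, -1], [-1, -1]]

ensemble = train_adaboost_mh(X, Y, rounds=2)
predictions = [predict(ensemble, x, 2) for x in X]
```

On this toy set, two rounds are enough for `predictions` to match `Y`. With a bag-of-words representation the same loop would run over thousands of word features per round instead of three topic features.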

Publisher

SAGE Publications

Subject

Library and Information Sciences,Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551514551496

Reference: 24 articles.

1. Boosting and Additive Trees

2. Boosting for Text Classification with Semantic Features

3. A decision-theoretic generalization of on-line learning and an application to boosting

4. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Cited by 18 articles.

1. TextNetTopics-SFTS-SBTS: TextNetTopics Scoring Approaches Based Sequential Forward and Backward;Lecture Notes in Computer Science;2024

2. A game model and numerical simulation of risk communication in metro emergencies under the influence of emotions;International Journal of Disaster Risk Reduction;2023-10

3. Assessment of the Quality of Topic Models for Information Retrieval Applications;Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval;2023-08-09

4. How online reviews with different influencing factors affect the diffusion of new products;International Journal of Consumer Studies;2023-03-06

5. A Semantic Embedding Enhanced Topic Model For User-Generated Textual Content Modeling In Social Ecosystems;The Computer Journal;2022-10-01