Affiliation:
1. Vishwakarma Government Engineering College, Gujarat, India
2. The Maharaja Sayajirao University of Baroda, Gujarat, India
Abstract
Topic models are among the most effective stochastic models for summarizing a large collection of text, and they have been highly successful in both text analysis and text summarization. A topic model can be applied to a set of documents represented as bags of words, without regard to grammar or word order. We modeled topics for a corpus of Gujarati news articles. Because Gujarati has a diverse morphological structure and is inflectionally rich, processing Gujarati text is more complex. The vocabulary size plays an important role in the inference process and in the quality of topics: as the vocabulary grows, inference becomes slower and topic semantic coherence decreases. Reducing the vocabulary size can accelerate topic inference and may also improve topic quality. In this work, we prepared a list of suffixes that occur very frequently with words in Gujarati text, and reduced inflectional forms to their root words using the suffixes in that list. In addition, Gujarati single-letter words were eliminated for faster inference and better topic quality. Experiments showed that reducing inflectional forms to their root words shrinks the vocabulary to a significant extent and also speeds up topic formation. Moreover, inflectional form reduction and single-letter word removal enhanced the interpretability of topics, which was assessed on semantic coherence, word length, and topic size. The experimental results showed improvements in the topical semantic coherence score. The topic size also grew notably as the number of tokens assigned to the topics increased.
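The two preprocessing steps described in the abstract, suffix-based reduction of inflectional forms and removal of single-letter words, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the suffix list is a small hypothetical sample of common Gujarati case markers, the names `SUFFIXES`, `strip_suffix`, `is_single_letter`, `preprocess`, and the `min_stem_len` guard are assumptions, and a "single letter" is approximated as one base character plus any attached vowel signs.

```python
import unicodedata

# Hypothetical sample of frequent Gujarati suffixes (assumption);
# the list prepared in the paper is not reproduced here.
SUFFIXES = ["માં", "નું", "ને", "ની", "નો", "ના", "થી"]


def strip_suffix(token, suffixes=SUFFIXES, min_stem_len=2):
    """Remove the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


def is_single_letter(token):
    """Approximate check: at most one character that is not a combining mark."""
    base = [ch for ch in token if not unicodedata.category(ch).startswith("M")]
    return len(base) <= 1


def preprocess(tokens):
    """Drop single-letter words, then reduce the remaining tokens to stems.

    The order of the two steps is an assumption made for this sketch.
    """
    kept = [t for t in tokens if not is_single_letter(t)]
    return [strip_suffix(t) for t in kept]


if __name__ == "__main__":
    tokens = ["ભારતમાં", "ભારતનો", "ભારતને", "છે", "જ"]
    stems = preprocess(tokens)
    print("vocabulary before:", len(set(tokens)))
    print("vocabulary after: ", len(set(stems)))
```

Because several inflected forms collapse onto one stem, the vocabulary passed to the topic model shrinks, which is the effect the abstract associates with faster inference.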
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.