A Novel Deep Auto-Encoder Based Linguistics Clustering Model for Social Text

Published: 2022-04-29
ISSN: 2375-4699
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Journal abbreviation: ACM Trans. Asian Low-Resour. Lang. Inf. Process.
Language: English

Authors:

Akram, Muhammad Waseem¹; Salman, Muhammad²; Bashir, Muhammad Farrukh³; Salman, Syed Muhammad Saad⁴; Gadekallu, Thippa Reddy⁵; Javed, Abdul Rehman⁶

Affiliation:

1. COMSATS University Islamabad, Pakistan

2. The Australian National University, Australia

3. Riphah International University, Pakistan

4. National University of Computer and Emerging Sciences, Pakistan

5. School of Information Technology and Engineering, Vellore Institute of Technology, India

6. Department of Cyber Security, Air University, Pakistan

Abstract

The wide adoption of media and social media has increased the amount of digital content to an enormous level. Natural language processing (NLP) techniques make it possible to extract and explore meaningful information from large amounts of text. Urdu is one of the most widely used languages worldwide for spoken and written communication. Because of this wide adoption, digital content in Urdu is growing rapidly, especially on social media and online news feeds. Government agencies and advertisers need to filter and understand this content to analyze trends and cohorts relevant to their own interests and to national priorities. Clustering is considered a baseline and one of the first steps in natural language understanding. Many state-of-the-art clustering techniques exist for English, French, and Arabic, but no significant research has been conducted on Urdu. Clustering short text segments is especially challenging because they offer limited features and lack meaningful discourse and nuance. Many rule-based NLP techniques have been adopted to overcome these issues, but they rely on human-designed features and rules and therefore do not deliver strong results. Deep learning techniques, by contrast, are effective at capturing contextual information with less noise than traditional methods. Taking on this challenge, we develop the first deep learning-based technique for Urdu short text clustering that requires no human-designed features. In this paper, we propose a short text clustering method based on a deep neural network that learns feature representations and cluster assignments simultaneously. The method learns its clustering objective by mapping the high-dimensional feature space to a low-dimensional one. Our experiments on the Urdu news headlines dataset show strong results compared with state-of-the-art methods.
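The abstract says the network learns feature representations and cluster assignments jointly in a low-dimensional space, but it does not spell out the clustering objective. The sketch below assumes a DEC-style (Deep Embedded Clustering) objective, a common choice for auto-encoder-based clustering; it is an illustration, not the authors' implementation. It assumes an encoder has already mapped each headline to a low-dimensional embedding. The embeddings, centroids, and function names (`soft_assign`, `target_distribution`, `kl_divergence`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def soft_assign(z: List[Vector], centroids: List[Vector], alpha: float = 1.0) -> Matrix:
    """Soft assignment q_ij of embedding z_i to centroid mu_j using a
    Student's t kernel (assumed DEC-style; not confirmed by the paper)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where
    f_j = sum_i q_ij is the soft frequency of cluster j."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weighted = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weighted)
        p.append([w / total for w in weighted])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """Clustering loss KL(P || Q), summed over all points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


# Hypothetical 2-D embeddings, e.g. bottleneck codes for four headlines.
z = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.1, 0.05], [2.95, 3.05]]

q = soft_assign(z, centroids)
p = target_distribution(q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
print(labels)  # [0, 0, 1, 1]
print(kl_divergence(p, q))
```

In a full DEC-style pipeline, this KL loss would be backpropagated through the encoder and the centroids, with P recomputed periodically so that confident assignments are reinforced. The sketch only evaluates the objective for fixed embeddings.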

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3527838

Cited by 11 articles

1. K-Means text clustering method based on Decision Grey Wolf Optimization. ACM Transactions on Asian and Low-Resource Language Information Processing, 2024-08-20.

2. Neural Network Meaningful Learning Theory and its Application for Deep Text Clustering. IEEE Access, 2024.

3. Custom Dataset Text Classification: An Ensemble Approach with Machine Learning and Deep Learning Models. 2023 3rd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 2023-12-21.

4. Automatic image captioning combining natural language processing and deep neural networks. Results in Engineering, 2023-06.

5. Automatic Text Summarization-Based Transformers Architecture. 2023 International Conference on Information Technology, Applied Mathematics and Statistics (ICITAMS), 2023-03-20.