Affiliation:
1. Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar University, Ethiopia
Abstract
Language identification and content detection are essential for effective digital communication and content moderation. While extensive research has focused primarily on well-known, widely spoken languages, challenges persist for indigenous and resource-limited languages, especially among closely related languages such as the Ethiopian languages. This article aims to simultaneously identify the language of a given text and detect its content; to achieve this, we propose a novel attention-based recurrent neural network framework. The proposed method uses an attention-embedded Bidirectional-LSTM architecture with two classifiers, one identifying the language of a given text and the other the content within the text. The two classifiers share a common feature space before branching into their task-specific layers, both of which are assisted by an attention mechanism. The dataset covers five topics in six Ethiopian languages and consists of 22,624 sentences. Compared with classical NLP techniques, the proposed method shortens the data preprocessing pipeline. We evaluated the model using the accuracy metric, achieving 98.88% for language identification and 96.5% for text content detection. The dataset, source code, and pretrained model are available at
https://github.com/bdu-birhanu/LID_TCD
.
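The shared-encoder, two-head design described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the BiLSTM encoder is replaced by random per-token hidden states, and all dimensions are illustrative assumptions (only the six-language and five-topic output sizes come from the abstract). Each task applies its own attention pooling over the shared token representations before its classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions (not from the paper):
T, H = 12, 8             # sequence length, encoder hidden size
N_LANG, N_TOPIC = 6, 5   # six languages, five topics

# Stand-in for the shared BiLSTM output: one hidden vector per token.
hidden = rng.standard_normal((T, H))

def attention_pool(states, w):
    """Score each token, normalize with softmax, return the weighted sum."""
    scores = states @ w               # (T,) one score per token
    alpha = softmax(scores)           # attention weights, sum to 1
    return alpha @ states, alpha      # context vector (H,), weights (T,)

# Separate attention vector and classifier per task; the encoder is shared.
w_lang, w_topic = rng.standard_normal(H), rng.standard_normal(H)
W_lang = rng.standard_normal((H, N_LANG))
W_topic = rng.standard_normal((H, N_TOPIC))

ctx_lang, a_lang = attention_pool(hidden, w_lang)
ctx_topic, a_topic = attention_pool(hidden, w_topic)

lang_probs = softmax(ctx_lang @ W_lang)    # language-identification head
topic_probs = softmax(ctx_topic @ W_topic) # content-detection head
```

In training, the two heads would each contribute a cross-entropy loss so the shared encoder learns features useful for both tasks; here only the forward pass is shown.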
Publisher
Association for Computing Machinery (ACM)