A Deep Learning Approach for the Romanized Tunisian Dialect Identification

Published:2020-11-01 Issue:6 Volume:17 Page:935-946
ISSN:2309-4524
Container-title:The International Arab Journal of Information Technology
language:en
Short-container-title:IAJIT

Author:

Younes Jihene, Achour Hadhemi, Souissi Emna, Ferchichi Ahmed

Abstract

Language identification is an important task in natural language processing that consists of determining the language of a given text. It has increasingly piqued the interest of researchers over the past few years, especially for code-switched informal textual content. In this paper, we focus on identifying the Romanized user-generated Tunisian dialect on the social web. We segment and annotate a corpus extracted from social media and propose a deep learning approach for the identification task. We use a Bidirectional Long Short-Term Memory neural network with Conditional Random Fields decoding (BLSTM-CRF). For word embeddings, we combine a word-character BLSTM vector representation with FastText embeddings, which take character n-gram features into consideration. The overall accuracy obtained is 98.65%.
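
To illustrate the character n-gram features mentioned in the abstract, the following is a minimal sketch of FastText-style subword extraction: the word is wrapped in the boundary markers `<` and `>` and every character n-gram within a length range is collected. The function name `char_ngrams`, the default range of 3 to 6, and the example spellings `barcha` / `barsha` are assumptions chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return FastText-style character n-grams of `word`.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    yield n-grams distinct from word-internal ones. The name and the
    default n-gram range are illustrative assumptions.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def shared_ngrams(word_a, word_b, n=3):
    """Return the n-grams (of a single length n) common to two words."""
    return set(char_ngrams(word_a, n, n)) & set(char_ngrams(word_b, n, n))


if __name__ == "__main__":
    # Two hypothetical Romanized spellings of the same word.
    print(char_ngrams("barcha", 3, 3))
    print(sorted(shared_ngrams("barcha", "barsha")))
```

Because Romanized user-generated text has no standard orthography, spelling variants of one word still share some n-grams, which is one plausible reason subword embeddings suit this task.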

Publisher

Zarqa University

Subject

General Computer Science

Cited by 4 articles.

1. Literature Review: NLP Techniques for Arabic Dialect Recognition;2024 International Conference on Circuit, Systems and Communication (ICCSC);2024-06-28

2. Application of Production-Oriented Approach and Deep Learning Model in English Translation Teaching Model;Mobile Information Systems;2022-10-04

3. Translation system from Tunisian Dialect to Modern Standard Arabic;Concurrency and Computation: Practice and Experience;2021-12-16

4. Building Bi-script Language Resources for the Tunisian Dialect’s NLP;Procedia Computer Science;2021