Abstract
When explicit punctuation is absent, the semantic and contextual nature of Arabic poses a distinctive challenge, and punctuation marks must be reintroduced to clarify sentence structure and meaning. We investigate the impact of sentence length on punctuation prediction in Arabic language processing, using Deep Neural Networks (DNNs), specifically Bidirectional Long Short-Term Memory (Bi-LSTM) models. Our study goes beyond restoration and aims to predict punctuation marks accurately in unprocessed text. The investigation focuses on five primary punctuation marks (period, question mark, comma, colon, and exclamation mark: . ? , : !), contributing to a more comprehensive understanding of how diverse punctuation marks can be predicted in Arabic text, and we achieve an accuracy of 85%. This research advances our understanding of Arabic language processing and also serves as a broader exploration of the relationship between sentence length and punctuation prediction.
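The abstract frames the task as predicting which mark, if any, follows each word of unpunctuated text, and then relates prediction quality to sentence length. The Python sketch below illustrates only that framing and evaluation, not the authors' Bi-LSTM or their actual pipeline: it turns punctuated text into (word, label) pairs of the kind a sequence tagger would be trained on, and reports token-level accuracy per sentence-length bucket. The label names, the whitespace tokenization, the mapping of the Arabic comma (U+060C) and Arabic question mark (U+061F) onto the same classes as `,` and `?`, and the bucket size are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical label names for the five marks studied, plus "O" for "no mark".
# Mapping the Arabic comma (U+060C) and question mark (U+061F) onto the same
# classes as "," and "?" is an assumption of this sketch.
PUNCT_CLASSES = {
    ".": "PERIOD",
    "?": "QUESTION", "\u061F": "QUESTION",
    ",": "COMMA", "\u060C": "COMMA",
    ":": "COLON",
    "!": "EXCLAMATION",
}
PUNCT_CHARS = "".join(PUNCT_CLASSES)
SENTENCE_FINAL = {"PERIOD", "QUESTION", "EXCLAMATION"}


def to_labeled_tokens(text):
    """Strip trailing punctuation from whitespace-separated tokens and return
    (word, label) pairs, where label is the mark that followed the word.
    If several marks follow a word, only the last one is kept."""
    pairs = []
    for token in text.split():
        word = token.rstrip(PUNCT_CHARS)
        trailing = token[len(word):]
        if not word:
            # A mark written as its own token attaches to the previous word.
            if pairs:
                pairs[-1] = (pairs[-1][0], PUNCT_CLASSES[trailing[-1]])
            continue
        label = PUNCT_CLASSES[trailing[-1]] if trailing else "O"
        pairs.append((word, label))
    return pairs


def split_sentences(pairs):
    """Group (word, label) pairs into sentences ending at a sentence-final label."""
    sentences, current = [], []
    for word, label in pairs:
        current.append((word, label))
        if label in SENTENCE_FINAL:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def accuracy_by_length(gold_sentences, predicted_labels, bucket_size=10):
    """Token-level accuracy grouped by sentence length: bucket 0 holds lengths
    0..bucket_size-1, the next bucket starts at bucket_size, and so on.
    predicted_labels[i] must align word-for-word with gold_sentences[i]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for sentence, predictions in zip(gold_sentences, predicted_labels):
        bucket = (len(sentence) // bucket_size) * bucket_size
        for (_, gold), pred in zip(sentence, predictions):
            total[bucket] += 1
            correct[bucket] += int(gold == pred)
    return {b: correct[b] / total[b] for b in sorted(total)}


if __name__ == "__main__":
    sentences = split_sentences(to_labeled_tokens("hello world, how are you? fine."))
    # Trivial baseline that never predicts punctuation, standing in for a model.
    baseline = [["O"] * len(s) for s in sentences]
    print(accuracy_by_length(sentences, baseline, bucket_size=5))
    # prints {0: 0.0, 5: 0.6}
```

In a setup like the one the abstract describes, a Bi-LSTM tagger would read the stripped words and be trained to emit these labels; a bidirectional model suits the framing because the mark after a word can depend on context both before and after it. The per-bucket accuracies are one simple way to examine how performance varies with sentence length.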
Publisher: Salud, Ciencia y Tecnologia