Extracting numerical data from unstructured Arabic texts (ENAT)
-
Published:2021-03-10
Issue:3
Volume:21
Page:1759
-
ISSN:2502-4760
-
Container-title:Indonesian Journal of Electrical Engineering and Computer Science
-
language:
-
Short-container-title:IJEECS
Author:
K. AL-Mashhadany Abeer,N. Hamood Dalal,Sadiq Al-Obaidi Ahmed T.,K. Al-Mashhsdany Waleed
Abstract
<span id="docs-internal-guid-5dcc170c-7fff-e8e4-10d4-4a07701ca923"><span>Unstructured data becomes challenges because in recent years have observed the ability to gather a massive amount of data from annotated documents. This paper interested with Arabic unstructured text analysis. Manipulating unstructured text and converting it into a form understandable by computer is a high-level aim. An important step to achieve this aim is to understand numerical phrases. This paper aims to extract numerical data from Arabic unstructured text in general. This work attempts to recognize numerical characters phrases, analyze them and then convert them into integer values. The inference engine is based on the Arabic linguistic and morphological rules. The applied method encompasses rules of numerical nouns with Arabic morphological rules, in order to achieve high accurate extraction method. Arithmetic operations are applied to convert the numerical phrase into integer value. The proper operation is determined depending on linguistic and morphological rules. It will be shown that applying Arabic linguistic rules together with arithmetic operations succeeded in extracting numerical data from Arabic unstructured text with high accuracy reaches to 100%.</span></span>
Publisher
Institute of Advanced Engineering and Science
Subject
Electrical and Electronic Engineering,Control and Optimization,Computer Networks and Communications,Hardware and Architecture,Information Systems,Signal Processing
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献