Affiliation:
1. University of Regina, Canada
Abstract
Most online platforms, applications, and websites use massive amounts of heterogeneous, evolving data. These data must be structured and normalized before integration to improve search and increase the relevance of results. An ontology can address this critical task by efficiently managing data and providing structured formats through languages such as the Web Ontology Language (OWL). However, building an ontology can be costly, especially if it is done manually. In this context, we propose a new methodology for automatically building and learning a multilingual ontology, using Arabic as the base language and a corpus collected from Wikipedia. Our methodology relies on finite-state transducers (FSTs), which are grouped into a cascade to reduce errors and minimize ambiguity. The resulting ontology is extended to English and French, and to language-independent images, via a translator we developed using APIs. We start from the Arabic corpus to extract terms because entity linking is more convenient from Arabic to other languages: many Wikipedia articles in English and French, for instance, have no associated Arabic article, whereas the reverse is uncommon. In addition, working with Arabic terms allows us to enrich the Arabic module (dictionaries and graphs) of the free linguistic platform we use. To assess the efficiency of our methodology, we computed performance metrics. The reported results are encouraging.
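The abstract does not describe the transducers themselves, so the following is only a rough illustration of what a two-stage FST cascade can look like, not the authors' implementation or the API of the linguistic platform they use. It assumes deterministic transducers applied left to right with longest match, where each stage's output feeds the next. Everything here is hypothetical: the `FST` class, the `ANY`/`COPY` labels, the `<isa>` tags, the `extract_isa` helper, and the toy transliterated tokens ("jamiat rijaina hiya jamia" ~ "University of Regina is a university"). Stage 1 merges a multiword term into a single token. Stage 2 then marks an is-a candidate.

```python
from typing import Dict, List, Optional, Set, Tuple

ANY = "<ANY>"    # hypothetical wildcard input label: matches any token
COPY = "<COPY>"  # hypothetical output label: re-emit the consumed input token


class FST:
    """Deterministic finite-state transducer over token sequences.

    transitions maps (state, input_label) -> (next_state, output_labels).
    An exact-token transition takes precedence over an ANY transition.
    """

    def __init__(self, transitions: Dict[Tuple[int, str], Tuple[int, List[str]]],
                 finals: Set[int], start: int = 0) -> None:
        self.transitions = transitions
        self.finals = finals
        self.start = start

    def _longest_match(self, tokens: List[str], i: int) -> Optional[Tuple[int, List[str]]]:
        """Walk the FST from position i; return (end, output) of the longest accepted path."""
        state, out, best = self.start, [], None
        for j in range(i, len(tokens)):
            tok = tokens[j]
            step = self.transitions.get((state, tok)) or self.transitions.get((state, ANY))
            if step is None:
                break
            state, template = step
            out = out + [tok if label == COPY else label for label in template]
            if state in self.finals:
                best = (j + 1, out)
        return best

    def apply(self, tokens: List[str]) -> List[str]:
        """Rewrite every longest match left to right; copy unmatched tokens unchanged."""
        result, i = [], 0
        while i < len(tokens):
            match = self._longest_match(tokens, i)
            if match is None:
                result.append(tokens[i])
                i += 1
            else:
                end, emitted = match
                result.extend(emitted)
                i = end
        return result


def cascade(stages: List[FST], tokens: List[str]) -> List[str]:
    """Feed the output of each transducer into the next one."""
    for stage in stages:
        tokens = stage.apply(tokens)
    return tokens


def extract_isa(tokens: List[str]) -> List[Tuple[str, str]]:
    """Collect (subclass, superclass) pairs from '<isa> x y </isa>' spans."""
    pairs, i = [], 0
    while i < len(tokens):
        if tokens[i] == "<isa>" and i + 3 < len(tokens) and tokens[i + 3] == "</isa>":
            pairs.append((tokens[i + 1], tokens[i + 2]))
            i += 4
        else:
            i += 1
    return pairs


# Stage 1 (toy): merge the multiword term "jamiat rijaina" into one token.
terms = FST({(0, "jamiat"): (1, []),
             (1, "rijaina"): (2, ["jamiat_rijaina"])}, finals={2})

# Stage 2 (toy): "<X> hiya <Y>" ~ "X is a Y" -> annotate an is-a candidate.
isa = FST({(0, ANY): (1, ["<isa>", COPY]),
           (1, "hiya"): (2, []),
           (2, ANY): (3, [COPY, "</isa>"])}, finals={3})

sentence = ["jamiat", "rijaina", "hiya", "jamia", "kanadiya"]
tagged = cascade([terms, isa], sentence)
print(tagged)               # ['<isa>', 'jamiat_rijaina', 'jamia', '</isa>', 'kanadiya']
print(extract_isa(tagged))  # [('jamiat_rijaina', 'jamia')]
```

Running the `isa` stage alone on the raw tokens yields `[('rijaina', 'jamia')]`. In that case the relation is attached to only a fragment of the term. Resolving multiword terms in an earlier stage is one way a cascade can reduce this kind of ambiguity.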
Publisher
Association for Computing Machinery (ACM)