A Corpus-Based Sentence Classifier for Entity–Relationship Modelling

Published: 2022-03-11 Issue: 6 Volume: 11 Page: 889
ISSN: 2079-9292
Container-title: Electronics
Language: en

Author:

Šuman Sabrina, Čandrlić Sanja, Jakupović Alen

Abstract

Automated creation of a conceptual data model from user requirements expressed as natural-language text is a challenging research area. The complexity of natural language requires deep insight into the semantics buried in words, expressions, and string patterns. For the purpose of natural language processing, we created a corpus of business descriptions and an accompanying lexicon containing all the words in the corpus. This made it possible to define rules for the automatic translation of business descriptions into the entity–relationship (ER) data model. However, since the translation rules could not always produce accurate translations, we created an additional classification layer: a classifier that assigns defined ER method classes to each input sentence. The classifier formalizes the knowledge of four data modelling experts. This rule-based classification process is based on the extraction of ER information from a given sentence. After describing the classification process in detail, we evaluated and tested it using the standard multiclass performance measures: recall, precision, and accuracy. The accuracy was 96.77% in the learning phase and 95.79% in the testing phase.
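
The multiclass measures named in the abstract can be illustrated with a minimal sketch. It assumes one gold label and one predicted label per sentence and computes overall accuracy as the fraction of correctly labelled sentences; the paper may define these measures differently. The function name `multiclass_metrics` and the class labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Return (accuracy, per-class precision, per-class recall).

    Assumes exactly one label per sentence (an assumption, not the paper's setup).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # how often each class occurs in the gold labels
    pred_count = Counter(y_pred)   # how often each class was predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    precision = {c: tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels}
    recall = {c: tp[c] / true_count[c] if true_count[c] else 0.0 for c in labels}
    return accuracy, precision, recall


# Hypothetical labels, invented purely for illustration.
gold = ["entity", "relationship", "attribute", "entity", "relationship"]
pred = ["entity", "relationship", "entity", "entity", "attribute"]
accuracy, precision, recall = multiclass_metrics(gold, pred)
```

Per-class precision and recall show which classes are confused with each other, which a single accuracy figure hides.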

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/6/889/pdf

Cited by 1 article.

1. Corpus Statistics Empowered Document Classification;Electronics;2022-07-11