Author:
Chantrapornchai Chantana,Tunsakul Aphisit
Abstract
In this paper, we present two methodologies to extract particular information based on the full text returned from the search engine to facilitate the users. The approaches are based three tasks: name entity recognition (NER), text classification and text summarization. The first step is the building training data and data cleansing. We consider tourism domain such as restaurant, hotels, shopping and tourism data set crawling from the websites. First, the tourism data are gathered and the vocabularies are built. Several minor steps include sentence extraction, relation and name entity extraction for tagging purpose. These steps are needed for creating proper training data. Then, the recognition model of a given entity type can be built. From the experiments, given review texts, we demonstrate to build the model to extract the desired entity,i.e, name, location, facility as well as relation type, classify the reviews or summarize the reviews. Two tools, SpaCy and BERT, are used to compare the performance of these tasks.
Subject
Electrical and Electronic Engineering,Information Systems and Management,Computer Networks and Communications,Information Systems
Cited by
10 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献