Information Extraction Tasks based on BERT and SpaCy on Tourism Domain

Published:2021-01-05 Issue:1 Volume:15 Page:108-122
ISSN:2286-9131
Container-title:ECTI Transactions on Computer and Information Technology (ECTI-CIT)
Short-container-title:ECTI-CIT

Author:

Chantrapornchai Chantana, Tunsakul Aphisit

Abstract

In this paper, we present two methodologies for extracting particular information from the full text returned by a search engine, to assist users. The approaches are based on three tasks: named entity recognition (NER), text classification, and text summarization. The first step is building the training data and cleansing the data. We consider the tourism domain, such as restaurant, hotel, shopping, and tourism data sets crawled from websites. First, the tourism data are gathered and the vocabularies are built. Several minor steps follow, including sentence extraction and relation and named entity extraction for tagging purposes. These steps are needed to create proper training data. Then, a recognition model for a given entity type can be built. In the experiments, given review texts, we demonstrate how to build models that extract the desired entities (i.e., name, location, and facility) as well as relation types, classify the reviews, or summarize the reviews. Two tools, SpaCy and BERT, are used to compare performance on these tasks.
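
The data-preparation steps described in the abstract (vocabulary building, sentence extraction, and entity tagging) might look roughly like the sketch below. This is an illustrative assumption, not the authors' implementation: the vocabulary entries, the sample review, and the helper names `split_sentences`, `tag_sentence`, and `build_training_data` are all hypothetical. The output uses the character-offset layout `(text, {"entities": [(start, end, label)]})` that is commonly used when preparing NER training data for SpaCy, and the sketch needs only the Python standard library.

```python
import re

# Hypothetical vocabulary: surface form -> entity label (not taken from the paper).
VOCAB = {
    "Grand Palace Hotel": "NAME",
    "Bangkok": "LOCATION",
    "swimming pool": "FACILITY",
}


def split_sentences(text):
    """Naive sentence extraction: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence, vocab):
    """Tag vocabulary terms in one sentence with character offsets.

    Returns (sentence, {"entities": [(start, end, label), ...]}).
    """
    spans = []
    # Try longer terms first so a long match wins over a shorter term inside it.
    for term in sorted(vocab, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            # Keep the match only if it does not overlap an accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), vocab[term]))
    return sentence, {"entities": sorted(spans)}


def build_training_data(text, vocab):
    """Sentence extraction followed by vocabulary-based tagging."""
    return [tag_sentence(s, vocab) for s in split_sentences(text)]


# Made-up review text, for illustration only.
review = "Grand Palace Hotel is in Bangkok. It has a swimming pool!"
TRAIN_DATA = build_training_data(review, VOCAB)

for text, annotations in TRAIN_DATA:
    print(text, annotations)
```

Vocabulary matching of this kind only produces a first-pass annotation; ambiguous or unseen entity mentions would still need manual review before training a recognition model.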

Publisher

ECTI

Subject

Electrical and Electronic Engineering, Information Systems and Management, Computer Networks and Communications, Information Systems

Cited by 10 articles.

1. Machine learning applied to tourism: A systematic review;WIREs Data Mining and Knowledge Discovery;2024-07-04

2. An automated information extraction system from the knowledge graph based annual financial reports;PeerJ Computer Science;2024-05-13

3. Natural Language Processing and Fiction Text: Basis for Corpus Research;RUDN Journal of Language Studies, Semiotics and Semantics;2024-03-15

4. Job Recommendation System based on Resume using Natural Language Processing and Distance-based Algorithm;2024 IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS);2024-02-21

5. Detecting Function Inputs and Outputs for Learning-Problem Generation in Intelligent Tutoring Systems;Lecture Notes in Computer Science;2024