Author:
Purnomo W.P. Yohanes Sigit, Kumar Yogan Jaya, Zulkarnain Nur Zareen
Abstract
Purpose
Extracting information from unstructured data is a challenging task in computational linguistics. Statements by public figures that journalists attribute in news stories are one type of information that can be processed into structured data. A knowledge base of such statements would benefit downstream uses such as opinion mining, claim detection and fact-checking. This study aims to understand statement extraction tasks and the models already applied to them, in order to formulate a framework for further study.
Design/methodology/approach
This paper presents a literature review of selected previous research that specifically addresses quotation extraction and quotation attribution. Research on corpus development related to these two tasks is also considered. The findings of the review are used as a basis for proposing a framework to direct further research.
Findings
This study has three findings. Firstly, the extraction process still consists of two main tasks, namely, the extraction of quotations and the attribution of quotations. Secondly, most extraction approaches rely on rule-based algorithms or traditional machine learning. Thirdly, the available corpora are limited in quantity and depth. Based on these findings, a statement extraction framework for Indonesian language corpus and model development is proposed.
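As an illustration of the two tasks named above, the following is a minimal rule-based sketch, not a method taken from the reviewed papers. It assumes English text, straight double quotes, direct quotations only, and a tiny hand-picked list of reporting verbs; all function names, the verb list, and the speaker names in the usage note are invented for this example.

```python
import re

# Assumption: a small, illustrative list of reporting verbs (not from the paper).
REPORTING_VERBS = ("said", "stated", "added", "claimed")
VERBS = "|".join(REPORTING_VERBS)

# Assumption: a speaker is one or more capitalized words, e.g. "Budi Santoso".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Task 1 pattern: text enclosed in straight double quotes.
QUOTE_RE = re.compile(r'"([^"]+)"')

# Task 2 patterns: a speaker cue right after or right before the quotation.
SPEAKER_AFTER = re.compile(rf"^,?\s*(?:({NAME}) (?:{VERBS})|(?:{VERBS}) ({NAME}))")
SPEAKER_BEFORE = re.compile(rf"({NAME}) (?:{VERBS}),?:?\s*$")


def extract_quotations(sentence):
    """Task 1: return (quote_text, start, end) for each direct quotation."""
    return [(m.group(1).rstrip(","), m.start(), m.end())
            for m in QUOTE_RE.finditer(sentence)]


def attribute_quotation(sentence, start, end):
    """Task 2: return the speaker adjacent to the quotation span, or None."""
    after = SPEAKER_AFTER.search(sentence[end:])
    if after:
        return after.group(1) or after.group(2)
    before = SPEAKER_BEFORE.search(sentence[:start])
    if before:
        return before.group(1)
    return None


def extract_statements(sentence):
    """Run both tasks and return structured records."""
    return [{"speaker": attribute_quotation(sentence, start, end), "quote": quote}
            for quote, start, end in extract_quotations(sentence)]
```

For example, `extract_statements('"We will raise the budget next year," said Budi Santoso.')` yields one record with speaker `Budi Santoso`. Hand-written rules like these miss indirect speech, pronoun speakers, and quotations that span sentences, which is one reason annotated corpora matter for this task.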
Originality/value
The paper serves as a guideline for formulating a statement extraction framework based on the findings of the literature study. The proposed framework includes corpus development for the Indonesian language and a model for public figure statement extraction. Furthermore, this study could be used as a reference for producing similar frameworks for other languages.
Subject
Library and Information Sciences