SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news
Published:2021-10-08
Issue:1
Volume:56
Page:225-257
ISSN:1574-020X
Container-title:Language Resources and Evaluation
language:en
Short-container-title:Lang Resources & Evaluation
Author:
Jacobs Gilles, Hoste Véronique
Abstract
We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive, and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE), so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged $F_1$-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.
Funder
Fonds Wetenschappelijk Onderzoek
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Linguistics and Language,Education,Language and Linguistics
Cited by
18 articles.