Unsupervised Numerical Information Extraction via Exploiting Syntactic Structures
Published: 2023-04-24
Issue: 9
Volume: 12
Page: 1977
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Wang Zixiang 1, Li Tongliang 1, Li Zhoujun 1
Affiliation:
1. State Key Lab of Software Development Environment, Beihang University, Beijing 100191, China
Abstract
Numerical information plays an important role in scientific, financial, social, statistical, and news domains. Most prior studies adopt unsupervised methods that rely on complex handcrafted pattern-matching rules to extract numerical information, which can be difficult to scale to the open domain. Supervised methods, in turn, require extra time, cost, and knowledge to design, understand, and annotate the training data. To address these limitations, we propose QuantityIE, a novel approach that extracts numerical information as structured representations by exploiting syntactic features from both constituency parsing (CP) and dependency parsing (DP). The extraction results may also serve as distant supervision for zero-shot model training. Our approach outperforms existing methods from two perspectives: (1) the rules are simple yet effective, and (2) the results are more self-contained. We further propose a numerical information retrieval approach based on QuantityIE to answer analytical queries. Experimental results on information extraction and retrieval demonstrate the effectiveness of QuantityIE in extracting numerical information with high fidelity.
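The abstract does not spell out QuantityIE's rules, so the following is only an illustrative sketch of the general idea of turning a dependency parse into a structured numerical record. It is not the authors' method. The `Token` and `QuantityFact` types, the `extract_quantities` function, the single rule it applies, and the hand-written parse are all assumptions made for this example; a real system would obtain the parse from a parser and would also use constituency information.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    idx: int    # position of the token in the sentence
    text: str
    dep: str    # dependency label (Universal Dependencies style, hand-assigned here)
    head: int   # index of the head token; the root points to itself


@dataclass
class QuantityFact:
    # Hypothetical structured representation: who / did what / how many / of what
    subject: str
    predicate: str
    value: str
    unit: str


def extract_quantities(tokens: List[Token]) -> List[QuantityFact]:
    """Toy rule (an assumption, not QuantityIE): for each numeric modifier,
    take the noun it modifies as the unit, that noun's head as the predicate,
    and the predicate's nominal subject as the subject."""
    facts = []
    for tok in tokens:
        if tok.dep != "nummod":
            continue
        unit = tokens[tok.head]        # noun modified by the number
        predicate = tokens[unit.head]  # governor of that noun
        subject = next(
            (t.text for t in tokens if t.dep == "nsubj" and t.head == predicate.idx),
            None,
        )
        if subject is not None:
            facts.append(QuantityFact(subject, predicate.text, tok.text, unit.text))
    return facts


# Hand-written parse of "The company hired 300 engineers ." (illustrative only)
SENTENCE = [
    Token(0, "The", "det", 1),
    Token(1, "company", "nsubj", 2),
    Token(2, "hired", "root", 2),
    Token(3, "300", "nummod", 4),
    Token(4, "engineers", "obj", 2),
    Token(5, ".", "punct", 2),
]

print(extract_quantities(SENTENCE))
```

A single rule like this covers only the simplest sentence shape; it is meant to show what a "structured representation" of a quantity could look like, not to approximate the coverage reported in the paper.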
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering