Column-Type Prediction for Web Tables Powered by Knowledge Base and Text
Published: 2023-01-20
Issue: 3
Volume: 11
Page: 560
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics
Author:
Wu Junyi (1), Ye Chen (1, 2, 3), Zhi Haoshi (1), Jiang Shihao (1)
Affiliation:
1. College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
3. Jubang Group Co., Ltd., Yueqing 325600, China
Abstract
Web tables are essential for applications such as data analysis. However, they are often incomplete and lack critical information, which makes their content difficult to understand. Automatically predicting column types for tables without metadata is therefore important for handling the wide variety of tables found on the Internet. This paper proposes CNN-Text, a method for this task that fuses CNN prediction with voting processes. We present data augmentation and synthetic column generation approaches to improve the CNN’s performance, and we use extracted text to obtain better predictions. Experimental results show that CNN-Text outperforms the baseline methods, demonstrating that it is well suited to table column-type prediction.
Funder
National Natural Science Foundation of China; National Key Research and Development Program of China; Natural Science Foundation of Zhejiang Province
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)