A Hierarchical Multi-Task Learning Framework for Semantic Annotation in Tabular Data

Author:

Wu Jie 1, Hou Mengshu 1

Affiliation:

1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract

To make full use of tables in analysis, it is essential to recognize and understand their semantics comprehensively. This requirement is especially critical because many tables lack explicit annotations, so column types and inter-column relationships must be identified. Such identification can significantly improve data quality, streamline data integration, and support data analysis and mining. Current table annotation models often address each subtask independently, which can neglect constraints and contextual information and cause relational ambiguities and inference errors. To address this issue, we propose a unified multi-task learning framework that handles multiple tasks concurrently within a single model, including column named entity recognition, column type identification, and inter-column relationship detection. By integrating these tasks, the framework exploits their interrelations, facilitating the exchange of shallow features and the sharing of representations. This cooperation enables each task to leverage insights from the others, thereby improving the performance of individual subtasks and enhancing the model’s overall generalization capabilities. Notably, our model is designed to use only the internal information of tabular data, avoiding reliance on external context or knowledge graphs. This design ensures robust performance even with limited input information. Extensive experiments demonstrate the superior performance of our model across various tasks, validating the effectiveness of the unified multi-task learning framework in the recognition and comprehension of table semantics.
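The idea of sharing representations across the three tasks can be illustrated with a minimal sketch of hard parameter sharing: one encoder feeds three task heads, and the training objective is a weighted sum of the per-task losses. This is not the authors' architecture. The hand-made column features, layer sizes, label counts, loss weights, and all function names below are assumptions made only for illustration, and only the table's own cells are used as input.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible

def make_linear(n_in, n_out):
    """Random weight matrix (n_out x n_in) plus zero bias; illustrative only."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def linear(layer, x):
    """Apply an affine map to the vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])

def column_features(cells):
    """Hypothetical features computed from the column's own cells (no external knowledge)."""
    n = len(cells)
    numeric = sum(c.replace(".", "", 1).isdigit() for c in cells) / n
    avg_len = sum(len(c) for c in cells) / n / 20.0
    capitalized = sum(c[:1].isupper() for c in cells) / n
    distinct = len(set(cells)) / n
    return [numeric, avg_len, capitalized, distinct]

HIDDEN = 8  # assumed size of the shared representation
shared_encoder = make_linear(4, HIDDEN)      # shared by all three tasks
ner_head = make_linear(HIDDEN, 3)            # column named entity recognition (3 assumed labels)
type_head = make_linear(HIDDEN, 5)           # column type identification (5 assumed labels)
relation_head = make_linear(2 * HIDDEN, 4)   # inter-column relation detection (4 assumed labels)

def encode(cells):
    """Shared representation of one column."""
    return [math.tanh(h) for h in linear(shared_encoder, column_features(cells))]

def joint_loss(col_a, col_b, ner_label, type_label, rel_label, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses computed from the shared encoder."""
    h_a, h_b = encode(col_a), encode(col_b)
    losses = (
        cross_entropy(linear(ner_head, h_a), ner_label),
        cross_entropy(linear(type_head, h_a), type_label),
        cross_entropy(linear(relation_head, h_a + h_b), rel_label),  # concatenated pair
    )
    return sum(w * l for w, l in zip(weights, losses)), losses
```

Because every head reads the same encoder output, a gradient step on the joint loss would update the shared encoder with signal from all three tasks, which is the mechanism the abstract refers to as sharing representations. The sketch only computes the loss; a real system would use a learned table encoder and an optimizer.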

Funder

National Natural Science Foundation of China

Publisher

MDPI AG
