A Hierarchical Multi-Task Learning Framework for Semantic Annotation in Tabular Data
Authors:
Wu Jie¹, Hou Mengshu¹
Affiliation:
1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Abstract
To make full use of tables in analysis, their semantics must be recognized and understood comprehensively. This is especially important because many tables lack explicit annotations, so column types and inter-column relationships have to be identified. Such identification can improve data quality, streamline data integration, and support data analysis and mining. Current table annotation models often address each subtask independently, which can neglect constraints and contextual information and lead to relational ambiguities and inference errors. To address this issue, we propose a unified multi-task learning framework that handles multiple tasks concurrently within a single model: column named entity recognition, column type identification, and inter-column relationship detection. By integrating these tasks, the framework exploits their interrelations, facilitating the exchange of shallow features and the sharing of representations. This cooperation lets each task leverage insights from the others, improving the performance of individual subtasks and enhancing the model’s overall generalization. Notably, our model uses only the internal information of tabular data, avoiding reliance on external context or knowledge graphs. This design ensures robust performance even with limited input information. Extensive experiments demonstrate the superior performance of our model across various tasks, validating the effectiveness of the unified multi-task learning framework for recognizing and comprehending table semantics.
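The idea of one model serving all three tasks through shared representations can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes plain hard parameter sharing (one shared encoder feeding three task heads) and an equally weighted sum of cross-entropy losses. The class `MultiTaskSketch`, the function `joint_loss`, the feature vectors, and all dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialization)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultiTaskSketch:
    """Hypothetical hard-parameter-sharing model: one encoder, three heads."""

    def __init__(self, in_dim, hid_dim, n_ner, n_type, n_rel, seed=0):
        rng = random.Random(seed)
        self.shared = init(hid_dim, in_dim, rng)      # shared by all tasks
        self.ner_head = init(n_ner, hid_dim, rng)     # column named entity recognition
        self.type_head = init(n_type, hid_dim, rng)   # column type identification
        self.rel_head = init(n_rel, 2 * hid_dim, rng) # inter-column relationship

    def encode(self, column_features):
        """Shared column representation used by every head."""
        return [math.tanh(v) for v in linear(self.shared, column_features)]

    def forward(self, col_a, col_b):
        h_a, h_b = self.encode(col_a), self.encode(col_b)
        return {
            "ner": softmax(linear(self.ner_head, h_a)),
            "type": softmax(linear(self.type_head, h_a)),
            # The relation head sees both columns (concatenated representations).
            "rel": softmax(linear(self.rel_head, h_a + h_b)),
        }


def joint_loss(probs, labels, task_weights):
    """Weighted sum of per-task cross-entropy losses (assumed objective)."""
    return sum(task_weights[t] * -math.log(probs[t][labels[t]]) for t in probs)


# Toy column feature vectors; the values are made up for the example.
model = MultiTaskSketch(in_dim=4, hid_dim=8, n_ner=3, n_type=5, n_rel=2)
probs = model.forward([0.1, 0.4, -0.2, 0.9], [0.7, -0.3, 0.0, 0.2])
loss = joint_loss(
    probs,
    labels={"ner": 1, "type": 4, "rel": 0},
    task_weights={"ner": 1.0, "type": 1.0, "rel": 1.0},
)
```

Because every head reads the same encoded representation, a gradient step on the joint loss would update the shared encoder with signal from all three tasks, which is the mechanism by which one task can benefit from the others in this kind of setup.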
Funder
National Natural Science Foundation of China