Affiliation:
1. School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science & Technology, China
2. School of Software, Nanjing University of Information Science & Technology, China
3. School of Software, Nanjing University of Information Science & Technology, China and School of Computer Engineering, Jiangsu Ocean University, China
4. Texas Tech University, USA
5. Auburn University, USA
6. King Saud University, Kingdom of Saudi Arabia
Abstract
Online shopping has become a crucial channel of daily consumption, and user-generated, or crowdsourced, product comments offer a broad range of feedback on e-commerce products. Integrating the critical opinions or major attitudes in these crowdsourced comments can therefore provide valuable input for marketing-strategy adjustment and product-quality monitoring. Unfortunately, the scarcity of annotated ground truth for integrated comments, i.e., the limited gold integration references, makes conventional supervised-learning-based comment integration infeasible. To resolve this problem, inspired by the principle of transfer learning, we propose a three-stage transferable and generative crowdsourced comment integration framework (TTGCIF) based on zero- and few-shot learning with the support of domain distribution alignment. The framework generates abstractive integrated comments in the target domain via an enhanced neural text generation model, referring to the integration resources available in related source domains so as to avoid exhaustive annotation effort in the target domain. Specifically, at the first stage, to enhance domain transferability, representations of the crowdsourced comments are aligned between the source and target domains by minimizing the domain distribution discrepancy in a kernel space.
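The article's implementation is not shown here; purely as an illustrative sketch, the kernel-space discrepancy minimization described above is commonly instantiated as a Maximum Mean Discrepancy (MMD) penalty with a Gaussian kernel. The PyTorch snippet below reflects that general technique under stated assumptions, not TTGCIF's actual objective; `src_feat` and `tgt_feat` are hypothetical encoder outputs.

```python
import torch

def gaussian_mmd(src_feat: torch.Tensor, tgt_feat: torch.Tensor,
                 bandwidth: float = 1.0) -> torch.Tensor:
    """Maximum Mean Discrepancy between two batches of comment
    representations, computed with a Gaussian (RBF) kernel."""
    def rbf(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        dist = torch.cdist(a, b) ** 2
        return torch.exp(-dist / (2.0 * bandwidth ** 2))

    # MMD^2 = E[k(s,s')] - 2 E[k(s,t)] + E[k(t,t')]
    return (rbf(src_feat, src_feat).mean()
            - 2.0 * rbf(src_feat, tgt_feat).mean()
            + rbf(tgt_feat, tgt_feat).mean())

# Usage: add the discrepancy to the generation loss so that source- and
# target-domain comment representations are pulled into alignment.
src_feat = torch.randn(32, 768)   # hypothetical encoder outputs, source domain
tgt_feat = torch.randn(32, 768)   # hypothetical encoder outputs, target domain
loss_align = gaussian_mmd(src_feat, tgt_feat)
```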
At the second stage, a zero-shot comment integration mechanism handles the case in which none of the gold integration references are available in the target domain. Taking sample-level semantic prototypes as input, the enhanced neural text generation model in TTGCIF is trained to learn semantic associations among data from different domains via semantic prototype transduction, so that "unlabeled" crowdsourced comments in the target domain can be associated with existing integration references in related source domains.
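The abstract describes semantic prototype transduction only at a high level. One plausible, hypothetical reading is sketched below: a sample-level prototype (here simply a mean-pooled embedding, an assumed choice) lets an unlabeled target-domain sample borrow the gold integration reference of its most similar source-domain sample. All function and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def sample_prototype(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Collapse a (num_tokens, dim) comment-sample embedding into a
    single sample-level semantic prototype by mean pooling (assumption)."""
    return token_embeddings.mean(dim=0)

def transduce_reference(tgt_proto: torch.Tensor,
                        src_protos: torch.Tensor,
                        src_references: list) -> str:
    """Associate an unlabeled target sample with the integration
    reference of its most similar source-domain sample."""
    sims = F.cosine_similarity(tgt_proto.unsqueeze(0), src_protos, dim=1)
    return src_references[int(sims.argmax())]

# Usage with toy tensors: three labeled source samples, one target sample.
src_protos = torch.randn(3, 768)
src_references = ["ref A", "ref B", "ref C"]
tgt_proto = sample_prototype(torch.randn(20, 768))
pseudo_reference = transduce_reference(tgt_proto, src_protos, src_references)
```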
At the third stage, starting from the parameters trained at the second stage, a few-shot fast domain adaptation mechanism seeks the most promising parameters along gradient directions constrained by instances from multiple source domains. In this way, the parameters of TTGCIF remain sensitive to any alteration of the training data, ensuring that even if only a few annotated resources in the target domain are available for fine-tuning, TTGCIF can still react promptly to achieve effective target-domain adaptation.
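Seeking promising parameters along gradient directions constrained by instances across multiple source domains reads like a MAML-family meta-learning update. As a hedged sketch only, the snippet below implements a first-order (Reptile-style) variant in PyTorch; TTGCIF's actual update rule may differ, and all names here are hypothetical.

```python
import copy
import torch

def reptile_meta_step(model, domain_batches, loss_fn,
                      inner_lr=1e-3, meta_lr=1e-3):
    """One first-order meta-update (Reptile-style, an assumption): adapt
    a copy of the model on each source-domain batch, then move the shared
    parameters toward the average of the adapted parameters, keeping them
    sensitive to small amounts of new training data."""
    meta_state = {n: p.detach().clone() for n, p in model.named_parameters()}
    deltas = {n: torch.zeros_like(v) for n, v in meta_state.items()}

    for inputs, targets in domain_batches:        # one batch per source domain
        clone = copy.deepcopy(model)              # inner-loop adaptation copy
        opt = torch.optim.SGD(clone.parameters(), lr=inner_lr)
        opt.zero_grad()
        loss_fn(clone(inputs), targets).backward()
        opt.step()
        with torch.no_grad():                     # accumulate parameter shift
            for n, p in clone.named_parameters():
                deltas[n] += p - meta_state[n]

    with torch.no_grad():                         # meta-update on shared model
        for n, p in model.named_parameters():
            p += meta_lr * deltas[n] / len(domain_batches)
```

Because the meta-update averages adaptation directions over several source domains, a handful of target-domain examples then suffices to fine-tune the shared parameters quickly, which matches the few-shot behavior the abstract describes.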
According to the experimental results, TTGCIF achieves the best transferable product comment integration performance in the target domain, with fast and stable domain adaptation using no more than 10% of the annotated resources in the target domain. More importantly, even when TTGCIF has not been fine-tuned on the target domain, the integrated comments it generates there, by referring to the integration resources available in related source domains, are still superior to those generated by models already fine-tuned on the target domain.
Funder
National Science Foundation of China
Jiangsu Natural Science Foundation
National Key Research and Development Program
Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University, Saudi Arabia
Publisher
Association for Computing Machinery (ACM)