Affiliation:
1. Beijing Institute of Technology, Haidian, Beijing, China
2. Huazhong University of Science and Technology, Wuhan, China
Abstract
Open-domain generative dialogue systems have attracted considerable attention over the past few years, yet how to evaluate them automatically remains a major challenge. To the best of our knowledge, there are three kinds of automatic evaluations for open-domain generative dialogue systems: (1) word-overlap-based metrics; (2) embedding-based metrics; (3) learning-based metrics. Due to the lack of systematic comparison, it is not clear which kind of metric is more effective. In this article, we first systematically measure all three kinds of metrics to determine which kind is best. Extensive experiments demonstrate that learning-based metrics are the most effective evaluation metrics for open-domain generative dialogue systems. Moreover, we observe that nearly all learning-based metrics depend on a negative sampling mechanism, which yields extremely imbalanced and low-quality samples for training a score model. To address this issue, we propose PONE, a novel learning-based metric that significantly improves the correlation with human judgments by using augmented POsitive samples and valuable NEgative samples. Extensive experiments demonstrate that PONE significantly outperforms the state-of-the-art learning-based evaluation method. In addition, we have publicly released the code of our proposed metric and the state-of-the-art baselines.
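To make the criticized mechanism concrete, the following is a minimal, hypothetical sketch of the naive negative sampling that most learning-based metrics rely on: for each (context, reference) pair, negatives are drawn uniformly at random from other dialogues' responses, so most negatives are trivially unrelated (low-quality) and the positive-to-negative ratio is fixed by construction. The function and data below are illustrative assumptions, not the paper's implementation.

```python
import random

def build_training_pairs(dialogues, num_negatives=1, seed=0):
    """dialogues: list of (context, response) tuples.

    Returns (context, candidate_response, label) triples for a score
    model, with label 1 for the true reply and 0 for a randomly
    sampled reply (the naive negative-sampling scheme).
    """
    rng = random.Random(seed)
    responses = [r for _, r in dialogues]
    pairs = []
    for context, response in dialogues:
        pairs.append((context, response, 1))  # positive: the true reply
        for _ in range(num_negatives):
            neg = rng.choice(responses)
            while neg == response:  # avoid sampling the reference itself
                neg = rng.choice(responses)
            pairs.append((context, neg, 0))  # negative: a random reply
    return pairs

data = [
    ("hi, how are you?", "fine, thanks"),
    ("what's the weather like?", "sunny today"),
    ("any plans tonight?", "watching a movie"),
]
pairs = build_training_pairs(data)
```

Because the negatives are arbitrary responses from unrelated dialogues, a score model trained on such pairs mostly learns to reject obvious mismatches, which is the weakness PONE's augmented positives and valuable negatives are designed to fix.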
Funder
NSFC
Major Project of Zhijiang Lab
NSFB
National Key R&D Plan
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, General Business, Management and Accounting, Information Systems
Cited by
12 articles.