Towards Robust QA Evaluation via Open LLMs

Published: 2024-07-10 Volume: 202 Page: 2811-2816
Container-title: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Author:

Kamalloo Ehsan¹, Upadhyay Shivani¹, Lin Jimmy¹

Affiliation:

1. University of Waterloo, Waterloo, Canada

Funder

Natural Sciences and Engineering Research Council of Canada

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3626772.3657675

References: 44 articles.

1. Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, and Siva Reddy. 2023. Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering. arXiv:2307.16877

2. Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks

3. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, Ann Arbor, Michigan, 65-72.

4. Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar Van Der Wal. 2023. Pythia: A Suite for Analyzing Large Language Models across Training and Scaling. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202. PMLR, 2397-2430.

5. GPT-NeoX-20B: An Open-Source Autoregressive Language Model