Experience and prediction: a metric of hardness for a novel litmus test

Published:2021-02-08 Issue:8 Volume:31 Page:2028-2056
ISSN:0955-792X
Container-title:Journal of Logic and Computation
language:en

Author:

Isaak Nicos¹,Michael Loizos²

Affiliation:

1. Open University of Cyprus, Cyprus

2. Open University of Cyprus and CYENS Center of Excellence, Cyprus

Abstract

In the past decade, the Winograd schema challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this hardness metric could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an automated system that is able to output the hardness indexes of Winograd schemas, albeit with limitations regarding the number of schemas it could be applied on. This paper adds to previous research by presenting a new system that is based on machine learning, able to output the hardness of any Winograd schema faster and more accurately than any other previously used method. Our developed system, which works within two different approaches, namely the random forest and deep learning (LSTM-based), is ready to be used as an extension of any other system that aims to differentiate between Winograd schemas, according to their perceived hardness for humans. At the same time, along with our developed system we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.

Publisher

Oxford University Press (OUP)

Subject

Logic,Hardware and Architecture,Arts and Humanities (miscellaneous),Software,Theoretical Computer Science

Link

https://academic.oup.com/logcom/article-pdf/31/8/2028/41808982/exab005.pdf

References: 68 articles.

1. The Berkeley FrameNet Project;Baker,1998

2. Establishing a human baseline for the Winograd schema challenge;Bender,2015

3. A neural probabilistic language model;Bengio;Journal of Machine Learning Research,2003

4. Abductive commonsense reasoning;Bhagavatula,2019