Video Turing Test: A first step towards human‐level AI

Published:2023-10-17 Issue:4 Volume:44 Page:537-554
ISSN:0738-4602
Container-title:AI Magazine
language:en

Author:

Lee Minsu¹², Heo Yu‐Jung³, Choi Seongho², Choi Woo Suk², Zhang Byoung‐Tak¹²

Affiliation:

1. Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea

2. Seoul National University, Seoul, Republic of Korea

3. KT, Seoul, Republic of Korea

Abstract

The development of artificial intelligence (AI) agents capable of human‐level understanding of video content and conducting conversations with humans on this basis is a promising application that people expect. However, this is a challenging task that requires the holistic integration of multimodal information with temporal dependencies and reasoning, as well as social and physical commonsense. In addition, the development of appropriate systematic evaluation methods is essential. In this context, we introduce the Video Turing Test (VTT), a blind test used to evaluate human‐likeness in terms of video comprehension ability. Moreover, we propose Vincent as a video understanding AI. We explain the configuration of VTT, the architecture of Vincent to prepare for VTT and the proposed evaluation methods for video comprehension. We also estimate the current intelligence level of AI based on our results and discuss future research directions.

Funder

National Research Foundation of Korea

Publisher

Wiley

Subject

Artificial Intelligence

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/aaai.12128

Reference: 57 articles

1. I-athlon: Towards A Multidimensional Turing Test

2. Adiwardana, D., M.‐T. Luong, D. R. So, J. Hall, N. Fiedel, R. Thoppilan, Z. Yang, et al. 2020. “Towards a Human‐like Open‐domain Chatbot.” CoRR abs/2001.09977.

3. Alamri, H., V. Cartillier, A. Das, J. Wang, A. Cherian, I. Essa, D. Batra, et al. 2019. “Audio Visual Scene‐aware Dialog.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7558–7567.

4. Banerjee, S., and A. Lavie. 2005. “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments.” In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65–72.

5. Bebensee, B., and B.‐T. Zhang. 2021. “Co‐Attentional Transformers for Story‐based Video Understanding.” In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6–11, 2021, 4005–4009. IEEE.