Bugs in the Data: How ImageNet Misrepresents Biodiversity

Published:2023-06-26 Issue:12 Volume:37 Page:14382-14390
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Luccioni Alexandra Sasha, Rolnick David

Abstract

ImageNet-1k is a dataset often used for benchmarking machine learning (ML) models and evaluating tasks such as image recognition and object detection. Wild animals make up 27% of ImageNet-1k but, unlike classes representing people and objects, these data have not been closely scrutinized. In the current paper, we analyze the 13,450 images from 269 classes that represent wild animals in the ImageNet-1k validation set, with the participation of expert ecologists. We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect. We also find that both the wildlife-related labels and images included in ImageNet-1k present significant geographical and cultural biases, as well as ambiguities such as artificial animals, multiple species in the same image, or the presence of humans. Our findings highlight serious issues with the extensive use of this dataset for evaluating ML systems, the use of such algorithms in wildlife-related tasks, and more broadly the ways in which ML datasets are commonly created and curated.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 10 articles.

1. Educational Technology and Responsible Automated Essay Scoring in the Generative AI Era;Practice, Progress, and Proficiency in Sustainability;2024-06-28

2. Stress concentration identification based on YOLOv8n algorithm;International Conference on Algorithms, Software Engineering, and Network Security;2024-04-26

3. Understanding the Process of Data Labeling in Cybersecurity;Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing;2024-04-08

4. Integrating YOLOv8-agri and DeepSORT for Advanced Motion Detection in Agriculture and Fisheries;EAI Endorsed Transactions on Industrial Networks and Intelligent Systems;2024-02-12

5. Application of the Few-Shot Algorithm for the Estimation of Bird Population Size in Chihuahua and Its Ornithological Implications;Lecture Notes in Computer Science;2024