A dataset of oracle characters for benchmarking machine learning algorithms

Published:2024-01-18 Issue:1 Volume:11
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Wang Mei, Deng Weihong

Abstract

Oracle bone script is an ancient Chinese writing system engraved on turtle shells and animal bones, serving as a valuable resource for interpreting ancient culture, history, and language. We introduce the Oracle-MNIST dataset, comprising 28 × 28 grayscale images of 30,222 ancient characters from 10 categories, designed for benchmarking pattern classification, with particular challenges related to image noise and distortion. The training set consists of 27,222 images, and the test set contains 300 images per class. Oracle-MNIST follows the same data format as the original MNIST dataset, enabling direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from (1) extremely severe and distinctive noise caused by three thousand years of burial and aging and (2) dramatically varying writing styles among ancient Chinese writers, both of which make them realistic for machine learning research.
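Because the abstract states that Oracle-MNIST uses the same data format as MNIST, a reader for MNIST's IDX layout should in principle apply to it as well. The following is a minimal sketch under that assumption, not the authors' loader: the helper names `parse_idx`, `read_idx`, and `make_idx` are hypothetical, no actual Oracle-MNIST file names are assumed, and the demo buffer is synthetic. It also checks the split arithmetic given in the abstract.

```python
import gzip
import struct


def parse_idx(buf: bytes):
    """Parse an unsigned-byte IDX buffer (the layout MNIST uses).

    Returns (shape, payload), where payload holds the raw pixel or label bytes.
    """
    # Magic number: two zero bytes, a data-type code, and the dimension count.
    zeros, dtype_code, ndim = struct.unpack(">HBB", buf[:4])
    if zeros != 0 or dtype_code != 0x08:  # 0x08 means unsigned byte
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    shape = struct.unpack(">" + "I" * ndim, buf[4:4 + 4 * ndim])
    payload = buf[4 + 4 * ndim:]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return shape, payload


def read_idx(path):
    """Read an IDX file from disk; a '.gz' suffix is treated as gzip-compressed."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_idx(f.read())


def make_idx(shape, payload: bytes) -> bytes:
    """Build an IDX buffer (used here only to create a synthetic demo)."""
    header = struct.pack(">HBB", 0, 0x08, len(shape))
    header += struct.pack(">" + "I" * len(shape), *shape)
    return header + payload


# Synthetic stand-in for an image file: two blank 28 x 28 grayscale images.
demo = make_idx((2, 28, 28), bytes(2 * 28 * 28))
shape, pixels = parse_idx(demo)

# Split sizes from the abstract: 27,222 training images plus 300 test images
# for each of the 10 classes.
TRAIN_IMAGES, TEST_PER_CLASS, NUM_CLASSES = 27_222, 300, 10
total_images = TRAIN_IMAGES + TEST_PER_CLASS * NUM_CLASSES
print(shape, len(pixels), total_images)
```

With real files, `read_idx` would be pointed at the downloaded image and label files; the image file should report a shape of the form (N, 28, 28) and the label file a shape of the form (N,).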

Funder

National Natural Science Foundation of China

China Postdoctoral Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

https://www.nature.com/articles/s41597-024-02933-w.pdf

References: 12 articles.

1. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324 (1998).

2. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems 25, 1–9 (2012).

3. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016).

4. Cohen, G., Afshar, S., Tapson, J. & Van Schaik, A. EMNIST: extending MNIST to handwritten letters. In Proceedings of the international joint conference on neural networks, 2921–2926 (2017).

5. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint https://arxiv.org/abs/1708.07747 (2017).