A new dataset for Mongolian online handwritten recognition

Authors:

Pan Yuecai, Fan Daoerji, Wu Huijuan, Teng Da

Abstract

This paper introduces MOLHW, a new word-level online handwriting dataset for traditional Mongolian. The dataset contains 164,631 handwritten samples produced by 200 writers and covers 40,605 common Mongolian words selected from a large Mongolian corpus. Volunteers wrote the assigned words in a dedicated mobile-phone application, which recorded the coordinate points of each word. Each sample is annotated with the Latin transliteration of the Mongolian word, and the dataset also records the writer's identification number and the phone's screen information. Using this dataset, we propose a baseline encoder–decoder Mongolian online handwriting recognition model that combines a deep bidirectional gated recurrent unit (GRU) with an attention mechanism. The best word error rate (WER) this baseline achieved on the test set was 24.281%. We also report results for several other Mongolian online handwriting recognition models. Among the models compared, a Transformer-based model learned the mapping from coordinate data to character sequences most effectively, reaching a WER of 16.969% on the test set. The dataset is freely available to researchers worldwide and can be used for handwritten text recognition as well as handwritten text generation, handwriting identification, and signature recognition.
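The abstract reports results as word error rate (WER). Because every MOLHW sample is a single word, WER under the usual Levenshtein-based definition (word edits divided by reference words) reduces to the fraction of test samples whose predicted transliteration does not match the label exactly. The sketch below assumes that standard definition; the paper's exact scoring procedure (for example, any normalization of the transliteration strings) is not stated here. The function names and the transliteration strings in the usage example are hypothetical placeholders, not MOLHW labels.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Substitutions, insertions and deletions each cost 1.
    Hypothetical helper for illustration only.
    """
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(pairs: Sequence[tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word edits / total reference words.

    Each pair is (reference, hypothesis) as whitespace-separated
    transliteration text. Assumes the standard WER definition, which
    may differ from the paper's exact scoring.
    """
    edits = 0
    total = 0
    for ref, hyp in pairs:
        ref_words = ref.split()
        hyp_words = hyp.split()
        edits += edit_distance(ref_words, hyp_words)
        total += len(ref_words)
    if total == 0:
        raise ValueError("no reference words to score")
    return 100.0 * edits / total


if __name__ == "__main__":
    # Placeholder (reference, prediction) pairs, one word per sample.
    samples = [
        ("mongGol", "mongGol"),
        ("bicig", "bicik"),  # one misrecognized word
        ("ugh", "ugh"),
        ("nom", "nom"),
    ]
    print(f"WER = {word_error_rate(samples):.3f}%")  # prints "WER = 25.000%"
```

With one wrong word out of four single-word samples, the corpus WER is 25%. Scoring at the corpus level (summing edits before dividing) rather than averaging per-sample rates keeps the metric comparable if multi-word references were ever added.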

Funder

National Natural Science Foundation of the Inner Mongolia Autonomous Region

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Cited by 5 articles.

1. Handwritten Recognition Techniques: A Comprehensive Review; Symmetry; 2024-06-02

2. Offline Mongolian Handwriting Identification Based on Convolutional Neural Network; Electronics; 2023-12-27

3. The Application of Unicode in Cultural Heritage Databases; 2023 7th Asian Conference on Artificial Intelligence Technology (ACAIT); 2023-11-10

4. Online Mongolian Handwriting Recognition Based on Encoder–Decoder Structure with Language Model; Electronics; 2023-10-10

5. A Comprehensive and Comparative Study of Handwriting Recognition System; 2023 IEEE Renewable Energy and Sustainable E-Mobility Conference (RESEM); 2023-05-17
