Abstract
In this paper, we establish the first baseline for handwritten stenography recognition, using the novel LION dataset, and investigate the impact of incorporating selected aspects of stenographic theory into the recognition process. We make the LION dataset publicly available with the aim of encouraging future research in handwritten stenography recognition. A state-of-the-art text recognition model is trained to establish a baseline. Stenographic domain knowledge is integrated by transforming the target sequences into representations that approximate diplomatic transcriptions, in which each symbol in the script is represented by its own character in the transliteration rather than by the corresponding combination of characters from the Swedish alphabet. Four such encoding schemes are evaluated, and results are further improved by adding a pre-training scheme based on synthetic data. The baseline model achieves an average test character error rate (CER) of 29.81% and a word error rate (WER) of 55.14%. Test error rates are reduced significantly (p < 0.01) by combining stenography-specific target sequence encodings with pre-training and fine-tuning, yielding CERs in the range of 24.5–26% and WERs of 44.8–48.2%. An analysis of selected recognition errors illustrates the challenges that the stenographic writing system poses to text recognition. This work establishes the first baseline for handwritten stenography recognition. Our proposed combination of stenography-specific knowledge with pre-training on synthetic data and fine-tuning yields considerable improvements. Together with our precursor study on the subject, this is the first work to apply modern handwritten text recognition to stenography. The dataset and our code are publicly available via Zenodo.
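The CER and WER figures quoted above are edit-distance metrics. As a minimal sketch of how such rates are commonly computed (not the authors' evaluation code), the Python below divides the Levenshtein distance between a reference and a predicted transcription by the reference length, at character level for CER and at word level for WER. The function names and the sample strings are hypothetical, and whether the paper averages per line or over the whole test set is not stated in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcription pair, for illustration only.
    print(cer("en liten flicka", "en litn flicka"))
    print(wer("en liten flicka", "en litn flicka"))
```

In this toy pair a single missing character touches one of three words, so the WER is much higher than the CER; the same effect is visible in the gap between the reported CER and WER values.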
Publisher
Springer Science and Business Media LLC