Affiliations:
1. Search, Google, Japan. rws@google.com
2. Research & Machine Intelligence, Google, UK. agutkin@google.com
Abstract
Taxonomies of writing systems since Gelb (1952) have classified systems based on what the written symbols represent: if they represent words or morphemes, they are logographic; if syllables, syllabic; if segments, alphabetic; and so forth. Sproat (2000) and Rogers (2005) broke with tradition by splitting the logographic and phonographic aspects into two dimensions, treating logography as graded rather than categorical. A system could be syllabic and highly logographic, or alphabetic and mostly non-logographic. This accords better with how writing systems actually work, but neither author proposed a method for measuring logography.
In this article we propose a novel measure of the degree of logography that uses an attention-based sequence-to-sequence model trained to predict the spelling of a token from its pronunciation in context. In an ideal phonographic system, the model should need to attend only to the current token in order to compute how to spell it, and this would show in the attention matrix activations. In contrast, with a logographic system, where a given pronunciation might correspond to several different spellings, the model would need to attend to a broader context. The ratio of the activation outside the token to the total activation forms the basis of our measure. We compare this with a simple lexical measure and an entropic measure, as well as with several other neural models, and argue that on balance our attention-based measure accords best with intuition about how logographic various systems are.
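To make the core computation concrete, the following is a minimal Python sketch under our own assumptions (it is not the authors' implementation): the trained model's attention weights are available as a NumPy matrix with one row per output step, and a boolean mask marks which input positions belong to the token being spelled. The name logography_score and the array shapes are illustrative.

import numpy as np

def logography_score(attention, token_mask):
    """Fraction of attention mass falling outside the input positions
    of the token currently being spelled.

    attention:  (output_steps, input_steps) attention weights from a
                pronunciation-to-spelling model, one row per output step.
    token_mask: boolean vector of length input_steps; True marks the
                positions of the current token.
    """
    outside = attention[:, ~token_mask].sum()  # mass spent on context
    return float(outside / attention.sum())    # 0.0 = purely phonographic

# Toy example: 3 output steps over 5 input positions, of which
# positions 1-3 belong to the token being spelled.
attention = np.array([
    [0.05, 0.60, 0.20, 0.10, 0.05],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.05, 0.15, 0.20, 0.50, 0.10],
])
token_mask = np.array([False, True, True, True, False])
print(logography_score(attention, token_mask))  # 0.15

A higher score indicates that the model had to look beyond the token itself to decide its spelling, which on this view is the signature of logography; presumably one would average the score over many tokens in a corpus to obtain a per-system measure.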
Our work provides the first quantifiable measure of logography that accords with linguistic intuition and, we argue, offers better insight into what this notion means.
Subject
Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics
Cited by: 3 articles.