Affiliation:
1. Google, Inc.
2. Oregon Health & Science University
Abstract
This paper explores lexicographic semirings and their application to problems in speech and language processing. Specifically, we present two instantiations of binary lexicographic semirings, one involving a pair of tropical weights, and the other a tropical weight paired with a novel string semiring we term the categorial semiring. The first of these is used to yield an exact encoding of backoff models with epsilon transitions. This lexicographic language model semiring allows for off-line optimization of exact models represented as large weighted finite-state transducers in contrast to implicit (on-line) failure transition representations. We present empirical results demonstrating that, even in simple intersection scenarios amenable to the use of failure transitions, the use of the more powerful lexicographic semiring is competitive in terms of time of intersection. The second of these lexicographic semirings is applied to the problem of extracting, from a lattice of word sequences tagged for part of speech, only the single best-scoring part of speech tagging for each word sequence. We do this by incorporating the tags as a categorial weight in the second component of a 〈Tropical, Categorial〉 lexicographic semiring, determinizing the resulting word lattice acceptor in that semiring, and then mapping the tags back as output labels of the word lattice transducer. We compare our approach to a competing method due to Povey et al. (2012).
Subject
Artificial Intelligence,Computer Science Applications,Linguistics and Language,Language and Linguistics
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. Weighted programming: a programming paradigm for specifying mathematical models;Proceedings of the ACM on Programming Languages;2022-04-29
2. Incremental Lattice Determinization for WFST Decoders;2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU);2019-12