Applications of Lexicographic Semirings to Problems in Speech and Language Processing

Published: 2014-12 Issue: 4 Volume: 40 Page: 733-761
ISSN: 0891-2017
Container-title: Computational Linguistics
language: en
Short-container-title: Computational Linguistics

Author:

Sproat Richard¹, Yarmohammadi Mahsa², Shafran Izhak², Roark Brian¹

Affiliation:

1. Google, Inc.

2. Oregon Health & Science University

Abstract

This paper explores lexicographic semirings and their application to problems in speech and language processing. Specifically, we present two instantiations of binary lexicographic semirings, one involving a pair of tropical weights, and the other a tropical weight paired with a novel string semiring we term the categorial semiring. The first of these is used to yield an exact encoding of backoff models with epsilon transitions. This lexicographic language model semiring allows for off-line optimization of exact models represented as large weighted finite-state transducers in contrast to implicit (on-line) failure transition representations. We present empirical results demonstrating that, even in simple intersection scenarios amenable to the use of failure transitions, the use of the more powerful lexicographic semiring is competitive in terms of time of intersection. The second of these lexicographic semirings is applied to the problem of extracting, from a lattice of word sequences tagged for part of speech, only the single best-scoring part of speech tagging for each word sequence. We do this by incorporating the tags as a categorial weight in the second component of a 〈Tropical, Categorial〉 lexicographic semiring, determinizing the resulting word lattice acceptor in that semiring, and then mapping the tags back as output labels of the word lattice transducer. We compare our approach to a competing method due to Povey et al. (2012).
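As a rough illustration of the first instantiation described in the abstract, the sketch below models a lexicographic semiring over a pair of tropical weights: semiring addition picks the lexicographically smaller pair, and semiring multiplication adds the two components independently. The class name `LexWeight`, the helper `path_weight`, and all numeric costs are hypothetical and chosen only for illustration. Putting a positive penalty in the first component of a backoff arc is a simplifying assumption about how such an encoding might disprefer backoff paths, not a reproduction of the paper's exact construction.

```python
from dataclasses import dataclass
from functools import reduce
import math


@dataclass(frozen=True, order=True)
class LexWeight:
    """A pair of tropical weights, ordered lexicographically (hypothetical name)."""
    first: float
    second: float

    def plus(self, other: "LexWeight") -> "LexWeight":
        # Semiring addition: keep the lexicographically smaller pair.
        return min(self, other)

    def times(self, other: "LexWeight") -> "LexWeight":
        # Semiring multiplication: tropical product (real addition) per component.
        return LexWeight(self.first + other.first, self.second + other.second)


ZERO = LexWeight(math.inf, math.inf)  # identity for plus
ONE = LexWeight(0.0, 0.0)             # identity for times


def path_weight(arcs):
    """Multiply the arc weights along one path."""
    return reduce(LexWeight.times, arcs, ONE)


# Illustrative costs only. A direct n-gram arc carries no penalty in the first
# component; the backoff arc (assumed penalty of 1.0) is followed by a
# lower-order n-gram arc.
direct = path_weight([LexWeight(0.0, 5.0)])
backoff = path_weight([LexWeight(1.0, 0.5), LexWeight(0.0, 2.0)])

# The first component decides: the direct path wins even though the backoff
# path has the smaller second component (2.5 versus 5.0).
best = direct.plus(backoff)
```

With a plain tropical weight on the second component alone, the backoff path would be selected because its total cost is lower. The lexicographic ordering is what lets an ordinary epsilon arc act as a last resort rather than as a cheaper alternative.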

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00198

References (28 articles)

1. Partial parsing via finite-state cascades

2. Allauzen, Cyril, Mehryar Mohri, and Brian Roark. 2003. Generalized algorithms for constructing statistical language models. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 40–47, Sapporo.

3. Allauzen, Cyril, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of the Twelfth International Conference on Implementation and Application of Automata (CIAA 2007), Lecture Notes in Computer Science, volume 4793, pages 11–23, Prague.

4. A finite-state approach to machine translation

5. Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars

Cited by 2 articles.

1. Weighted programming: a programming paradigm for specifying mathematical models;Proceedings of the ACM on Programming Languages;2022-04-29

2. Incremental Lattice Determinization for WFST Decoders;2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU);2019-12