A Scalable Distributed Syntactic, Semantic, and Lexical Language Model

Published:2012-09 Issue:3 Volume:38 Page:631-671
ISSN:0891-2017
Container-title:Computational Linguistics
language:en
Short-container-title:Computational Linguistics

Author:

Tan Ming¹, Zhou Wenli¹, Zheng Lei¹, Wang Shaojun¹

Affiliation:

1. Wright State University

Abstract

This paper presents an attempt at building a large-scale distributed composite language model, formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm, to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large-scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality, measured by the BLEU score and “readability” of translations, when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
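
The two evaluation ideas named in the abstract, perplexity and N-best list re-ranking, can be illustrated with a minimal Python sketch. This is not the paper's composite model or its training procedure: the per-token probabilities are assumed to come from some language model, and the function names, the `base_score` and `lm_logprob` fields, and the `lm_weight` interpolation weight are all hypothetical choices made for illustration.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model probability of each token.

    Computed as exp of the negative mean log-probability.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


def rerank_nbest(nbest, lm_weight=0.5):
    """Return hypotheses sorted best-first by a combined score.

    Each hypothesis is a dict with hypothetical fields:
      base_score  - log-domain score from the baseline translation system
      lm_logprob  - log-probability assigned by the re-ranking language model
    The combined score is base_score + lm_weight * lm_logprob (an assumed
    linear combination, not taken from the paper).
    """
    return sorted(
        nbest,
        key=lambda h: h["base_score"] + lm_weight * h["lm_logprob"],
        reverse=True,
    )


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))

    hypotheses = [
        {"text": "hypothesis A", "base_score": -1.0, "lm_logprob": -10.0},
        {"text": "hypothesis B", "base_score": -1.2, "lm_logprob": -6.0},
    ]
    print([h["text"] for h in rerank_nbest(hypotheses)])
```

Lower perplexity means the model finds the text less surprising. In the re-ranking step, a hypothesis that the baseline system ranked second can move to the top if the language model scores it as sufficiently more fluent.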

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00107

References: 76 articles.

Cited by 2 articles.

1. A Deception Model Robust to Eavesdropping Over Communication for Social Network Systems;IEEE Access;2019

2. Sparse Non-negative Matrix Language Modeling;Transactions of the Association for Computational Linguistics;2016-12