Affiliation:
1. Toyohashi University of Technology
Abstract
Language models (LMs) are an important component of automatic speech recognition (ASR) systems. An LM helps the acoustic model find the word sequence corresponding to a given speech signal; without one, an ASR system has no knowledge of the language and struggles to find the correct word sequence. Over the past few years, researchers have tried to incorporate long-range dependencies into statistical word-based n-gram LMs. One such long-range dependency is topic. Unlike words, topics are not directly observable, so the meaning behind the words must be inferred to identify the topic. This research builds on the assumption that nouns carry topic information. We propose a new approach to topic-dependent language modeling in which the topic is determined in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among the nouns in the context words. To decide the topic at a given point, a fixed-size window over the word history is observed, and voting is carried out over noun-class occurrences weighted by a confidence measure. Experiments were conducted on an English corpus and a Japanese corpus: the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus. The results show that the proposed method yields better perplexity than the comparative baselines, including word-based and class-based n-gram LMs, their interpolation, a cache-based LM, an n-gram-based topic-dependent LM, and a topic-dependent LM based on Latent Dirichlet Allocation (LDA). N-best list rescoring was conducted to validate the method's applicability to ASR systems.
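To make the abstract's pipeline concrete, below is a minimal sketch (not the authors' implementation) of the core idea: project nouns into an LSA latent space, group them into noun classes, and pick a topic for the current position by voting over the classes of nouns in a fixed-size history window, weighting each vote by a confidence score. The toy corpus, window size, class count, and the inverse-distance confidence are all illustrative assumptions; the paper does not specify these details here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Toy corpus and a hand-picked noun vocabulary (assumptions for illustration).
documents = [
    "the bank raised interest rates and the market fell",
    "the team won the game and the coach praised the players",
    "rates rose as the bank tightened credit in the market",
    "the players scored and the game thrilled the coach",
]
nouns = ["bank", "interest", "rates", "market",
         "team", "game", "coach", "players", "credit"]

# Document-by-noun count matrix, restricted to the noun vocabulary.
X = CountVectorizer(vocabulary=nouns).fit_transform(documents)

# LSA: project each noun (a column of X) into a low-dimensional latent space.
noun_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(X.T)

# Group nouns into latent noun classes.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(noun_vecs)
noun_class = {n: c for n, c in zip(nouns, km.labels_)}
noun_index = {n: i for i, n in enumerate(nouns)}

def topic_by_voting(history, window=8):
    """Choose a topic (noun class) by letting each noun in the last
    `window` words vote for its class, weighted by a confidence score
    (here: inverse distance to the class centroid, an assumed form)."""
    votes = np.zeros(km.n_clusters)
    for w in history[-window:]:
        if w in noun_class:
            c = noun_class[w]
            d = np.linalg.norm(noun_vecs[noun_index[w]] - km.cluster_centers_[c])
            votes[c] += 1.0 / (1.0 + d)
    return int(votes.argmax()) if votes.any() else None

print(topic_by_voting("the coach said the players won the game".split()))
```

The chosen topic would then select (or interpolate toward) a topic-specific n-gram LM for scoring the next word, e.g. during n-best rescoring.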
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.