Statistical lattice-based spoken document retrieval

Published:2010-01 Issue:1 Volume:28 Page:1-30
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Chia Tee Kiah¹, Sim Khe Chai², Li Haizhou², Ng Hwee Tou¹

Affiliation:

1. National University of Singapore, Singapore

2. Institute for Infocomm Research, Singapore

Abstract

Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. We present a method for lattice-based spoken document retrieval based on a statistical n-gram modeling approach to information retrieval. In this statistical lattice-based retrieval (SLBR) method, a smoothed statistical model is estimated for each document from the expected counts of words given the information in a lattice, and the relevance of each document to a query is measured as a probability under such a model. We investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines, using the Fisher English Corpus of conversational telephone speech. Experimental results show that our method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
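As a rough illustration of the idea in the abstract, the following minimal sketch scores documents by query likelihood under a smoothed unigram model built from expected word counts. It is not the authors' implementation. Several things here are assumptions: a lattice is stood in for by a short list of weighted hypotheses, the model is unigram only, and smoothing is simple linear interpolation with a collection model (the abstract does not say which smoothing scheme is used). The names `expected_counts`, `query_log_likelihood`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def expected_counts(hypotheses):
    """Expected word counts from (posterior, words) pairs.

    Assumption: the list of weighted hypotheses is a stand-in for a real
    speech lattice, and the posteriors sum to 1.
    """
    counts = Counter()
    for posterior, words in hypotheses:
        for word in words:
            counts[word] += posterior
    return counts


def query_log_likelihood(query, doc_counts, coll_counts, lam=0.5):
    """Log probability of the query under a smoothed unigram document model.

    Assumption: linear interpolation between the document model and a
    collection model, with hypothetical weight `lam` on the collection.
    """
    doc_total = sum(doc_counts.values())
    coll_total = sum(coll_counts.values())
    score = 0.0
    for word in query:
        p_doc = doc_counts.get(word, 0.0) / doc_total if doc_total else 0.0
        p_coll = coll_counts.get(word, 0.0) / coll_total if coll_total else 0.0
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p <= 0.0:
            return float("-inf")  # word unseen in the whole collection
        score += math.log(p)
    return score


# Toy data: in doc_a the top hypothesis misrecognizes "cheap" as "sheep",
# but the second hypothesis still contributes expected count for "cheap".
docs = {
    "doc_a": expected_counts([(0.6, ["sheep", "flights"]),
                              (0.4, ["cheap", "flights"])]),
    "doc_b": expected_counts([(1.0, ["weather", "report"])]),
}

# Collection model: expected counts summed over all documents.
collection = Counter()
for doc_counts in docs.values():
    collection.update(doc_counts)

query = ["cheap", "flights"]
scores = {name: query_log_likelihood(query, counts, collection)
          for name, counts in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup a 1-best transcript of `doc_a` would contain no occurrence of "cheap", whereas the expected counts keep a fractional count for it, which is the kind of evidence a lattice-based model can use.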

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1658377.1658379

References: 50 articles.

1. Generalized algorithms for constructing statistical language models

2. A General Weighted Grammar Library

3. Information retrieval as statistical translation

Cited by 15 articles.

1. Lexicon-based probabilistic indexing of handwritten text images;Neural Computing and Applications;2023-05-10

2. Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting;Signal and Data Processing;2021-02-01

3. Interactive Spoken Content Retrieval by Deep Reinforcement Learning;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2018-12

4. Web-Based Behavioral Modeling for Continuous User Authentication (CUA);Advances in Computers;2017

5. HMM word graph based keyword spotting in handwritten document images;Information Sciences;2016-11