Comparison of methods for language-dependent and language-independent query-by-example spoken term detection

Published:2012-08 Issue:3 Volume:30 Page:1-34
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Tejedor Javier¹, Fapšo Michal², Szöke Igor², Černocký Jan “Honza”², Grézl František²

Affiliation:

1. Universidad Autónoma de Madrid, Spain

2. Brno University of Technology, Czech Republic

Abstract

This article investigates query-by-example (QbE) spoken term detection (STD), in which the query is not entered as text but is either selected from speech data or spoken. Two feature extractors based on neural networks (NN) are introduced: the first produces phone-state posteriors and the second makes use of a compressive NN layer. They are combined with three different QbE detectors: the Gaussian mixture model/hidden Markov model (GMM/HMM) and dynamic time warping (DTW) detectors both work on continuous feature vectors, while the third, based on weighted finite-state transducers (WFST), processes phone lattices. QbE STD is compared to two standard STD systems with text queries: acoustic keyword spotting and WFST-based search of phone strings in phone lattices. Results are reported on four languages (Czech, English, Hungarian, and Levantine Arabic) using standard metrics: equal error rate (EER) and two versions of the popular figure-of-merit (FOM). Both language-dependent and language-independent cases are investigated, the latter being particularly interesting for scenarios that lack the standard resources needed to train speech recognition systems. In the language-dependent setup, either DTW or GMM/HMM produces the best results, depending on the target language; in the language-independent setup, the GMM/HMM approach performs best. The WFST approach is promising because it allows for indexing and fast search.
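
As a rough illustration of the DTW-based detector mentioned in the abstract, the sketch below aligns a spoken query with a candidate speech segment, both given as sequences of posterior vectors. It is not the authors' implementation: the local distance (negative log of the dot product), the step pattern, the length normalization, and the function names are all assumptions chosen for brevity.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Assumed local distance: -log of the dot product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def dtw_cost(query, segment):
    """Length-normalized DTW alignment cost between two posterior sequences."""
    n, m = len(query), len(segment)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning query[:i] with segment[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], segment[j - 1])
            # Allowed steps: vertical, horizontal, diagonal
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Normalize by total length so scores are comparable across segment lengths
    return acc[n][m] / (n + m)
```

A lower cost indicates a closer match; a detector built on this idea would slide over the utterance and report segments whose cost falls below a threshold.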

Funder

Czech Ministry of Education

Czech Science Foundation

IT4 Innovations Centre of Excellence

Technologická Agentura České Republiky

Czech Ministry of Trade and Commerce

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/2328967.2328971

Reference: 61 articles.

Cited by 11 articles.

1. Multilingual Short Text Analysis of Twitter Using Random Forest Approach;Knowledge Graphs and Semantic Web;2021

2. Fast Query-by-example Speech Search using Attention-based Deep Binary Embeddings;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2020

3. Query expansion techniques for information retrieval: A survey;Information Processing & Management;2019-09

4. Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings With Temporal Context;IEEE Access;2019

5. Weighted fast sequential DTW for multilingual audio Query-by-Example retrieval;Journal of Intelligent Information Systems;2018-02-19