Exploring Composite Indexes for Domain Adaptation in Neural Machine Translation
Published:2023-09-23
Issue:
Volume:
Page:1-20
ISSN:2196-8888
Container-title:Vietnam Journal of Computer Science
language:en
Short-container-title:Vietnam J. Comp. Sci.
Author:
Minh Nhan Vo (1, 2),
Minh Khue Nguyen Tran (1, 2),
Nguyen Long H. B. (1, 2),
Dinh Dien (1, 2)
Affiliation:
1. Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam
2. Vietnam National University, Ho Chi Minh City, Vietnam
Abstract
Domain adaptation in neural machine translation (NMT) often involves working with datasets whose distribution differs from that of the training data. In such scenarios, k-nearest-neighbor machine translation (kNN-MT) has been shown to be effective at retrieving relevant information from large datastores. However, the high-dimensional context vectors of large neural machine translation models result in high computational costs for distance computation and storage. To address this issue, index optimization techniques have been proposed, including the combination of an inverted file index (IVF) with product vector quantization (PQ), known as IVFPQ. In this paper, we explore recent indexing techniques for efficient machine translation domain adaptation and combine multiple index structures to improve the efficiency of nearest-neighbor search in domain adaptation datasets for machine translation tasks. Specifically, we evaluate the effectiveness of combining optimized product quantization (OPQ) and hierarchical navigable small-world (HNSW) indexing with IVFPQ. Our study aims to provide insights into the most suitable composite index methods for efficient nearest-neighbor search in domain adaptation datasets, with a focus on improving both accuracy and speed.
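As a rough illustration of the inverted file (IVF) idea mentioned in the abstract, the following minimal sketch assigns datastore vectors to their nearest coarse centroid and, at query time, scans only the `nprobe` closest cells. It is not the authors' implementation: the function names, the two-dimensional toy vectors, and the fixed centroids are all hypothetical, and the PQ, OPQ, and HNSW components are deliberately left out (a real system would learn the centroids, compress the stored vectors, and typically rely on a dedicated similarity-search library).

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Assign each vector index to the inverted list of its nearest centroid."""
    lists = defaultdict(list)
    for idx, vec in enumerate(vectors):
        cell = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[cell].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, k=2, nprobe=1):
    """Return the indices of the k nearest vectors found in the nprobe closest cells."""
    # Coarse step: rank cells by centroid distance and keep the nprobe closest.
    cells = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    # Fine step: exact distances, but only over vectors stored in the probed cells.
    candidates = [i for c in cells for i in lists.get(c, [])]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]


# Toy data (assumed for illustration): two fixed centroids, five "context vectors".
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.1, 0.2), (0.5, -0.3), (9.8, 10.1), (10.4, 9.7), (1.0, 1.0)]

lists = build_ivf(vectors, centroids)
result = ivf_search((0.0, 0.1), vectors, centroids, lists, k=2, nprobe=1)
print(result)
```

With `nprobe=1` only the cell around the first centroid is scanned, so two of the five vectors are never compared against the query. Raising `nprobe` trades speed for recall, which is the kind of accuracy and speed trade-off the abstract refers to.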
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computational Theory and Mathematics, Computer Vision and Pattern Recognition, Information Systems, Computer Science (miscellaneous), Software