Abstract
Cross-modal hashing retrieval methods aim to retrieve data across different modalities and learn common semantics with low storage and time costs. Although many excellent hashing methods have been proposed in the past decades, several issues remain. For example, most methods focus on the Euclidean domain and ignore the graph-structure information contained in the data points, so outliers and noise in the Euclidean domain cause a drop in accuracy. Other methods learn only a single shared latent subspace, which may be unreasonable because the modalities differ in both dimensionality and distribution. To address these issues, we propose a hashing technique called Two Stage Graph Hashing (TSGH). In the first stage, we learn a specific latent subspace for each modality using Collective Matrix Decomposition and the proposed Graph Convolutional Network (GCN). The learned subspaces therefore contain features from both the Euclidean and non-Euclidean domains, which mitigates the influence of noise and outliers in the dataset. Global Approximation is then used to align the subspaces of the different modalities, so that high-level shared semantics can be explored. Finally, discrete hash codes are learned from the latent subspaces and their semantic similarities. In the second stage, we design a linear classifier as the hash function and propose Local Similarity Preservation to capture the local relationships among the hash codes in Hamming space. To verify the effectiveness of TSGH, we conduct extensive experiments on three public datasets. TSGH achieves the best results compared with previous state-of-the-art methods, demonstrating its superiority.
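To make the two-stage idea above concrete, the following is a minimal NumPy sketch of the overall pipeline, not the paper's exact formulation: the k-NN graph construction, the latent dimension d, the averaging-based alignment, and the ridge-regression hash function are all illustrative assumptions standing in for the Collective Matrix Decomposition, Global Approximation, and Local Similarity Preservation components described in the abstract.

```python
# Illustrative sketch only: stage 1 builds graph-smoothed, per-modality latent
# subspaces and binarizes them; stage 2 fits a linear hash function.
import numpy as np

def knn_graph(X, k=5):
    """Build a symmetric k-NN adjacency matrix (non-Euclidean structure)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]          # k nearest neighbors
    A = np.zeros_like(d2)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)

def gcn_propagate(X, A):
    """One GCN-style propagation: D^{-1/2} (A + I) D^{-1/2} X."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))
    return (A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ X

def stage_one(X_img, X_txt, d=16):
    """Stage 1 (sketch): modality-specific subspaces from graph-smoothed
    features, crudely aligned by averaging, then binarized with sign()."""
    Z = []
    for X in (X_img, X_txt):
        H = gcn_propagate(X, knn_graph(X))            # graph-structure features
        U, _, _ = np.linalg.svd(H, full_matrices=False)
        Z.append(U[:, :d])                            # per-modality subspace
    V = (Z[0] + Z[1]) / 2.0                           # stand-in for alignment
    return np.sign(V + 1e-12)                         # discrete hash codes B

def stage_two(X, B, lam=1.0):
    """Stage 2 (sketch): linear hash function W via ridge regression X W ≈ B."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ B)

# Toy usage with random "image" and "text" features for 100 paired samples.
rng = np.random.default_rng(0)
X_img, X_txt = rng.normal(size=(100, 64)), rng.normal(size=(100, 32))
B = stage_one(X_img, X_txt)                           # training hash codes
W_img = stage_two(X_img, B)                           # image-side hash function
query_codes = np.sign(X_img @ W_img)                  # codes for new queries
```

In this sketch, the separation mirrors the abstract: hash codes are first learned jointly from both modalities, and only afterwards is a simple linear mapping fitted per modality to generate codes for unseen queries.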