Abstract
This research introduces a novel approach to improving vision-based positioning in the absence of GNSS signals. Specifically, we address the challenge posed by obstacles that alter image information or features, making it difficult to retrieve the correct match for a query image from the database. The Bag of Visual Words (BoVW) is a widely used image retrieval technique, but it represents each image with a single histogram over a vocabulary of visual words; obstacles can therefore introduce new features into the query image, producing different visual words and a mismatched histogram. Our study overcomes this limitation by clustering the features of each image with the k-means method and generating a graph for each class. Each node, or keypoint, in a graph obtains additional information from its direct neighbors using the aggregation functions employed in graph neural networks, applied here as a feedforward network with constant parameters. This process generates new node embeddings, and global pooling is then applied to produce one vector per graph, so that each image is represented by multiple graph vectors corresponding to objects or feature classes. When obstacles cover one or more graphs, the remaining graphs still provide sufficient information to retrieve the most relevant image from the database. Our approach was applied to indoor positioning, with the database collected in Bolz Hall at The Ohio State University. Traditional BoVW techniques struggle to retrieve most query images from this database because obstacles such as people or recently deployed objects alter the image features. In contrast, our approach improves image retrieval by representing each image with multiple graph vectors, with the number of vectors depending on the number of objects in the image. This prevents or mitigates the feature changes caused by obstacles covering parts of the image or adding features to it, as demonstrated in the results.
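The pipeline described above can be summarized in a short sketch. The following Python code is a minimal illustration, not the authors' implementation: the cluster count, the spatial k-nearest-neighbor graph construction, the fixed random projection weights, the tanh activation, mean aggregation and pooling, and the cosine best-match retrieval score are all assumed choices that the abstract does not specify.

```python
import numpy as np
from sklearn.cluster import KMeans

RNG = np.random.default_rng(0)
W = RNG.normal(size=(128, 64)) / np.sqrt(128)  # constant feedforward parameters

def graph_vectors(descriptors, keypoints_xy, n_clusters=5, k_neighbors=4):
    """Represent one image as a set of graph vectors (one per feature class).

    descriptors:  (N, 128) local descriptors, e.g. SIFT (assumed)
    keypoints_xy: (N, 2) keypoint image coordinates
    """
    n_clusters = min(n_clusters, len(descriptors))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(descriptors)
    vectors = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        feats, pts = descriptors[idx], keypoints_xy[idx]
        if len(idx) > 1:
            # Link each keypoint to its k nearest spatial neighbors.
            dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
            nbrs = np.argsort(dists, axis=1)[:, 1:k_neighbors + 1]
            # Mean aggregation over direct neighbors (GNN-style message passing).
            agg = feats[nbrs].mean(axis=1)
        else:
            agg = feats
        # Feedforward update with constant parameters, then global mean pooling.
        emb = np.tanh((feats + agg) @ W)
        vectors.append(emb.mean(axis=0))
    return np.stack(vectors)

def retrieve(query_vecs, database):
    """Return the index of the database image whose graph vectors best
    match the query's; occluded query graphs are outvoted by intact ones."""
    def score(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a @ b.T).max(axis=1).mean()  # best match per query graph
    return int(np.argmax([score(query_vecs, v) for v in database]))

# Toy usage with random features standing in for real images.
db = [graph_vectors(RNG.normal(size=(60, 128)), RNG.uniform(0, 640, (60, 2)))
      for _ in range(3)]
query = db[1] + 0.01 * RNG.normal(size=db[1].shape)  # slightly perturbed copy
print(retrieve(query, db))  # expected: 1
```

Scoring each database image by the best match per query graph, rather than by a single global histogram, is what gives the occlusion robustness the abstract claims: a graph corrupted by an obstacle contributes one poor match, while the intact graphs still score highly against the correct image.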