Document‐to‐Document Retrieval Using Self‐Retrieval Learning and Automatic Keyword Extraction

Published: 2024-08-20
ISSN: 1931-4973
Container-title: IEEJ Transactions on Electrical and Electronic Engineering
Language: en
Short-container-title: IEEJ Transactions Elec Engng

Author:

Seki Yasuaki¹, Hamagami Tomoki¹

Affiliation:

1. Graduate School of Engineering, Yokohama National University, 79–5 Tokiwadai, Hodogaya‐ku, Yokohama 240‐8501, Japan

Abstract

In this study, we propose self‐retrieval learning, a self‐supervised learning method that does not require an annotated dataset. In self‐retrieval learning, keywords extracted from each document are used as queries, and training data that imitate the relationship between a query and a corpus are constructed so that each query retrieves the document it was extracted from. Conventional supervised learning for information retrieval requires pairs of queries and relevant corpus documents as training data; self‐retrieval learning does not. Nor does it use auxiliary information such as reference lists or other documents connected to the query; it relies only on the text of the documents in the target domain. In our experiments, we applied self‐retrieval learning to EU and UK legal document retrieval tasks using a retrieval model called DRMM. We found that self‐retrieval learning not only removes the need for supervised datasets but also outperforms supervised learning with the same model in retrieval accuracy. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
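
The training-data construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which keyword extractor, negative-sampling scheme, or data format the paper uses. The sketch assumes a plain TF-IDF ranking as the keyword extractor, one randomly sampled negative document per query, and a (query, positive index, negative index) triple format of the kind a pairwise ranking model could consume. The function names `tokenize`, `extract_keywords`, and `build_self_retrieval_triples` are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def extract_keywords(docs, k=3):
    """Return the top-k TF-IDF terms of each document.

    Assumption: TF-IDF stands in for whatever keyword extractor the paper uses.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:k])
    return keywords


def build_self_retrieval_triples(docs, k=3, seed=0):
    """Build (query, positive_index, negative_index) triples without labels.

    Each document's own keywords form the query, the document itself is the
    positive, and a randomly chosen other document is the assumed negative.
    """
    rng = random.Random(seed)
    triples = []
    for i, kws in enumerate(extract_keywords(docs, k)):
        candidates = [j for j in range(len(docs)) if j != i]
        if not kws or not candidates:
            continue  # nothing to query with, or no other document to contrast
        triples.append((" ".join(kws), i, rng.choice(candidates)))
    return triples
```

Calling `build_self_retrieval_triples(corpus)` on a list of document strings yields one triple per document. Only the corpus text is used, which matches the abstract's claim that no annotated query and document pairs are needed.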

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/tee.24181
