Mining Domain Terminologies Using Search Engine's Query Log

Published: 2021-11-30 Issue: 6 Volume: 20 Page: 1-32
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Ni Weijian¹, Liu Tong¹, Zeng Qingtian¹, Xie Nengfu²

Affiliation:

1. Shandong University of Science and Technology, Qingdao, China

2. Chinese Academy of Agricultural Sciences, Beijing, China

Abstract

Domain terminologies are a basic resource for various natural language processing tasks. To automatically discover terminologies for a domain of interest, most traditional approaches rely on a domain-specific corpus given in advance; their performance can therefore only be guaranteed when a high-quality domain-specific corpus has been collected, which requires extensive human involvement and domain expertise. In this article, we propose a novel approach that automatically mines domain terminologies from a search engine's query log, a type of domain-independent corpus with higher availability, coverage, and timeliness than a manually collected domain-specific corpus. In particular, we represent the query log as a heterogeneous network and formulate the task of mining domain terminology as transductive learning on that network. In the proposed approach, the manifold structure of domain-specificity inherent in the query log is captured by a novel network embedding algorithm and further exploited to reduce the manual annotation effort needed for domain terminology classification. We select Agriculture and Healthcare as the target domains and run experiments on a real query log from a commercial search engine. Experimental results show that the proposed approach outperforms several state-of-the-art approaches.

Funder

National Natural Science Foundation of China

Taishan Scholars Program of Shandong Province

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3462327
