SemRe-Rank

Published:2018-10-31 Issue:5 Volume:12 Page:1-41
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Zhang Ziqi¹, Gao Jie², Ciravegna Fabio²

Affiliation:

1. University of Sheffield, Nottingham Trent University, Sheffield, UK

2. University of Sheffield, Sheffield, UK

Abstract

Automatic Term Extraction (ATE) deals with the extraction of terminology from a domain-specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task, as it is known that no existing ATE method can consistently outperform the others in any domain. This work adopts a refreshed perspective on this problem: instead of searching for such a ‘one-size-fits-all’ solution that may never exist, we propose to develop generic methods to ‘enhance’ existing ATE methods. We introduce SemRe-Rank, the first method based on this principle, which incorporates semantic relatedness, an often overlooked avenue, into an existing ATE method to further improve its performance. SemRe-Rank incorporates word embeddings into a personalised PageRank process to compute ‘semantic importance’ scores for candidate terms from a graph of semantically related words (nodes), which are then used to revise the scores of candidate terms computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art base ATE methods on four datasets of diverse nature, it is shown to achieve widespread improvement over all base methods and across all datasets, with gains of up to 15 percentage points when measured by Precision in the top-ranked K candidate terms (averaged over a set of K values), or up to 28 percentage points in F1 measured at a K equal to the expected number of real terms among the candidates (F1 for short). Compared to an alternative approach built on the well-known TextRank algorithm, SemRe-Rank can potentially outperform it by up to 8 points in Precision at top K, or up to 17 points in F1.
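
The pipeline described in the abstract (a graph of semantically related words, personalised PageRank over it, then revision of base ATE scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the graph is built, how seed words are chosen, or how scores are revised, so the cosine threshold `min_sim`, the seed list, the toy two-dimensional embeddings, and the multiplicative update in `rerank` are all assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(embeddings, min_sim=0.9):
    """Undirected graph: words are nodes, edges join words with cosine >= min_sim.
    The threshold is an assumed parameter, not taken from the paper."""
    words = list(embeddings)
    graph = {w: set() for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if cosine(embeddings[w1], embeddings[w2]) >= min_sim:
                graph[w1].add(w2)
                graph[w2].add(w1)
    return graph

def personalised_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which the random surfer teleports only to seed words."""
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("at least one seed word must be a node in the graph")
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in graph}
        for n, neighbours in graph.items():
            if neighbours:
                share = damping * rank[n] / len(neighbours)
                for m in neighbours:
                    new_rank[m] += share
            else:
                # Isolated node: hand its mass back to the seeds.
                for m in graph:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank

def rerank(base_scores, word_rank):
    """Boost each base score by the mean normalised rank of the term's words.
    This multiplicative update is a hypothetical stand-in for the paper's formula."""
    max_rank = max(word_rank.values()) or 1.0
    revised = {}
    for term, score in base_scores.items():
        words = term.split()
        importance = sum(word_rank.get(w, 0.0) for w in words) / len(words) / max_rank
        revised[term] = score * (1.0 + importance)
    return sorted(revised, key=revised.get, reverse=True)

# Toy data (invented): 2-d "embeddings" and scores from some base ATE method.
embeddings = {
    "protein": (0.9, 0.1), "kinase": (0.8, 0.2), "receptor": (0.85, 0.15),
    "table": (0.1, 0.9), "page": (0.2, 0.8),
}
base_scores = {"protein kinase": 1.0, "table page": 1.1, "receptor": 0.9}

graph = build_graph(embeddings)
word_rank = personalised_pagerank(graph, seeds=["protein"])
ranking = rerank(base_scores, word_rank)
print(ranking)
```

In this toy setting the base method ranks "table page" first, but its words are not connected to the seed word in the graph and so receive no semantic importance; after re-ranking it falls below both domain terms.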

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3201408

References: 97 articles.

1. Biological relation extraction and query answering from MEDLINE abstracts using ontology-based text mining

Cited by 25 articles.

1. Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?;Machine Learning;2024-03-27

2. Computational Terminology;New Frontiers in Translation Studies;2024

3. CoastTerm: A Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature;Lecture Notes in Computer Science;2024

4. Is Prompting What Term Extraction Needs?;Lecture Notes in Computer Science;2024

5. WERECE: An Unsupervised Method for Educational Concept Extraction Based on Word Embedding Refinement;Applied Sciences;2023-11-14