Abstract
Keyphrase extraction is a subtask of natural language processing that refers to the automatic extraction of salient terms that semantically capture the key themes and topics of a document. Earlier literature reviews focus on classical approaches that employ various statistical or graph-based techniques; these approaches miss important keywords/keyphrases because they cannot fully utilize the context of a document (whether or not it is explicitly present), and thus achieve low F1 scores. Recent advances in deep learning and word/sentence embedding vectors have led to the development of new approaches, which address the lack of context and outperform the majority of classical ones. Taking the above into account, the contribution of this review is fourfold: (i) we analyze the state-of-the-art keyphrase extraction approaches and categorize them by the techniques they employ; (ii) we provide a comparative evaluation of these approaches, using well-known datasets from the literature and popular evaluation metrics, such as the F1 score; (iii) we provide a series of insights on various keyphrase extraction issues, including alternative approaches and future research directions; (iv) we make the datasets and code used in our experiments public, aiming to further increase the reproducibility of this work and facilitate future research in the field.
Publisher
Springer Science and Business Media LLC