Combining Transformer Embeddings with Linguistic Features for Complex Word Identification

Authors:

Jenny A. Ortiz-Zambrano, César Espin-Riofrio, Arturo Montejo-Ráez

Abstract

Identifying which words in a text may be difficult for common readers to understand is a well-known subtask in text complexity analysis. Deep language models have established a new state of the art in this task through end-to-end semi-supervised pre-training and downstream fine-tuning of, mainly, transformer-based neural networks. Nevertheless, the usefulness of traditional linguistic features in combination with neural encodings is worth exploring, as the computational cost of training and running such networks is becoming increasingly relevant under energy-saving constraints. This study explores lexical complexity prediction (LCP) by combining pre-trained and fine-tuned transformer networks with different types of traditional linguistic features, which are fed into classical machine learning classifiers. Our best results are obtained with Support Vector Machines on an English corpus, treating LCP as a regression problem. The results show that linguistic features can be useful in LCP tasks and may improve the performance of deep learning systems.
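The approach described in the abstract can be sketched in a few lines: concatenate a word's contextual embedding with hand-crafted linguistic features and regress a complexity score with an SVM. The sketch below is a minimal, hypothetical illustration, not the authors' implementation; random vectors stand in for real transformer embeddings, and the two linguistic features (word length and log frequency) plus the synthetic target are assumptions chosen only to keep the example self-contained.

```python
# Hedged sketch of LCP as regression: transformer embeddings + linguistic
# features -> SVR. Random vectors stand in for pre-trained transformer
# outputs so the example runs without downloading a model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_words, emb_dim = 200, 32

# Stand-in for contextual embeddings of each target word.
embeddings = rng.normal(size=(n_words, emb_dim))
# Illustrative traditional linguistic features (assumed, not from the paper).
word_length = rng.integers(3, 15, size=(n_words, 1)).astype(float)
log_freq = rng.normal(5.0, 1.5, size=(n_words, 1))

# Concatenate neural and linguistic feature groups into one vector per word.
X = np.hstack([embeddings, word_length, log_freq])
# Synthetic complexity score: longer and rarer words are treated as harder.
y = 0.05 * word_length.ravel() - 0.05 * log_freq.ravel() \
    + rng.normal(0.0, 0.02, n_words)

# Scale features, then fit an RBF-kernel Support Vector Regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y)
preds = model.predict(X)  # one predicted complexity score per word
```

In a real system, `embeddings` would come from a pre-trained (and possibly fine-tuned) transformer, and the linguistic features would be computed from lexical resources such as frequency lists.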

Funder

Andalusian Regional Government of Spain

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

References (35 articles)

1. Rico-Sulayes, A. (2020, January 23). General lexicon-based complex word identification extended with stem n-grams and morphological engines. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR-WS, Malaga, Spain.

2. Uluslu, A.Y. (2022). Automatic Lexical Simplification for Turkish. arXiv.

3. Shardlow, M., Cooper, M., and Zampieri, M. (2020, January 11). CompLex: A New Corpus for Lexical Complexity Prediction from Likert Scale Data. Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI), Marseille, France.

4. The NLP cookbook: Modern recipes for transformer based deep learning architectures;Singh;IEEE Access,2021

5. Nandy, A., Adak, S., Halder, T., and Pokala, S.M. (2021, January 5–6). cs60075_team2 at SemEval-2021 Task 1: Lexical Complexity Prediction using Transformer-based Language Models pre-trained on various text corpora. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Online.

Cited by 4 articles.

1. An effective deep learning based Idrcnn and Bdc-Lstm models for complex word identification and synonym generation;International Journal of Information Technology;2024-06-23

2. Floating-Point Embedding: Enhancing the Mathematical Comprehension of Large Language Models;Symmetry;2024-04-15

3. Systematic Literature Review of Transformer Model Implementations in Detecting Depression;2023 6th International Conference of Computer and Informatics Engineering (IC2IE);2023-09-14

4. Language Agnostic Readability Assessments;2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE);2023-07-24
