Attention mechanism and skip-gram embedded phrases

Authors:

Krimpas Panagiotis, Valavani Christina

Abstract

This article examines common translation errors that occur in the translation of legal texts. In particular, it focuses on how German texts containing legal terminology are rendered into Modern Greek by Google's machine translation system. Our case study is the Google-assisted translation of the original (German) version of the Constitution of the Federal Republic of Germany into Modern Greek. A training method is proposed for phrase extraction based on occurrence frequency: the extracted phrases are embedded with the Skip-gram algorithm and then integrated into the self-attention mechanism proposed by Vaswani et al. (2017), in order to minimise human effort and contribute to the development of a robust machine translation system for multi-word legal terms and special phrases. This neural machine translation approach aims at deriving vectorised phrases from large corpora and processing them for translation. The research direction is to increase the in-domain training data set and to enrich the vectors with more information about legal concepts (domain-specific features).
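To make the pipeline described above more concrete, the following Python sketch chains the three steps in a minimal form: frequency-based extraction of multi-word units, skip-gram embedding of the phrase-merged corpus, and a single scaled dot-product self-attention pass over the resulting phrase vectors. It is an illustrative sketch only, not the authors' implementation: the corpus file name, thresholds, vector dimensions, the use of gensim, and the randomly initialised attention weights are all assumptions made for the example.

```python
# Illustrative sketch: frequency-based phrase extraction, skip-gram
# embeddings, and one self-attention pass over the phrase vectors.
# File name, thresholds and dimensions are hypothetical choices.
import numpy as np
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

# 1. Merge frequently co-occurring tokens into multi-word units
#    (e.g. "juristische person" -> "juristische_person"), keeping only
#    pairs that clear the frequency/score thresholds.
sentences = [line.lower().split()
             for line in open("legal_corpus_de.txt", encoding="utf-8")]
phrase_model = Phraser(Phrases(sentences, min_count=5, threshold=10.0))
phrased_sentences = [phrase_model[s] for s in sentences]

# 2. Train skip-gram embeddings (sg=1) over the phrase-merged corpus.
w2v = Word2Vec(phrased_sentences, vector_size=128, window=5,
               sg=1, min_count=3, epochs=10)

# 3. Scaled dot-product self-attention (Vaswani et al. 2017) over the
#    phrase vectors of one sentence. The projection matrices are random
#    here; in an NMT encoder they would be learned parameters.
def self_attention(X, d_k):
    rng = np.random.default_rng(0)
    W_q, W_k, W_v = (rng.normal(size=(X.shape[1], d_k)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

tokens = phrase_model["die würde des menschen ist unantastbar".split()]
X = np.stack([w2v.wv[t] for t in tokens if t in w2v.wv])
contextualised = self_attention(X, d_k=64)
```

In a full system these attention weights would of course be trained jointly with the rest of the translation model; the sketch only shows how frequency-selected, skip-gram-embedded phrases can be fed into the attention computation as sequence elements in their own right.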

Publisher

Adam Mickiewicz University Poznan

Subject

Law, Linguistics and Language, Language and Linguistics

References (27 articles)

1. Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2016. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015. arXiv:1409.0473v7 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.1409.0473.

2. Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5: 135–46. https://aclanthology.org/Q17-1010.pdf (accessed December 28, 2022).

3. Bouma, Gerlof. 2009. Normalized (Pointwise) Mutual information in collocation extraction. In From Form to Meaning: Processing Texts Automatically: Proceedings of the Biennial GSCL Conference 2009, eds. Christian Chiarcos, Richard Eckart de Castilho and Manfred Stede, 31–40. Tübingen: Gunter Narr.

4. Camacho-Collados, José, and Mohammad Taher Pilehvar. 2018. From word to sense embeddings: A survey on vector representations of meaning. Journal of Artificial Intelligence Research 63: 743–88. DOI: https://doi.org/10.1613/jair.1.11259.

5. Diniz da Costa, Alexandre, Mateus Coutinho Marim, Ely Edison da Silva Matos, and Tiago Timponi Torrent. 2022. Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet. In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), eds. Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk and Stelios Piperidis, 1–12. Paris: European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2022/LREC-2022.pdf (accessed December 28, 2022).
