Affiliation:
1. Institute for Croatian Language
Abstract
After funding for the Croatian national term base, Struna (http://struna.ihjj.hr/), was discontinued in 2019, we began developing a new methodology for creating terminological collections that does not depend on field experts for the initial terminological data. One possible solution to the problem of finding a compact and robust source of information for the early stages of terminology processing (the "raw definitions") across various domains is ChatGPT-4, the publicly available AI language model created by OpenAI. ChatGPT is a large language model whose functions include answering questions, generating text, and performing tasks such as translation and summarisation. After initial intensive testing of ChatGPT-4, we began developing and training a custom GPT bot (working name: TermAI). It will serve as an assistance module that provides raw information for terminological units to be processed in Struna. The first stage of training consisted of manually supplying rules of good practice for terminology management, adapted from the training originally given to field experts. The second stage consists of feeding TermAI modified data exported from Struna. In this paper, we present an analysis of the information TermAI generated for a new domain. We compare its quality with the information generated for the domain on which TermAI was trained, and with information obtained from actual field experts in the new domain.
Conference: 11th SWS International Scientific Conference on Arts and Humanities ISCAH 2024