Abstract
This research explores the effectiveness of machine translation from Slovak to English for sentiment analysis, focusing on the translation of movie subtitles. The study employs a parallel corpus of segmented movie subtitles in both languages and uses the IBM Watson™ Natural Language Understanding service and Google Translate. The research aims to assess the correlation between human-generated text and machine-translated text in sentiment analysis. A comparative analysis was also conducted using OpenAI to evaluate the sentiment of the Slovak text directly, without translation into English. The findings reveal a strong correlation between human text and machine translation, with a Pearson correlation coefficient of 0.86, and a correlation of 0.72 with the evaluation by OpenAI’s GPT model. Although the end-to-end solution using OpenAI achieved relatively high accuracy, the methodology of machine translation followed by sentiment analysis in English was found to be significantly more precise. The research further investigates the challenges of translating specific language nuances, such as humor and vulgarisms, and their impact on sentiment analysis. The study concludes that machine translation can be used effectively for sentiment analysis in Slovak, a flective language, and highlights the potential of advanced language models for low-resource languages. Future research directions include expanding the study to other text types and to comparable languages beyond Slovak.
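The comparison reported in the abstract rests on the Pearson correlation coefficient between two series of sentiment scores for the same subtitle segments. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `pearson` and the score lists are hypothetical placeholders, and no call to IBM Watson, Google Translate, or OpenAI is made.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-segment sentiment scores in [-1, 1]; illustrative values only,
# not data from the study.
human_scores = [0.8, -0.4, 0.1, -0.9, 0.5]
machine_translated_scores = [0.7, -0.2, 0.0, -0.8, 0.6]

r = pearson(human_scores, machine_translated_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 1 would indicate that sentiment scores computed on the machine-translated text track those computed on the human-generated text closely, which is how the reported coefficients of 0.86 and 0.72 can be read.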
Funder
Agentúra na Podporu Výskumu a Vývoja
Constantine the Philosopher University in Nitra
Publisher
Springer Science and Business Media LLC