A Novel Syntactic-Based Approach to Calculate Similarities Among Languages

Published:2023-04-25 Issue:1 Volume:27 Page:125-136
ISSN:1308-6529
Container-title:Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi

Author:

BİLGİN Metin¹

Affiliation:

1. Bursa Uludağ Üniversitesi

Abstract

This study presents an approach for calculating the similarity among languages using a new feature template obtained from the syntactic analysis phase. To show that language similarity can be calculated with the proposed feature template, studies were conducted on six language sets from two language families. In the first study, a triplet-pattern template was obtained from the syntactic analysis of Turkey Turkish, Kazakh, and Uyghur, three Turkic languages belonging to the Ural-Altaic language family. The template could be formed automatically by the developed software, and a separate module of the same software performed the similarity analysis of the selected languages. As a result, both the similar structural relations among languages of the same family and the structural differences among them can be revealed. Although the initial aim in developing the approach was to determine similarities among languages, the real aim is a language-independent system. To show that the system is language independent, a second study was carried out, in which the similarities among languages were determined using treebanks of English, Swedish, and Norwegian from the Germanic language family. For the Turkic family, with the Jaccard, Dice, and Similarity Matching metrics, the highest similarity is between Turkish and Uyghur, with values of 25.21%, 40.27%, and 50.42%, respectively. For the Germanic family, the highest similarity is between Norwegian and Swedish, with values of 37.15%, 54.17%, and 74.3%, respectively.
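As a rough illustration of the set-based metrics named in the abstract, the sketch below computes the standard Jaccard and Dice coefficients over two sets of triplet patterns. The abstract does not specify the exact form of the triplet-pattern template or how Similarity Matching is defined, so the tag triplets and the names `jaccard`, `dice`, and `dice_from_jaccard` are assumptions made for illustration only, and the third metric is left out. The sketch also checks that the reported Dice values agree with the standard identity Dice = 2J / (1 + J) applied to the reported Jaccard values.

```python
def jaccard(a: set, b: set) -> float:
    """Standard Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Standard Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def dice_from_jaccard(j: float) -> float:
    """Identity linking the two coefficients: Dice = 2J / (1 + J)."""
    return 2 * j / (1 + j)


# Hypothetical triplet patterns (invented tag sequences, not data from the study).
lang_a = {
    ("NOUN", "VERB", "PUNCT"),
    ("ADJ", "NOUN", "VERB"),
    ("PRON", "NOUN", "VERB"),
}
lang_b = {
    ("NOUN", "VERB", "PUNCT"),
    ("ADJ", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB"),
    ("ADV", "VERB", "PUNCT"),
}

print(f"Jaccard: {jaccard(lang_a, lang_b):.4f}")
print(f"Dice:    {dice(lang_a, lang_b):.4f}")

# Consistency check on the values reported in the abstract (given as fractions).
for pair, j in [("Turkish-Uyghur", 0.2521), ("Norwegian-Swedish", 0.3715)]:
    print(f"{pair}: Dice implied by Jaccard = {100 * dice_from_jaccard(j):.2f}%")
```

Representing each pattern as a tuple keeps it hashable, so ordinary set intersection and union are enough to compare two languages' pattern inventories.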

Publisher

SDU Journal of Natural and Applied Sciences

Subject

General Medicine

Reference: 46 articles.

[1] J.R. Searle, Indirect speech acts, in Speech Acts, ed. P. Cole and J.L. Morgan (Academic Press, New York, 1975), pp. 59-82.

[2] P.J. Taylor and S. Thomas, Linguistic style matching and negotiation outcome, Negotiation and Conflict Management Research 1(3) (2008) 263-281. https://doi.org/10.1111/j.1750-4716.2008.00016.x

[3] J.W. Pennebaker and L.D. Stone, Words of wisdom: language use over the life span, Journal of Personality and Social Psychology 85(2) (2003) 291. https://doi.org/10.1037/0022-3514.85.2.291

[4] C.J. Groom and J.W. Pennebaker, The language of love: Sex, sexual orientation, and language use in online personal advertisements, Sex Roles 52 (2005) 447-461. https://doi.org/10.1007/s11199-005-3711-0

[5] C.M. Laserna, Y.T. Seih and J.W. Pennebaker, Um... who like says you know: filler word use as a function of age, gender, and personality, Journal of Language and Social Psychology 33(3) (2014) 328-338. https://doi.org/10.1177/0261927X14526993

[6] M. Dehghani, K. Sagae, S. Sachdeva and J. Gratch, Analyzing political rhetoric in conservative and liberal weblogs related to the construction of the ground zero mosque, Journal of Information Technology & Politics 11(1) (2014) 1-14. https://doi.org/10.1080/19331681.2013.826613