Authors:
Cabilitazan Angeli A., Ingosan Jeffrey S., Irabagon Jason, Jacinto Philip Irving G., Lam Jan Jefferson O.
Abstract
Kankana-ey is a widely used dialect in the northern region of the Philippines. Unfortunately, there are no documented studies on the syntactic rules of this dialect. This study explored the development of a corpus for the Kankana-ey dialect, and the corpus was then used to establish the syntactic rules of Kankana-ey. Words for the corpus were collected from a Kankana-ey version of the Bible, dictionaries, news articles, songs, and various online resources. The identified words were tagged using the part-of-speech tags of the Penn Treebank. Using the corpus and TensorFlow, 320 Kankana-ey sentences were analysed to determine the syntactic rules, and a further 80 sentences were used to test the accuracy of the identified rules. At the end of the study, the corpus contained 3,412 tagged Kankana-ey words, and the analysis yielded 1,722 syntactic rules. Testing showed that the rules were 60% accurate. In conclusion, the high number of rules identified from the 320 sentences was due to many Kankana-ey words having more than one possible tag. This ambiguity also contributed to the low accuracy of the syntactic rules.
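
The link between tag ambiguity and the number of rules can be illustrated with a minimal Python sketch. It is not the study's TensorFlow pipeline. It assumes a rule is simply the sequence of part-of-speech tags of a sentence, and that accuracy is the share of test sentences whose tag sequence matches a known rule. The lexicon, the placeholder tokens `w1` to `w3`, and the function names `candidate_rules` and `rule_accuracy` are hypothetical and do not come from the Kankana-ey corpus.

```python
from itertools import product

# Hypothetical lexicon: placeholder tokens (not real Kankana-ey words)
# mapped to the Penn Treebank tags each token could take.
LEXICON = {
    "w1": {"NN", "VB"},
    "w2": {"DT"},
    "w3": {"NN", "JJ", "VB"},
}


def candidate_rules(sentence, lexicon):
    """Return every tag sequence the sentence could yield, one tag per token."""
    tag_options = [sorted(lexicon[token]) for token in sentence]
    return {tuple(tags) for tags in product(*tag_options)}


def rule_accuracy(rules, gold_sequences):
    """Fraction of gold tag sequences that match one of the known rules."""
    hits = sum(1 for seq in gold_sequences if tuple(seq) in rules)
    return hits / len(gold_sequences)


if __name__ == "__main__":
    rules = candidate_rules(["w1", "w2", "w3"], LEXICON)
    # One three-token sentence already yields 2 * 1 * 3 candidate rules.
    print(len(rules))
    print(rule_accuracy(rules, [["NN", "DT", "JJ"], ["DT", "DT", "NN"]]))
```

Under these assumptions, the number of candidate rules for a sentence is the product of the number of tags available to each of its words, so a handful of ambiguous words is enough to multiply the rule count well beyond the number of sentences analysed.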