Affiliation
1. University of Tehran, Tehran, Iran
2. University of Tehran and Institute for Research in Fundamental Sciences, Tehran, Iran
Abstract
Treebanks are important and useful resources in natural language processing. They are annotated in one of two schemas: phrase structure or dependency structure. Many works convert a phrase structure into a dependency structure and vice versa. Most of them exploit a handcrafted head percolation table and argument table in predefined deterministic ways. In this article, we propose a method to convert a dependency structure into a phrase structure by enriching the trainable model of an earlier hybrid-strategy approach. Adding a classifier to the algorithm and applying postprocessing modifications improves the quality of the conversion. We evaluate our method on two languages, English and Persian, and then analyze the errors. Our experiments show an error rate reduction of 46.01% for English and 76.50% for Persian relative to our baseline. We build a new phrase structure treebank by converting 10,000 sentences of the Persian dependency treebank into the corresponding phrase structures and correcting them manually.
Funder
Iran National Science Foundation
Institute for Research in Fundamental Sciences
Publisher
Association for Computing Machinery (ACM)