Integrating Multiple Dependency Corpora for Inducing Wide-Coverage Japanese CCG Resources

Author:

Uematsu Sumire (1), Matsuzaki Takuya (2), Hanaoka Hiroki (1), Miyao Yusuke (2), Mima Hideki (1)

Affiliation:

1. The University of Tokyo

2. National Institute of Informatics

Abstract

A novel method to induce wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese is proposed in this article. For some languages including English, the availability of large annotated corpora and the development of data-based induction of lexicalized grammar have enabled deep parsing, i.e., parsing based on lexicalized grammars. However, deep parsing for Japanese has not been widely studied. This is mainly because most Japanese syntactic resources are represented in chunk-based dependency structures, while previous methods for inducing grammars are dependent on tree corpora. To translate syntactic information presented in chunk-based dependencies to phrase structures as accurately as possible, integration of annotation from multiple dependency-based corpora is proposed. Our method first integrates dependency structures and predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a similar way to previously proposed methods. The quality of the conversion is empirically evaluated in terms of the coverage of the obtained CCG lexicon and the accuracy of the parsing with the grammar. While the transforming process used in this study is specialized for Japanese, the framework of our method would be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis due to morphosyntactic features.
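As a rough illustration of the dependency-to-phrase-structure step mentioned in the abstract, the sketch below turns a chunk-based dependency analysis into a binary bracketing. It is not the authors' conversion: it ignores predicate-argument information, chunk-internal structure, and CCG category assignment. It assumes a head-final, projective analysis with a single root chunk whose head index is -1. The function name `to_binary_tree` and the input encoding are hypothetical, chosen only for this example.

```python
from typing import List, Tuple, Union

# A tree is either a chunk string (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def to_binary_tree(chunks: List[str], heads: List[int]) -> Tree:
    """Convert chunk-based dependencies into a binary bracketing.

    Assumptions (illustrative only): heads[i] is the index of the chunk
    that chunk i depends on, heads[i] > i for every non-root chunk
    (head-final), the structure is projective, and the root has head -1.
    """
    dependents = {i: [] for i in range(len(chunks))}
    root = None
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            dependents[h].append(i)  # stored in left-to-right order
    if root is None:
        raise ValueError("no root chunk (head index -1) found")

    def build(head: int) -> Tree:
        # Attach the nearest dependent first, so farther dependents
        # end up higher in the tree.
        node: Tree = chunks[head]
        for d in reversed(dependents[head]):
            node = (build(d), node)
        return node

    return build(root)


# "Taro read a book": both nominal chunks depend on the verb chunk.
example = to_binary_tree(["太郎が", "本を", "読んだ"], [2, 2, -1])
print(example)  # ('太郎が', ('本を', '読んだ'))
```

Attaching the nearest dependent first is one simple design choice that keeps the yield of the tree in the original chunk order. A real conversion would need the additional annotation the abstract describes to decide argument and adjunct structure.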

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

References (32 articles)

1. Daisuke Bekki. 2010. Formal Theory of Japanese Syntax. Kuroshio Shuppan (in Japanese).

2. Wide-coverage semantic representations from a CCG parser

Cited by 3 articles.

1. Reducing Syntactic Complexity for Information Extraction from Japanese Requirement Specifications. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 2022-12.

2. Translate Japanese into Formal Languages with an Enhanced Generalization Algorithm. Advances in Intelligent Systems and Computing, 2020.

3. Hindi CCGbank: A CCG treebank from the Hindi dependency treebank. Language Resources and Evaluation, 2017-01-25.
