Author:
Garabík Radovan, Šimková Mária
Abstract
Morphological annotation is an essential and widely used layer of linguistic information in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags are compact and easily human-readable, without sacrificing their informational content. The tags consist of ASCII letters, digits and several other characters. In general, they have a variable number of symbols, but the order of the symbols is fixed, and each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part-of-speech categories. In particular, function words are classified according to their syntactic (and semantic) roles, which is one reason why the tagset is sometimes described as morphosyntactic.
Publisher
Institute of Computer Science, Polish Academy of Sciences
Subject
Computer Science Applications, Linguistics and Language, Modeling and Simulation
Cited by
4 articles.