Affiliation:
1. Department of General Linguistics, Palacký University, Olomouc, 779 00, Czech Republic
Abstract
Our work evaluates how strongly function words are associated with several text types (novels, poems, academic articles, reviews, and blog posts) and how accurately texts can be classified into these categories using machine-learning and statistical methods. The principal conclusion is that the text types can be distinguished on the basis of function words alone, by either vocabulary or vocabulary diversity. These findings may affect authorship attribution techniques based on function words, as well as text clustering techniques, since some function words carry information about text type or genre in addition to that carried by content words.
Funder
Czech Ministry of Education, Youth and Sports
Publisher
Oxford University Press (OUP)