Abstract
Methods for forming and optimizing author profiles are presented. An author profile is an image, that is, a vector in a multidimensional space whose components are measurements of the author's texts obtained by several methods based on 4-grams, stemming, recurrence analysis, and formal stochastic grammar. The profile serves as a model of the author's language, including vocabulary and sentence-syntax features. The effectiveness of each method is compared. A genetic algorithm is then used to form a reduced author profile: insignificant indicators are excluded, which reduces their number by 20%. The reduced profile contains the attributes that are significant for a given author and serves as an effective means of attributing texts to that author.
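The idea of a profile as a vector of text measurements, later reduced to its significant components, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes character-level 4-grams (the abstract does not say whether the 4-grams are over characters or words), uses relative frequency as the measurement, and compares profiles by cosine similarity. The function names `four_gram_profile`, `apply_mask`, and `cosine` are hypothetical, and the `keep` set stands in for whatever subset of indicators a genetic algorithm would select.

```python
from collections import Counter
import math


def four_gram_profile(text):
    """Relative frequencies of character 4-grams (assumed measurement)."""
    # Normalise case and whitespace so layout does not affect the profile.
    cleaned = " ".join(text.lower().split())
    grams = [cleaned[i:i + 4] for i in range(len(cleaned) - 3)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.items()}


def apply_mask(profile, keep):
    """Reduced profile: keep only the indicators listed in `keep`.

    In the paper the subset is found by a genetic algorithm; here it is
    simply passed in, which is an assumption made for illustration.
    """
    return {g: v for g, v in profile.items() if g in keep}


def cosine(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

A candidate subset of indicators could then be scored by how well the masked profiles separate texts of known authors, which is the kind of fitness function a genetic search would need.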
Publisher
National Academy of Sciences of Ukraine (Co. LTD Ukrinformnauka) (Publications)