Methods and software for significant indicators determination of the natural language texts author profile


Shynkarenko V.I., Demydovych I.M.


Methods for forming and optimizing author profiles are presented. The author profile is an image, that is, a vector in a multidimensional space whose components are measurements of the author's texts obtained by several methods based on 4-grams, stemming, recurrence analysis, and formal stochastic grammars. The profile serves as a model of the author's language, including vocabulary and sentence syntax features. The effectiveness of each method is compared. A genetic algorithm is used to form a reduced author profile: insignificant indicators are excluded, which reduces their number by 20%. The reduced profile retains the attributes that are significant for a given author and is effective for attributing texts to that author.
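The abstract does not specify how the genetic algorithm is configured, so the following is only a minimal sketch of the general idea of reducing a profile by evolving a bit mask over its indicators. Everything concrete here is an assumption made for illustration: the toy `profiles` and `labels`, the fitness function (leave-one-out nearest-neighbour accuracy minus a small penalty per retained indicator), and the selection, crossover, and mutation operators. None of it is taken from the authors' software.

```python
import random

# Toy data (hypothetical): four texts, five indicators each, two authors.
profiles = [
    [0.9, 0.1, 0.5, 0.3, 0.7],
    [0.8, 0.2, 0.1, 0.9, 0.2],
    [0.1, 0.9, 0.6, 0.2, 0.8],
    [0.2, 0.8, 0.0, 0.8, 0.1],
]
labels = ["A", "A", "B", "B"]


def fitness(mask, profiles, labels):
    """Assumed objective: leave-one-out nearest-neighbour accuracy on the
    indicators kept by `mask`, minus 0.01 per retained indicator."""
    kept = [i for i, bit in enumerate(mask) if bit]
    if not kept:
        return 0.0
    correct = 0
    for i, p in enumerate(profiles):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(profiles):
            if i == j:
                continue
            d = sum((p[k] - q[k]) ** 2 for k in kept)  # squared Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(profiles) - 0.01 * len(kept)


def reduce_profile(profiles, labels, generations=30, pop_size=20, seed=0):
    """Evolve a 0/1 mask over indicators; 1 means the indicator is kept."""
    rng = random.Random(seed)
    n = len(profiles[0])
    # The full profile is seeded into the population so that the result
    # is never worse than using every indicator.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, profiles, labels), reverse=True)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1       # single bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, profiles, labels))


mask = reduce_profile(profiles, labels)
reduced = [[x for x, bit in zip(p, mask) if bit] for p in profiles]
```

Because the best masks are carried over unchanged between generations, the returned mask scores at least as well as the full profile under the assumed objective. A real study would replace the toy objective with an attribution-quality measure computed on actual author texts.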


National Academy of Sciences of Ukraine (Co. LTD Ukrinformnauka) (Publications)







