Abstract
Occupation profiling is a subtask of authorship profiling, which is broadly defined as the analysis of individuals’ writing styles. Although the problem has been widely explored, no previous study has attempted it for Chinese classical poetry. Inspired by Trudgill’s seminal work on stylistic variation as a function of occupation, we present a novel Domain-Knowledge Transformer model that predicts a poet’s occupation from the writing style of their poems. Unlike Indo-European languages, Chinese has rarely used characters and two written forms: traditional Chinese and simplified Chinese. To tackle these problems, we use a language-related component to standardize our input. We also use alphabetization to satisfy the restrictions imposed by rhyming rules and tonal styles. Because classical poetry is a special literary form, traditional domain knowledge, such as named entities, themes, ages, and official career paths, is valuable for poet occupation profiling. However, owing to the lack of appropriately annotated datasets, these features are difficult to recognize. We therefore propose a domain knowledge component that employs the latent Dirichlet allocation model to capture additional theme information and builds named entity dictionaries to recognize the named entities in the datasets used in this study. Finally, in the deep learning component, we combine a Transformer with a convolutional neural network (CNN) to perform occupation profiling. The experimental results suggest that our model is effective in this task. Moreover, the results give an account of other social-attribute features of poetry style that are predictive of occupation in this domain.
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics, General Computer Science
Reference 39 articles.
1. Chambers, J. K., Trudgill, P.: Dialectology. Cambridge, London (1980)
2. Cioffi-Revilla, C.: Introduction to Computational Social Science: Principles and Applications. Springer-Verlag, Berlin (2014)
3. Johannsen, A., Hovy, D., Søgaard, A.: Cross-lingual syntactic variation over age and gender. International Conference on Computational Natural Language Learning, ACL (2015)
4. Sari, Y., Stevenson, M., Vlachos, A.: Topic or Style? Exploring the Most Useful Features for Authorship Attribution. International Conference on Computational Linguistics, ACL (2018)
5. Peersman, C., Daelemans, W., Van Vaerenbergh, L.: Predicting age and gender in online social networks. International Workshop on Search and Mining User-generated Contents, ACM (2011)
Cited by
1 article.