Affiliation:
1. Northwestern Polytechnical University, Xi’an, China
2. COMSATS University Islamabad, Lahore Campus, Pakistan
Abstract
The aim of the author profiling task is to automatically predict various traits of an author (e.g., age and gender) from written text. Author profiling has mainly been treated as a supervised text classification task. Early work relied on traditional machine learning algorithms. In recent years, however, deep learning has emerged as the state-of-the-art approach for a range of classification problems involving image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fill this gap, this study presents an in-depth comparison of state-of-the-art deep learning methods, namely CNN, Bi-LSTM, GRU, and CRNN, along with proposed ensemble methods, on four PAN Author Profiling corpora. The PAN 2015, PAN 2017, and PAN 2018 Author Profiling corpora were used for same-genre author profiling, whereas the PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experiments showed that for same-genre author profiling, the proposed ensemble methods produced the best results for gender identification, whereas the CNN model performed best for age identification. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.
Subject
Artificial Intelligence, General Engineering, Statistics and Probability
Cited by 6 articles.