Abstract
AbstractBackgroundTwitter is a popular social networking site where short messages or “tweets” of users have been used extensively for research purposes. However, not much research has been done in mining the medical professions, such as detecting the occupations of users from their biographical contents. Mining such professions can be used to build efficient recommender systems for cost-effective targeted advertisements. Moreover, it is highly important to develop effective methods to identify the occupation of users since conventional classification methods rely on features developed by human intelligence. Although, the result may be favorable for the classification problem. However, it is still extremely challenging for traditional classifiers to predict the medical occupations accurately since it involves predicting multiple occupations. Hence this study emphasizes predicting the medical occupational class of users through their public biographical (“Bio”) content. We have conducted our analysis by annotating the bio content of Twitter users. In this paper, we propose a method of combining word embedding with state-of-art neural network models that include: Long Short Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit, Bidirectional Encoder Representations from Transformers, and A lite BERT. Moreover, we have also observed that by composing the word embedding with the neural network models there is no need to construct any particular attribute or feature. By using word embedding, the bio contents are formatted as dense vectors which are fed as input into the neural network models as a sequence of vectors.ResultPerformance metrics that include accuracy, precision, recall, and F1-score have shown a significant difference between our method of combining word embedding with neural network models than with the traditional methods. The scores have proved that our proposed approach has outperformed the traditional machine learning techniques for detecting medical occupations among users. ALBERT has performed the best among the deep learning networks with an F1 score of 0.90.ConclusionIn this study, we have presented a novel method of detecting the occupations of Twitter users engaged in the medical domain by merging word embedding with state-of-art neural networks. The outcomes of our approach have demonstrated that our method can further advance the process of analyzing corpora of social media without going through the trouble of developing computationally expensive features.
Funder
Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference43 articles.
1. Han B, Cook P, Baldwin, T. A stacking-based approach to twitter user geolocation prediction. In: Proceedings of the 51st annual meeting of the association for computational linguistics: system demonstrations. 2013. p. 7–12
2. Miller Z, Dickinson B, Hu W. Gender prediction on twitter using stream algorithms with n-gram character features. 2012.
3. Preoţiuc-Pietro D, Lampos V, Aletras N. An analysis of the user occupational class through twitter content. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers), 2015. p. 1754–64.
4. Huang Y, Yu L, Wang X, Cui B. A multi-source integration framework for user occupation inference in social media systems. World Wide Web. 2015;18(5):1247–67.
5. Aletras N, Chamberlain BP. Predicting twitter user socioeconomic attributes with network and language information. In: Proceedings of the 29th on hypertext and social media. 2018. p. 20–4
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献