Affiliation:
1. School of Journalism and Communication, Hunan Mass Media Vocational and Technical College, Changsha 410100, China
Abstract
Online individual behavior analysis is an important means of mining user interests, and predicting user retweeting behavior is a typical problem in this area. To make online learning behavior prediction better suited to large-scale datasets, this paper proposes an improved condensed K nearest neighbor (ICKNN) method. Inspired by the way the condensed nearest neighbor (CNN) algorithm compresses the training samples, ICKNN uses the Hadoop platform to parallelize the traditional CNN algorithm. In the traditional CNN method, the compression ratio decreases as K increases, and efficiency decreases with it; parallelizing CNN under the Hadoop framework is intended to restore that efficiency. ICKNN is validated on a real Twitter retweeting dataset. The results show that ICKNN achieves a higher compression ratio than the traditional CNN algorithm. Its classification accuracy is lower than that of the traditional KNN method, but it takes significantly less time than both the traditional KNN and CNN methods, which greatly improves efficiency.
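The abstract relies on two ideas: condensing the training set the way the CNN rule does, and splitting that work into independent parallel tasks. The sketch below illustrates both in plain Python under stated assumptions. It is not the authors' ICKNN code and uses no Hadoop API. The names `knn_predict`, `condense`, `parallel_condense` and `compression_ratio` are hypothetical. Each partition stands in for a Hadoop map task, and merging and re-condensing the local stores stands in for a reduce step. "Compression ratio" is assumed here to mean the fraction of training samples discarded, 1 - |store|/|train|.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k stored samples closest to `query` (Euclidean distance)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def condense(train, k=1):
    """Hart-style CNN: keep only samples that the current store misclassifies with k-NN."""
    if not train:
        return []
    kept = [0]  # indices into `train`; the store is seeded with the first sample
    changed = True
    while changed:  # repeat full passes until a pass adds nothing
        changed = False
        for i, (x, y) in enumerate(train):
            if i in kept:
                continue
            store = [train[j] for j in kept]
            if knn_predict(store, x, k) != y:
                kept.append(i)  # misclassified, so this sample is needed
                changed = True
    return [train[j] for j in kept]


def parallel_condense(train, k=1, n_parts=4):
    """Hypothetical partition-then-merge CNN, imitating the shape of a MapReduce job."""
    parts = [train[i::n_parts] for i in range(n_parts)]  # round-robin split
    # "Map": each partition is condensed independently (run sequentially here).
    local_stores = [condense(p, k) for p in parts if p]
    merged = [s for store in local_stores for s in store]
    # "Reduce": condense the union once more to drop points redundant across partitions.
    return condense(merged, k)


def compression_ratio(train, store):
    """Assumed definition: fraction of training samples removed by condensing."""
    return 1 - len(store) / len(train)


# Toy data: two well-separated classes ("retweet" yes/no), invented for illustration.
train = [((0.0, 0.0), "no"), ((0.2, 0.1), "no"), ((0.1, 0.3), "no"), ((0.3, 0.2), "no"),
         ((1.0, 1.0), "yes"), ((0.9, 1.1), "yes"), ((1.2, 0.8), "yes"), ((1.1, 1.2), "yes")]

store = parallel_condense(train, k=1, n_parts=2)
print(len(store), compression_ratio(train, store))  # 2 0.75
print(knn_predict(store, (0.95, 1.0), k=1))          # yes
```

In this sketch, raising `k` in the condensing rule tends to make the store misclassify more samples, so more are kept and the compression ratio drops, which matches the trade-off the abstract describes. Condensing the merged local stores a second time keeps the result consistent on the merged points only, not necessarily on the full training set.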
Subject
General Mathematics, General Medicine, General Neuroscience, General Computer Science