Affiliation:
1. Data61, CSIRO
2. Data61, CSIRO & University of New South Wales
3. Department of CS, University of California, Davis, CA
Abstract
The increased popularity of smartphones has attracted a large number of developers to offer various applications for the different smartphone platforms via the respective app markets. One consequence of this popularity is that the app markets are also becoming populated with spam apps. These spam apps reduce the users’ quality of experience and increase the workload of app market operators, who must identify and remove them. Spam apps can come in many forms, such as apps with no specific functionality, apps with unrelated descriptions or keywords, or similar apps made available several times and across diverse categories. Market operators maintain antispam policies, and apps are removed through continuous monitoring. Through a systematic crawl of a popular app market and by identifying apps that were removed over a period of time, we propose a method to detect spam apps solely using app metadata available at the time of publication. We first propose a methodology to manually label a sample of removed apps according to a set of checkpoint heuristics that reveal the reasons behind removal. This analysis suggests that approximately 35% of the apps being removed are very likely to be spam apps. We then map the identified heuristics to several quantifiable features and show how well these features distinguish spam apps. We build an Adaptive Boost classifier for early identification of spam apps using only the metadata of the apps. Our classifier achieves an accuracy of over 95%, with precision varying between 85% and 95% and recall varying between 38% and 98%. We further show that a limited number of features, in the range of 10 to 30, generated from app metadata is sufficient to achieve a satisfactory level of performance. On a set of 180,627 apps that were present at the app market during our crawl, our classifier predicts 2.7% of the apps as potential spam. Finally, we perform additional manual verification and show that human reviewers agree with 82% of our classifier predictions.
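The abstract reports accuracy, precision, and recall separately, and the three can diverge sharply when spam is a minority class. The following is a minimal sketch of how these metrics are computed from a confusion matrix. The `Confusion` class, the function names, and the counts are all hypothetical, chosen for illustration only; they are not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary spam / non-spam classifier."""
    tp: int  # spam apps predicted as spam
    fp: int  # non-spam apps predicted as spam
    fn: int  # spam apps predicted as non-spam
    tn: int  # non-spam apps predicted as non-spam


def accuracy(c: Confusion) -> float:
    """Fraction of all apps that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    """Fraction of apps flagged as spam that really are spam."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    """Fraction of actual spam apps that the classifier flagged."""
    actual_spam = c.tp + c.fn
    return c.tp / actual_spam if actual_spam else 0.0


# Hypothetical counts (assumption): 1,000 apps, 80 of which are spam.
example = Confusion(tp=40, fp=5, fn=40, tn=915)

print(f"accuracy:  {accuracy(example):.3f}")
print(f"precision: {precision(example):.3f}")
print(f"recall:    {recall(example):.3f}")
```

With these illustrative counts, accuracy stays above 95% even though half of the spam apps are missed, which is why recall is worth reading alongside accuracy.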
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
17 articles.