Affiliation:
1. IIT Kharagpur, Kharagpur, India
2. IIT Bombay, Mumbai, India
Abstract
The networked opinion diffusion in online social networks is often governed by the two genres of opinions—
endogenous
opinions that are driven by the influence of social contacts among users, and
exogenous
opinions which are formed by external effects like news and feeds. Accurate demarcation of endogenous and exogenous messages offers an important cue to opinion modeling, thereby enhancing its predictive performance. In this article, we design a suite of unsupervised classification methods based on experimental design approaches, in which, we aim to select the subsets of events which minimize different measures of mean estimation error. In more detail, we first show that these subset selection tasks are NP-Hard. Then we show that the associated objective functions are weakly submodular, which allows us to cast efficient approximation algorithms with guarantees. Finally, we validate the efficacy of our proposal on various real-world datasets crawled from Twitter as well as diverse synthetic datasets. Our experiments range from validating prediction performance on unsanitized and sanitized events to checking the effect of selecting optimal subsets of various sizes. Through various experiments, we have found that our method offers a significant improvement in accuracy in terms of opinion forecasting, against several competitors.
Publisher
Association for Computing Machinery (ACM)