Mining and visualising information from RSS feeds: a case study
Author:
O'Shea Martin, Levene Mark
Abstract
Purpose
Recent years have seen “really simple syndication” or “rich site summary” (RSS) syndication of frequently updated content become ubiquitous across the internet. RSS's XML‐based format allows these data to be stored in a semi‐structured format but, despite the presence of online aggregators and readers, and the related work in clustering feeds and mining subjects by keywords, much potentially useful information present in RSS may remain undiscovered. This paper aims to address this issue in an experimental setting.
Design/methodology/approach
This paper presents two distinct technologies which employ the semi‐structured nature of RSS content to allow users to mine information directly from raw RSS feeds: occurrence mining counts occurrences of text strings in feeds, whilst value mining mines structured ticker tape numeric data. It describes both technologies and their implementation in an experiment, where 35 students mined small numbers of RSS feeds and visualised the data mined.
Findings
This paper analyses the results of the experiment and cites examples of data mined and visualisations produced. The subject matter of data mined is also explored and potential applications of the technologies are considered.
Research limitations/implications
The mining technologies proposed in this paper have been developed to mine textual and numeric data directly from feeds, but can be extended to mine other data types present in RSS and to include other variants like Atom.
Originality/value
These technologies are seen to be applicable to data mining, the role of data and visualisations in social data analysis, issue tracking in news mining and time series analysis.
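The two approaches named in the abstract, occurrence mining and value mining, can be illustrated with a minimal sketch. This is not the authors' implementation: the sample feed, the ticker format (an uppercase symbol followed by a number), and the function names `item_fields`, `count_occurrences` and `mine_values` are all assumptions made here for illustration, using only Python's standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; symbols, figures and headlines are invented.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Markets update</title>
      <description>AAA 5,432.10 BBB 1,020.55</description>
    </item>
    <item>
      <title>Election news: election date announced</title>
      <description>The election will be held next month.</description>
    </item>
  </channel>
</rss>"""


def item_fields(rss_xml):
    """Return a (title, description) pair for every <item> in the feed."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


def count_occurrences(rss_xml, term):
    """Occurrence mining sketch: case-insensitive count of `term`
    across the titles and descriptions of all items."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sum(
        len(pattern.findall(title)) + len(pattern.findall(description))
        for title, description in item_fields(rss_xml)
    )


def mine_values(rss_xml):
    """Value mining sketch: pull (symbol, number) pairs out of item
    descriptions, assuming a 'SYMBOL 1,234.56' ticker layout."""
    pattern = re.compile(r"\b([A-Z]{2,5})\s+(\d[\d,]*(?:\.\d+)?)")
    values = []
    for _title, description in item_fields(rss_xml):
        for symbol, number in pattern.findall(description):
            # Strip thousands separators before converting to float.
            values.append((symbol, float(number.replace(",", ""))))
    return values


if __name__ == "__main__":
    print(count_occurrences(SAMPLE_RSS, "election"))
    print(mine_values(SAMPLE_RSS))
```

Repeating either function over successive snapshots of the same feed would give the kind of time series that could then be visualised; how the original experiment stored and plotted its data is not stated in the abstract.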
Subject
Computer Networks and Communications, Information Systems
Cited by
3 articles.