Affiliation:
1. School of Science and Technology, University of the Thai Chamber of Commerce, Bangkok, Thailand
Abstract
The recent emergence of Web 2.0 technologies and rich internet applications is driving the development of a new class of applications that combine data from diverse sources, which we refer to as "mash-ups." One of the most popular kinds is the web feed mash-up, which relies on syndication technologies such as RSS and Atom. This kind of mash-up aggregates web feeds from multiple news websites or blogs and presents them in a timely manner in a single interface. In such systems, it is difficult to know exactly how the feed results of a mash-up are generated. In particular, it is difficult for users to determine whether the information can be trusted. Web feed mash-ups therefore need to support a mechanism for recording and querying provenance information, that is, information about the process that led to the result data. In this paper, the author proposes a provenance tracking solution that brings provenance functionality to web feed mash-ups. The author demonstrates how the provenance of feed mash-up results can be determined by means of a provenance query algorithm. To tackle the storage problem caused by persisting intermediate web feeds, a novel storage optimization method is introduced. Finally, the author evaluates the provenance solution in terms of the storage consumed by provenance collection, demonstrating significant reductions in storage size while keeping storage overheads reasonable.