Affiliation:
1. University of Technology Dresden, Dresden, Germany
Abstract
This paper develops and evaluates a BPMN-based process model that identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analysis. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks to extract blog content correctly. The paper describes three phases, namely the extraction, transformation and loading of data into a repository, specifically adapted for blog content extraction. It highlights the objectives that must be achieved in each of these phases to ensure correct extraction. The authors integrate the described process into a previously developed framework for blog mining. Their process model closes the conceptual gap in this framework as well as the gap in current research on blog mining process models. Furthermore, it can easily be adapted for other web extraction proposals.
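To make the three phases concrete, here is a minimal sketch of an extract-transform-load pipeline for blog text. It is not the authors' BPMN model and ignores the technology-specific tasks the paper describes. Assumptions: posts are marked up as `<article>` elements with an `<h2>` title and `<p>` paragraphs, an in-memory SQLite table stands in for the data warehouse, and all names (`PostParser`, `transform`, `load`, `run_etl`, `SAMPLE_HTML`) are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class BlogPost:
    title: str
    body: str


class PostParser(HTMLParser):
    """Extract phase: collects <h2> titles and <p> text inside <article> tags.
    The markup convention is a hypothetical assumption; real blog engines differ."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._in_article = False
        self._field = None  # which field incoming text belongs to, if any
        self._title = []
        self._body = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._in_article = True
            self._title, self._body = [], []
        elif self._in_article and tag == "h2":
            self._field = "title"
        elif self._in_article and tag == "p":
            self._field = "body"

    def handle_endtag(self, tag):
        if tag == "article" and self._in_article:
            self.posts.append(BlogPost("".join(self._title), " ".join(self._body)))
            self._in_article = False
        elif tag in ("h2", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field == "title":
            self._title.append(data)
        elif self._field == "body":
            self._body.append(data)


def transform(posts):
    """Transform phase: collapse whitespace and drop posts without body text."""
    cleaned = []
    for p in posts:
        title = " ".join(p.title.split())
        body = " ".join(p.body.split())
        if body:
            cleaned.append(BlogPost(title, body))
    return cleaned


def load(conn, posts):
    """Load phase: insert posts into a SQLite table standing in for the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS post (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO post (title, body) VALUES (?, ?)",
        [(p.title, p.body) for p in posts],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]


def run_etl(html, conn):
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return load(conn, transform(parser.posts))


SAMPLE_HTML = """
<article><h2>First  post</h2><p>Hello   <b>blog</b> world.</p></article>
<article><h2>Empty</h2></article>
<article><h2>Second</h2><p>More text.</p><p>Another paragraph.</p></article>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(SAMPLE_HTML, conn))  # 2 (the post without body text is dropped)
```

Keeping each phase in its own function mirrors the separation of extraction, transformation and loading, so a parser for a different weblog technology could replace `PostParser` without touching the other two steps.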
Subject
Decision Sciences (miscellaneous), Information Systems
Cited by
2 articles.
1. A Novel Bio-Inspired Approach for Multilingual Spam Filtering;International Journal of Intelligent Information Technologies;2015-07
2. Streamlined Alarms for Intrusion Recognition System;International Journal of Intelligent Information Technologies;2015-04