Exploratory Causal Analysis of Open Data: Explanation Generation and Confounder Identification
-
Published:2020-01-20
Issue:1
Volume:24
Page:142-155
-
ISSN:1883-8014
-
Container-title:Journal of Advanced Computational Intelligence and Intelligent Informatics
-
language:en
-
Short-container-title:JACIII
Author:
Song Jing, Oyama Satoshi, Kurihara Masahito
Abstract
Open data are becoming increasingly available in various domains, and many organizations rely on data to make decisions. Such decision making requires care to distinguish between correlations and causal relationships. Among data analysis tasks, causal relationship analysis is especially complex because of unobserved confounders. For example, to correctly analyze the causal relationship between two variables, the possible confounding effect of a third variable should be considered. In the open-data environment, however, it is difficult to consider all possible confounders in advance. In this paper, we propose a framework for exploratory causal analysis of open data, in which possible confounding variables are collected from a large volume of open data and incrementally tested. To the best of the authors’ knowledge, no framework has previously been proposed that incorporates data on possible confounders into the causal analysis process. This paper presents an original way to expand causal structures and generate reasonable causal relationships. The proposed framework accounts for possible confounding in causal analysis as follows. First, a crowdsourcing platform is used to collect explanations of the correlation between variables. Keywords are then extracted from these explanations using natural language processing methods, and the framework searches for related open data according to the extracted keywords. Finally, the collected explanations are tested using several automated causal analysis methods. We conducted experiments using open data from the World Bank and the Japanese government. The experimental results confirmed that the proposed framework enables causal analysis while considering the effects of possible confounders.
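The abstract does not name the automated causal analysis methods used to test the collected explanations. As one illustrative possibility only, not the authors' implementation, an explanation of the form "Z drives both X and Y" can be screened with a conditional-independence check: if X and Y are significantly correlated, but their partial correlation given the candidate Z is not, then Z is consistent with being a confounder. The sketch below assumes roughly linear-Gaussian data and uses a Fisher z-test. All function names (`pearson`, `partial_corr`, `fisher_z_pvalue`, `candidate_explains`, `simulate`) and the synthetic data are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """First-order partial correlation of X and Y, controlling for one variable Z."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: rho = 0, with k conditioning variables (Fisher z-test)."""
    stat = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))


def candidate_explains(xs, ys, zs, alpha=0.05):
    """Hypothetical screen: X, Y correlated marginally but not once Z is controlled for.

    Assumes approximately linear-Gaussian relationships between the variables.
    """
    n = len(xs)
    p_marginal = fisher_z_pvalue(pearson(xs, ys), n, 0)
    p_partial = fisher_z_pvalue(partial_corr(xs, ys, zs), n, 1)
    return p_marginal < alpha and p_partial >= alpha


def simulate(n, seed, confounded):
    """Synthetic data: Z drives both X and Y, or X drives Y and Z is unrelated noise."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    if confounded:
        xs = [z + rng.gauss(0, 1) for z in zs]
        ys = [z + rng.gauss(0, 1) for z in zs]
    else:
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [x + rng.gauss(0, 1) for x in xs]
    return xs, ys, zs


if __name__ == "__main__":
    for label, flag in [("confounded", True), ("direct", False)]:
        xs, ys, zs = simulate(2000, seed=1, confounded=flag)
        print(label, round(pearson(xs, ys), 3), round(partial_corr(xs, ys, zs), 3),
              candidate_explains(xs, ys, zs))
```

A non-significant partial correlation is consistent with confounding by Z but does not prove it. The same pattern also arises from a chain X → Z → Y, so distinguishing the two would require further causal discovery methods or domain knowledge.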
Publisher
Fuji Technology Press Ltd.
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction
Cited by
1 article.
1. Natural Language Generation System for Knowledge Acquisition Based on Patent Database; Journal of Advanced Computational Intelligence and Intelligent Informatics; 2022-03-20