Authors
Ghazali Shamihah Muhammad, Shaadan Norshahida, Idrus Zainura
Abstract
Missing values occur in data sets across many research areas. They are recognised as a data quality problem because they can affect the performance of analysis results. To overcome the problem, an incomplete data set needs to be treated, typically by replacing the missing values using an imputation method. The pattern of missing values must therefore be explored beforehand to determine a suitable method. This paper discusses the application of data visualisation as a smart technique for missing data exploration, aiming to increase understanding of missing data behaviour, which includes the missing data mechanism (MCAR, MAR and MNAR) and the distribution pattern of missingness in terms of percentage as well as gap size. It presents the application of several data visualisation tools from five R packages, namely visdat, VIM, ggplot2, Amelia and UpSetR, for exploring data missingness. As an illustration, several graphics based on an air quality data set in Malaysia were produced and discussed to show how the visualisation tools provide input and insight on the pattern of data missingness. The results show that missing values in the air quality data set of the chosen sites in Malaysia behave as missing at random (MAR), with a small percentage of missingness, and do contain long gaps of missingness.
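The two numeric summaries named in the abstract, percentage of missingness and gap size, can be illustrated with a minimal sketch. This is not the authors' code: the paper works with R packages, whereas the sketch below uses plain Python, and the function names `missing_percentage` and `gap_sizes` as well as the sample readings are hypothetical, chosen only for illustration. A "gap" is assumed here to mean a run of consecutive missing observations.

```python
from itertools import groupby


def missing_percentage(values):
    """Return the percentage of entries that are missing (None)."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


def gap_sizes(values):
    """Return the length of each run of consecutive missing entries."""
    return [
        sum(1 for _ in run)
        for is_missing, run in groupby(values, key=lambda v: v is None)
        if is_missing
    ]


# Hypothetical hourly readings; None marks a missing observation.
readings = [41.0, None, 38.5, None, None, None, 40.2, 39.9, None, 42.1]

print(missing_percentage(readings))  # 50.0
print(gap_sizes(readings))           # [1, 3, 1]
```

A small overall percentage can still hide long gaps, which is why the two summaries are worth reporting separately.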
Publisher
Institute of Advanced Engineering and Science
Subject
Electrical and Electronic Engineering, Control and Optimization, Computer Networks and Communications, Hardware and Architecture, Instrumentation, Information Systems, Control and Systems Engineering, Computer Science (miscellaneous)
Cited by
2 articles.