Affiliation:
1. School of Accountancy, Central University of Finance and Economics, Beijing 100081, China;
2. School of Computing, National University of Singapore, 117417 Singapore
Abstract
Data have never been more essential to the success of decision making. However, data are often messy. A perennial data challenge is missing values, which frequently occur in real-world data, such as unreported data items in public firms’ financial statements and skipped product ratings from consumers. What is the influence of missing values and how should they be handled? Although we are in a big data era, missing values are not ignorable if data are missing for nonrandom reasons. In the case of product ratings, if only people who favor the product provide ratings while others put aside the product and do not respond, then even a simple mean estimation of the product rating would be significantly biased. Such bias challenges the validity of data analysis, and it cannot be eliminated simply by increasing the sample size of the data. To correct the bias arising from nonrandom missing values, it is necessary to examine and model what causes the missing values. We propose and demonstrate the superior performance of a Monte Carlo likelihood approach to correct the bias. Overall, we recommend well-designed data collection processes with documentation of the possible reasons for missing values, cautious adoption of missing value handling methods, and structured missing value reporting practices.
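The product-rating example in the abstract can be illustrated with a small simulation. The sketch below is not the Monte Carlo likelihood approach proposed in the paper; it only shows, under assumed conditions, why a naive mean of observed ratings stays biased as the sample grows. The function name `simulate`, the uniform 1 to 5 rating distribution, and the response probabilities are all hypothetical choices made for illustration.

```python
import random


def simulate(n, seed=0):
    """Return (true_mean, observed_mean) for n simulated consumers.

    Assumption (hypothetical): consumers who like the product are more
    likely to leave a rating, so missingness depends on the rating itself.
    """
    rng = random.Random(seed)
    # Hypothetical probability of responding, by true rating.
    respond_prob = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.7, 5: 0.9}

    # True ratings, drawn uniformly from 1..5 (an assumption).
    ratings = [rng.randint(1, 5) for _ in range(n)]

    # Only some ratings are observed; the rest are missing not at random.
    observed = [r for r in ratings if rng.random() < respond_prob[r]]

    true_mean = sum(ratings) / len(ratings)
    observed_mean = sum(observed) / len(observed)
    return true_mean, observed_mean


# The gap between the two means does not shrink as n increases.
for n in (1_000, 100_000):
    true_mean, observed_mean = simulate(n)
    print(n, round(true_mean, 2), round(observed_mean, 2))
```

Under these assumed response probabilities, the observed mean converges to roughly 3.9 rather than the true mean of 3, so collecting more data only makes the biased estimate more precise. Correcting it requires a model of why ratings go missing.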
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Library and Information Sciences, Information Systems and Management, Computer Networks and Communications, Information Systems, Management Information Systems
Cited by
7 articles.