Abstract
In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set with the probability of the statistic’s value for that data set. We focus on sufficient statistics for the parameter of interest and develop a general formula, independent of the parameter, for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic, not on the parameter or the data. Our approach would also work for non-sufficient statistics, but the lost information and the associated entropy would then involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
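To make the idea concrete, the sketch below works through the simplest instance one might consider under the abstract's setup: an i.i.d. Bernoulli sample reduced to its sum, which is a sufficient statistic for the success probability. This example and the function names are illustrative assumptions, not code or notation from the paper; it only uses the standard fact that, given the sum t of n Bernoulli trials, the data are uniform over the C(n, t) arrangements, so the conditional probability of the data is free of the parameter.

```python
# A minimal sketch (illustrative assumptions, not the paper's own code):
# for an i.i.d. Bernoulli(theta) sample x_1, ..., x_n reduced to the
# sufficient statistic T = sum(x_i), P(X = x | T = t) = 1 / C(n, t) does not
# involve theta, so the Shannon information lost in the reduction is
# -log2 P(X = x | T = t) = log2 C(n, t).

from math import comb, log2

def lost_information_bits(sample):
    """Bits of Shannon information lost when a 0/1 sample is reduced to its sum.

    Because the conditional law of the data given T = t is uniform over the
    C(n, t) arrangements with sum t, this value is also the entropy of the
    lost information: it depends only on the statistic's value, not on theta
    or on which particular arrangement was observed.
    """
    n, t = len(sample), sum(sample)
    return log2(comb(n, t))

if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]            # hypothetical sample: n = 10, t = 6
    print(f"{lost_information_bits(x):.3f} bits")  # log2 C(10, 6) ≈ 7.714
```

Any permutation of the same ten observations gives the same answer, which illustrates the parameter-free, data-free character of the lost-information entropy described in the abstract.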
Subject
Information Systems and Management, Computer Science Applications, Information Systems
Cited by
1 article.
1. CausIL: Causal Graph for Instance Level Microservice Data. Proceedings of the ACM Web Conference 2023, 2023-04-30.