War of the benchmark means

Author:

Mashey John R.1

Affiliation:

1. Techviser

Abstract

For decades, computer benchmarkers have fought a War of Means. Although many have raised concerns with the geometric mean (GM), it continues to be used by SPEC and others. This war is an unnecessarymisunderstanding due to inadequately articulated implicit assumptions, plus confusio namong populations, their parameters, sampling methods, and sample statistics. In fact, all the Means have their uses, sometimes in combination. Metrics may be algebraically correct, but statistically irrelevant or misleading if applied to population distributions for which they are inappropriate. Normal (Gaussian) distributions are so useful that they are often assumed without question,but many important distributions are not normal.They require different analyses, most commonly by finding a mathematical transformations that yields a normal distribution,computing the metrics, and then back-transforming to the original scale. Consider the distribution of relative performance ratios of programs on two computers. The normal distribution is a good fit only when variance and skew are small, but otherwise generates logical impossibilities and misleading statistical measures. A much better choice is the lognormal (or log-normal) distribution, not just on theoretical grounds, but through the (necessary) validation with real data. Normal and lognormal distributions are similar for low variance and skew, but the lognormal handles skewed distributions reasonably, unlike the normal. Lognormal distributions occur frequently elsewhere are well-understood, and have standard methods of analysis.Everyone agrees that "Performance is not a single number," ... and then argues about which number is better. It is more important to understanding populations, appropriate methods, and proper ways to convey uncertainty. When population parameters are estimated via samples, statistically correct methods must be used to produce the appropriate means, measures of dispersion, Skew, confidence levels, and perhaps goodness-of-fit estimators. If the wrong Mean is chosen, it is difficult to achieve much. The GM predicts the mean relative performance of programs , not of workloads. The usual GM formula is rather unintuitive, and is often claimed to have no physical meaning. However, it is the back-transformed average of a lognormal distribution , as can be seen by the mathematical identity below. Its use is not onlystatistically appropriate in some cases, but enables straightforward computation of other useful statistics.<display equation>"If a man will begin in certainties, he shall end in doubts, but if he will be content to begin with doubts, he shall end with certainties."  — Francis Bacon, in Savage.

Publisher

Association for Computing Machinery (ACM)

Reference29 articles.

1. Savage S "Some Gratuitous Inflammatory Remarks on the Accounting Industry " http://www.stanford.edu/dept/MSandE/faculty/savage/AccountingRemarks.pdf Savage S "Some Gratuitous Inflammatory Remarks on the Accounting Industry " http://www.stanford.edu/dept/MSandE/faculty/savage/AccountingRemarks.pdf

2. How not to lie with statistics: the correct way to summarize benchmark results

3. Characterizing computer performance with a single number

4. More on finding a single number to indicate overall performance of a benchmark suite

5. Jain R. The Art of Computer Systems Performance Analysis John Wiley and Sons New York 1991. Jain R. The Art of Computer Systems Performance Analysis John Wiley and Sons New York 1991.

Cited by 23 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3