A new generic method to improve machine learning applications in official statistics

Author:

Kloos Kevin

Abstract

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and has therefore received little attention in the academic literature. In earlier work, we collected existing methods that are able to correct misclassification bias and compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics and we hope that our work will stimulate further methodological research in this area.
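The abstract does not define misclassification bias formally. As a minimal illustration, assume a binary classifier whose predictions are simply counted to estimate the share of positive units in a population (often called the classify-and-count estimator). The function names and the sensitivity and specificity values below are hypothetical and chosen only for illustration; this is not the correction method proposed in the paper.

```python
def expected_classify_and_count(alpha: float, sensitivity: float, specificity: float) -> float:
    """Expected share of units predicted positive when the true share is alpha.

    True positives are detected with probability `sensitivity`;
    true negatives are wrongly flagged with probability 1 - `specificity`.
    """
    return alpha * sensitivity + (1.0 - alpha) * (1.0 - specificity)


def misclassification_bias(alpha: float, sensitivity: float, specificity: float) -> float:
    """Bias of the classify-and-count estimator: expected estimate minus true share."""
    return expected_classify_and_count(alpha, sensitivity, specificity) - alpha


if __name__ == "__main__":
    # Hypothetical classifier: 90% sensitivity and 90% specificity (assumed values).
    for alpha in (0.05, 0.20, 0.40):
        bias = misclassification_bias(alpha, sensitivity=0.90, specificity=0.90)
        print(f"true share {alpha:.2f}: bias {bias:+.3f}")
```

Under these assumed error rates, a true share of 5% leads to an expected estimate of 14%, so the aggregate is biased even though the classifier is 90% accurate on individual units. The bias depends on the true share, which is one reason it matters when the share changes over a time series.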

Publisher

IOS Press

Subject

Statistics, Probability and Uncertainty; Economics and Econometrics; Management Information Systems

