Methodology for detecting and removing outliers in statistical studies

Authors:

Sidnyaev N. I.¹, Enkhzhargal B.¹

Affiliation:

1. Bauman Moscow State Technical University

Abstract

The paper presents a calculation method for detecting and eliminating outlying values. It is shown that the effectiveness of the method depends on the amount of a priori information available on the examined process. The proposed method applies to cases where the process is stationary and follows a Gaussian distribution. When analysing non-stationary random processes, the existing methods and algorithms rely on the assumption that the outlying component is additive and that the characteristics of the outlying values are known a priori. The work uses statistical decision theory, which allows formalising the verification algorithms and selecting a criterion for detecting outlying values. Both parametric and non-parametric methods are proposed. The parametric methods require a priori information on the function of the useful component, on the distribution law of the outlying component of the process and on the parameters of that law. It is postulated that non-parametric processing methods require significantly less a priori information, but their effectiveness is defined by the processing parameters, which in turn depend on the function of the useful component and the distribution law of the outlying component of the process. It is noted that an outlier may prove to be one of the extreme values of the probability distribution of a random variable. The authors outline the problems of ambiguity of input data in classical computing. The paper examines the way external factors affect dependability and the degree to which such factors are taken into consideration in the existing methods. Methods for assessing the life of the examined items are presented, among which control chart-based methods hold a prominent place. It is shown that the range is a more convenient measure of data dispersion than the standard deviation. Plotting the sample range on a control chart together with the expectation makes it easier to notice an anomaly. The range is a rough measure of the rate of change of the monitored variable. Its value may exceed the control limits on the range chart and signal an anomaly much earlier than a change in the mean, which may still be within the specified control limits.
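The point about the range chart reacting before the mean chart can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes classical Shewhart X-bar and R charts with subgroups of five observations, uses the standard tabulated constants for that subgroup size, and runs on invented data. The function names xbar_r_limits and flag_subgroup are hypothetical.

```python
from statistics import mean

# Standard Shewhart chart constants for subgroups of size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """Estimate control limits for the mean and range charts from baseline subgroups."""
    means = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = mean(means)
    mean_range = mean(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean + A2 * mean_range),
        "range": (D3 * mean_range, D4 * mean_range),
    }


def flag_subgroup(group, limits):
    """Report whether a subgroup's mean or range falls outside its control limits."""
    m = mean(group)
    r = max(group) - min(group)
    x_low, x_high = limits["xbar"]
    r_low, r_high = limits["range"]
    return {
        "mean_out": not (x_low <= m <= x_high),
        "range_out": not (r_low <= r <= r_high),
    }


# Hypothetical in-control baseline data (assumption, not from the paper).
baseline = [
    [10.0, 10.2, 9.8, 10.1, 9.9],
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [9.9, 10.1, 10.0, 9.8, 10.2],
    [10.2, 9.8, 10.0, 9.9, 10.1],
]
limits = xbar_r_limits(baseline)

# A new subgroup containing one outlying reading (10.9).
suspect = [10.0, 10.9, 9.6, 9.9, 9.8]
flags = flag_subgroup(suspect, limits)
print(flags)
```

In this invented example the subgroup mean stays inside its limits while the range exceeds the upper limit of the range chart, so only the range chart raises a signal.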

Publisher

Journal Dependability
