A Lightweight Data Preprocessing Strategy with Fast Contradiction Analysis for Incremental Classifier Learning-Reference-Cited by-同舟云学术

A Lightweight Data Preprocessing Strategy with Fast Contradiction Analysis for Incremental Classifier Learning

Published:2015 Issue: Volume:2015 Page:1-11
ISSN:1024-123X
Container-title:Mathematical Problems in Engineering
language:en
Short-container-title:Mathematical Problems in Engineering

Author:

Fong Simon¹^ORCID,Biuk-Aghai Robert P.¹,Si Yain-whar¹,Yap Bee Wah²

Affiliation:

1. Department of Computer and Information Science, University of Macau, Macau

2. Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia

Abstract

A prime objective in constructing data streaming mining models is to achieve good accuracy, fast learning, and robustness to noise. Although many techniques have been proposed in the past, efforts to improve the accuracy of classification models have been somewhat disparate. These techniques include, but are not limited to, feature selection, dimensionality reduction, and the removal of noise from training data. One limitation common to all of these techniques is the assumption that the full training dataset must be applied. Although this has been effective for traditional batch training, it may not be practical for incremental classifier learning, also known as data stream mining, where only a single pass of the data stream is seen at a time. Because data streams can amount to infinity and the so-called big data phenomenon, the data preprocessing time must be kept to a minimum. This paper introduces a new data preprocessing strategy suitable for the progressive purging of noisy data from the training dataset without the need to process the whole dataset at one time. This strategy is shown via a computer simulation to provide the significant benefit of allowing for the dynamic removal of bad records from the incremental classifier learning process.

Funder

University of Macau

Publisher

Hindawi Limited

Subject

General Engineering,General Mathematics

Link

http://downloads.hindawi.com/journals/mpe/2015/125781.pdf

Reference17 articles.

1. Class Noise vs. Attribute Noise: A Quantitative Study

2. A Survey of Outlier Detection Methodologies

3. Outlier Detection in Adaptive Functional-Coefficient Autoregressive Models Based on Extreme Value Theory

4. The Outlier Interval Detection Algorithms on Astronautical Time Series Data

Cited by 4 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. A Survey on Event Detection and Prediction Online and Offline Models using Social Media Platforms;Materials Today: Proceedings;2021-03

2. Influence of pelvic padding and Kinesiology Taping on pain perception, kinematics, and kinetics of falls in female volleyball athletes;Gait & Posture;2018-07

3. Evaluation of Dataset Metamodel for Describing the Structure of Datasets;FRONT ARTIF INTEL AP;2018

4. Mining and visualising contradictory data;Journal of Big Data;2017-10-30