Affiliation:
1. School of Physics and Electronic Science, Zunyi Normal University, Zunyi, China
2. Key Laboratory of Big Data Intelligent Computing, Chongqing University of Posts and Telecommunications, Chongqing, China
3. Department of Computer Science, Old Dominion University, Norfolk, Virginia, USA
4. College of Computer and Information Science, Southwest University, Chongqing, China
Abstract
Online streaming feature selection (OSFS), an online learning approach for handling streaming features, is critical for addressing high‐dimensional data. In real big data‐related applications, the patterns and distributions of streaming features change constantly over time because the data are generated in dynamic environments. However, existing OSFS methods rely on preset, fixed hyperparameters, which leads to poor selection performance on dynamic features. To address this shortcoming, the authors propose a novel OSFS algorithm based on vague sets, named OSFS‐Vague. Its main idea is to combine uncertainty theory and three‐way decision theory, moving feature selection from the traditional dichotomous (two‐way) method to a trichotomous (three‐way) method. OSFS‐Vague also improves how the correlation between features and labels is calculated. Moreover, OSFS‐Vague uses the distance correlation coefficient to classify streaming features into relevant features, weakly redundant features, and redundant features. Finally, the relevant and weakly redundant features are filtered to obtain an optimal feature set. To evaluate OSFS‐Vague, extensive experiments were conducted on 11 datasets. The results demonstrate that OSFS‐Vague outperforms six state‐of‐the‐art OSFS algorithms in terms of selection accuracy and computational efficiency.
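As a rough illustration of the trichotomous idea described in the abstract, the sketch below computes the standard sample distance correlation between one feature and the labels, then assigns the feature to one of three regions. It is not the authors' OSFS‐Vague algorithm: the function names, the thresholds `alpha` and `beta`, and the mapping from correlation ranges to the three categories are assumptions made only for this example, and the abstract does not state how the actual method sets its boundaries.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (Szekely et al.) between two samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)

    # Pairwise Euclidean distance matrices.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)

    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()   # squared distance covariance
    dvar_x = (A * A).mean()  # squared distance variance of x
    dvar_y = (B * B).mean()  # squared distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def three_way_label(score, alpha=0.5, beta=0.2):
    """Hypothetical three-way split of a correlation score.

    alpha and beta are illustrative thresholds (alpha > beta),
    not values taken from the paper.
    """
    if score >= alpha:
        return "relevant"
    if score <= beta:
        return "redundant"
    return "weakly redundant"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=200)
    feature = labels + 0.3 * rng.normal(size=200)  # synthetic feature tied to the labels
    score = distance_correlation(feature, labels)
    print(score, three_way_label(score))
```

The middle region is what distinguishes a three‐way decision from a two‐way one: a feature whose score falls between the two thresholds is neither accepted nor discarded outright, and can be re‐examined later.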
Funder
National Natural Science Foundation of China
Department of Education of Guizhou Province
Science and Technology Foundation of State Grid Corporation of China
Publisher
Institution of Engineering and Technology (IET)