Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest

Published:2021-06-05 Issue:1 Volume:8
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Prasetiyowati Maria Irmina,Maulidevi Nur Ulfa,Surendro Kridanto

Abstract

Feature selection is a pre-processing technique that removes unnecessary features and speeds up an algorithm's work. One such technique calculates the information gain value of each feature in the dataset and then selects features by applying a threshold to that value. However, the threshold is usually chosen freely or set to 0.05. This study therefore proposes determining the threshold from the standard deviation of the information gain values generated by the features in the dataset. The threshold determination was tested on 10 original datasets, transformed by FFT and IFFT, and classified using Random Forest. Processing the transformed dataset with the proposed threshold resulted in lower accuracy and longer execution time than the same process with Correlation-Based Feature Selection (CBF) and the standard 0.05 threshold. Similarly, the accuracy value is lower when using transformed features. Processing the original dataset with the standard deviation threshold resulted in better feature selection accuracy for Random Forest classification. Furthermore, using the transformed features with the proposed threshold, excluding the imaginary numbers, leads to a faster average time than the three methods compared.
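The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete features, the population standard deviation, and a "greater than or equal" comparison, none of which the abstract specifies. The function names and the toy data are hypothetical.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_by_std_threshold(columns, labels):
    """Keep features whose information gain reaches the std. deviation of all gains.

    Assumptions (not stated in the abstract): population standard deviation
    and a >= comparison against the threshold.
    """
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    threshold = statistics.pstdev(list(gains.values()))
    selected = [name for name, gain in gains.items() if gain >= threshold]
    return selected, threshold, gains


# Hypothetical toy dataset: three discrete features, six samples.
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy": ["yes", "no", "yes", "no", "yes", "no"],
    "group": ["a", "a", "a", "b", "b", "b"],
}
labels = ["no", "no", "yes", "yes", "yes", "yes"]

selected, threshold, gains = select_by_std_threshold(columns, labels)
print(selected, round(threshold, 4))
```

On this toy data the threshold adapts to the spread of the gains, so the uninformative `windy` feature is dropped without hand-picking a cutoff such as 0.05. The selected columns could then be passed to any Random Forest implementation.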

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management,Computer Networks and Communications,Hardware and Architecture,Information Systems

Link

https://link.springer.com/content/pdf/10.1186/s40537-021-00472-4.pdf

References: 48 articles.

1. Khalid S, Khalil T, Nasreen S. A survey of feature selection and feature extraction techniques in machine learning. In: 2014 Science and Information Conference, London, UK; 2014. p. 372–378. https://doi.org/10.1109/SAI.2014.6918213.

2. Hira ZM, Gillies DF. A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinform. 2015;2015:1–13. https://doi.org/10.1155/2015/198363.

3. Corizzo R, Ceci M, Japkowicz N. Anomaly detection and repair for accurate predictions in geo-distributed big data. Big Data Res. 2019;16:18–35. https://doi.org/10.1016/j.bdr.2019.04.001.

4. Corizzo R, Ceci M, Zdravevski E, Japkowicz N. Scalable auto-encoders for gravitational waves detection from time series data. Expert Syst Appl. 2020;151:113378. https://doi.org/10.1016/j.eswa.2020.113378.

5. Zheng K, Li T, Zhang B, Zhang Y, Luo J, Zhou X. Incipient fault feature extraction of rolling bearings using autocorrelation function impulse harmonic to noise ratio index based SVD and teager energy operator. Appl Sci. 2017;7(11):1117. https://doi.org/10.3390/app7111117.

Cited by 38 articles.

1. Development of machine learning-based algorithms for classifying physical activity intensity using wrist and thigh worn wearables;2024-08-09

2. Comparison of Multiple Feature Selection Techniques for Machine Learning-Based Detection of IoT Attacks;Proceedings of the 19th International Conference on Availability, Reliability and Security;2024-07-30

3. Advancing architectural frameworks for vibration signature classification in rotating machinery;Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture;2024-07-24

4. Identifying Key Learning Algorithm Parameter of Forward Feature Selection to Integrate with Ensemble Learning for Customer Churn Prediction;VFAST Transactions on Software Engineering;2024-06-11

5. Securing smart cities through machine learning: A honeypot‐driven approach to attack detection in Internet of Things ecosystems;IET Smart Cities;2024-05-29