Learning Feature Engineering for Classification

Published: 2017-08
Container-title: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence

Author:

Nargesian Fatemeh¹, Samulowitz Horst², Khurana Udayan², Khalil Elias B.³, Turaga Deepak²

Affiliation:

1. Department of Computer Science, University of Toronto

2. IBM Research

3. School of Computational Science and Engineering, Georgia Tech

Abstract

Feature engineering is the task of improving predictive modelling performance on a dataset by transforming its feature space. Existing approaches to automate this process rely on either transformed feature space exploration through evaluation-guided search, or explicit expansion of datasets with all transformed features followed by feature selection. Such approaches incur high computational costs in runtime and/or memory. We present a novel technique, called Learning Feature Engineering (LFE), for automating feature engineering in classification tasks. LFE is based on learning the effectiveness of applying a transformation (e.g., arithmetic or aggregate operators) on numerical features, from past feature engineering experiences. Given a new dataset, LFE recommends a set of useful transformations to be applied on features without relying on model evaluation or explicit feature expansion and selection. Using a collection of datasets, we train a set of neural networks, which aim at predicting the transformation that impacts classification performance positively. Our empirical results show that LFE outperforms other feature engineering approaches for an overwhelming majority (89%) of the datasets from various sources while incurring a substantially lower computational cost.
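
To make the notion of "transforming a feature" concrete, the following is a minimal sketch of the expand-then-select style of baseline that the abstract contrasts LFE with. It is not LFE itself, which avoids this kind of per-dataset evaluation. The transformation set, the `separation` score, and the function names are all assumptions chosen for illustration; none of them come from the paper.

```python
import math
from statistics import mean, pstdev

# Hypothetical candidate unary transformations (illustrative, not from the paper).
# "log" and "sqrt" assume strictly positive feature values.
TRANSFORMS = {
    "identity": lambda x: x,
    "log": lambda x: math.log(x),
    "sqrt": lambda x: math.sqrt(x),
    "square": lambda x: x * x,
}


def separation(values, labels):
    """Crude class-separation score: |mean(class 1) - mean(class 0)| / overall std."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    return abs(mean(pos) - mean(neg)) / spread


def rank_transforms(feature, labels):
    """Expand one feature with every candidate transformation, then rank by score."""
    scores = {
        name: separation([fn(x) for x in feature], labels)
        for name, fn in TRANSFORMS.items()
    }
    # Highest-scoring transformation first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy binary-labelled feature whose classes differ by orders of magnitude.
    feature = [1, 2, 4, 100, 200, 400]
    labels = [0, 0, 0, 1, 1, 1]
    for name, score in rank_transforms(feature, labels):
        print(f"{name}: {score:.3f}")
```

Every candidate has to be computed and scored on the dataset at hand, which is the cost the abstract attributes to expansion-and-selection approaches. LFE, as described above, instead predicts useful transformations from past feature engineering experience.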

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 108 articles.

1. Sustainability and predictive accuracy evaluation of gel and embroidered electrodes for ECG monitoring; Biomedical Signal Processing and Control; 2024-10

2. Enhanced Particle Classification in Water Cherenkov Detectors Using Machine Learning: Modeling and Validation with Monte Carlo Simulation Datasets; Atmosphere; 2024-08-28

3. A comprehensive review of cyberbullying-related content classification in online social media; Expert Systems with Applications; 2024-06

4. FeatureLTE: Learning to Estimate Feature Importance; Proceedings of the ACM on Management of Data; 2024-05-29

5. Effective interpretable learning for large-scale categorical data; Data Mining and Knowledge Discovery; 2024-05-27