Author:
Araujo João Victor, Santos Gean da Silva, Aquino Andre L. L., Queiroz Fabiane
Abstract
Regression problems are Machine Learning (ML) tasks often found in the real world, and many of their attributes are categorical. Most ML algorithms work only with numerical data, so encoding these attributes tends to be necessary. Common encoding methods, however, do not use properties of the data, which can lead to poor model performance on high-cardinality data. Target Encoding methods address this, but they encode each attribute into a discrete set of values with the same cardinality as the categorical attribute. We propose a Target Encoder that addresses both issues by introducing variability into the encoded data using target statistics, achieving results comparable with those of existing Target Encoders. We test our method against existing Encoders and show that its performance is robust.
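To make the contrast in the abstract concrete, the following is a minimal sketch in Python. It first shows classical mean target encoding, where each category maps to one fixed value, and then a hypothetical variant that draws each encoded value from a normal distribution built from the per-category target mean and standard deviation. The abstract does not say how the proposed encoder introduces variability, so the Gaussian sampling, the function names, and the toy data are all assumptions made for illustration and should not be read as the authors' method.

```python
import random
import statistics
from collections import defaultdict


def fit_target_stats(categories, targets):
    """Group targets by category and return {category: (mean, population std)}."""
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)
    return {
        cat: (statistics.fmean(ys), statistics.pstdev(ys))
        for cat, ys in groups.items()
    }


def mean_target_encode(categories, stats):
    """Classical target encoding: every row of a category gets the same mean."""
    return [stats[cat][0] for cat in categories]


def noisy_target_encode(categories, stats, seed=0):
    """Hypothetical variant (an assumption, not the paper's encoder):
    sample each row's value from N(mean, std) of its category."""
    rng = random.Random(seed)
    return [rng.gauss(*stats[cat]) for cat in categories]


# Toy data (invented for illustration): a categorical attribute and a numeric target.
cities = ["A", "A", "B", "B", "B", "C"]
prices = [10.0, 14.0, 3.0, 5.0, 7.0, 20.0]

stats = fit_target_stats(cities, prices)
mean_enc = mean_target_encode(cities, stats)
noisy_enc = noisy_target_encode(cities, stats)

print(mean_enc)   # one distinct value per category
print(noisy_enc)  # values vary within a category when its std is non-zero
```

In this sketch the mean encoding yields exactly as many distinct values as there are categories, which is the limitation the abstract points to, while the sampled encoding spreads rows of the same category around the category mean. Categories unseen during fitting would raise a `KeyError` here; a practical encoder would need a fallback such as the global target mean.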
Publisher
Sociedade Brasileira de Computação - SBC