Categorical Embeddings for Tabular Data using PyTorch

Published: 2023 Volume: 56 Page: 02002
ISSN: 2271-2097
Container-title: ITM Web of Conferences
Short-container-title: ITM Web Conf.

Author:

Khedkar Sanskruti, Lambor Shilpa, Narule Yogita, Berad Prathamesh

Abstract

Deep learning has received much attention for computer vision and natural language processing, but less for tabular data, which is the most prevalent type of data used in industry. Embeddings offer a solution by representing categorical variables as continuous vectors in a low-dimensional space. PyTorch provides excellent support for GPU acceleration and pre-built functions and modules, making it easier to work with embeddings and categorical variables. In this paper, we apply a feedforward neural network model in PyTorch to a multiclass classification problem using the Shelter Animal Outcome dataset. We calculate the probability of an animal's outcome belonging to each of the 5 categories. Additionally, we explore feature importance using two common techniques: mean decrease in impurity (MDI) and permutation importance. Understanding feature importance is crucial for building better models, improving performance, and interpreting and communicating results. Our findings demonstrate the usefulness of embeddings and PyTorch for deep learning with tabular data and highlight the importance of feature selection for building effective machine learning models.
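
The abstract does not give the network's architecture, so the following is only a minimal sketch of the general idea it describes: each categorical column gets its own embedding table, the embedded vectors are concatenated with the continuous columns, and a feedforward network outputs one score per outcome class. The class name `TabularEmbeddingNet`, the column cardinalities, the embedding-size rule, and the hidden width are all assumptions chosen for illustration, not values taken from the paper; only the 5 output classes come from the abstract.

```python
import torch
import torch.nn as nn


class TabularEmbeddingNet(nn.Module):
    """Feedforward classifier with one embedding table per categorical column.

    Hypothetical illustration; not the architecture used in the paper.
    """

    def __init__(self, cardinalities, n_continuous, n_classes=5, hidden=64):
        super().__init__()
        # Assumed sizing rule: about half the cardinality, capped at 50.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, min(50, (n + 1) // 2)) for n in cardinalities]
        )
        emb_total = sum(e.embedding_dim for e in self.embeddings)
        self.mlp = nn.Sequential(
            nn.Linear(emb_total + n_continuous, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_categorical) integer codes, one column per feature
        # x_cont: (batch, n_continuous) float features
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [x_cont], dim=1)
        return self.mlp(x)  # raw logits, one per class


torch.manual_seed(0)
cardinalities = [2, 5, 300]  # made-up category counts for three columns
model = TabularEmbeddingNet(cardinalities, n_continuous=1)

# Random stand-in batch of 8 rows; real inputs would be label-encoded columns.
x_cat = torch.stack([torch.randint(0, n, (8,)) for n in cardinalities], dim=1)
x_cont = torch.randn(8, 1)

logits = model(x_cat, x_cont)
probs = torch.softmax(logits, dim=1)  # per-row probabilities over the 5 classes
```

For training, the raw logits would typically be passed to `nn.CrossEntropyLoss`, which applies the log-softmax internally; the explicit `softmax` here is only for reading off class probabilities at inference time.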

Publisher

EDP Sciences

Subject

General Medicine

Link

https://www.itm-conferences.org/10.1051/itmconf/20235602002/pdf

References: 10 articles.

1. A Deep-Learned Embedding Technique for Categorical Features Encoding

2. De Meulemeester H. and De Moor B., “Unsupervised Embeddings for Categorical Variables,” 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1-8, DOI: 10.1109/IJCNN48605.2020.9207703.

3. Mitrović K., Milošević D. and Greconici M., “Comparison of Machine Learning Algorithms for Shelter Animal Classification,” 2019 IEEE 13th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 2019, pp. 211-216, DOI: 10.1109/SACI46893.2019.9111575.

4. Joseph Manu. “Pytorch tabular: A framework for deep learning with tabular data.” arXiv preprint arXiv:2104.13638 (2021).

5. Zhao Qian, Shi Yue, and Hong Liangjie. 2017. GBCENT: Gradient Boosted Categorical Embedding and Numerical Trees. In Proceedings of the 26th International Conference on World Wide Web (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1311-1319.

Cited by 1 article.

1. Research on the Application of Chemical Process Fault Diagnosis Methods Based on Neural Network;Proceedings of the 2024 3rd International Conference on Cryptography, Network Security and Communication Technology;2024-01-19