Abstract
Variational Autoencoders (VAEs) suffer from a well-known problem of overpruning, or posterior collapse, caused by strong regularization when working in a sufficiently high-dimensional latent space. When VAEs are used to generate tabular data, one-hot encoding of categorical variables dramatically expands the dimensionality of the feature space, which makes modeling multi-class categorical data challenging. In this paper, we propose Tab-VAE, a novel VAE-based approach to generating synthetic tabular data that tackles this challenge by introducing a sampling technique at inference for categorical variables. A detailed review of the current state-of-the-art models shows that most tabular data generation approaches draw their methodologies from Generative Adversarial Networks (GANs), while the simpler, more stable VAE method is ignored. Our extensive evaluation of Tab-VAE against other leading generative models shows that Tab-VAE improves significantly on state-of-the-art VAEs. It also shows that Tab-VAE outperforms the best GAN-based tabular data generators, paving the way for a powerful and less computationally expensive tabular data generation model.
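The two ideas the abstract leans on, the width added by one-hot encoding and sampling a categorical value at inference, can be illustrated with a minimal sketch. It is not the Tab-VAE implementation: the abstract does not specify the sampling technique, so the sketch assumes the decoder emits one logit per category and that the value is drawn from the resulting softmax distribution rather than taken as the argmax. The names `one_hot_width`, `decode_argmax`, and `decode_sample` are hypothetical.

```python
import math
import random


def one_hot_width(cardinalities):
    """Number of columns needed to one-hot encode the given categorical columns."""
    return sum(cardinalities)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_argmax(logits):
    """Deterministic decoding: always returns the most likely category."""
    return max(range(len(logits)), key=lambda i: logits[i])


def decode_sample(logits, rng):
    """Assumed inference-time sampling: draw a category from softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Three categorical columns with 3, 50, and 200 classes.
    print("one-hot width:", one_hot_width([3, 50, 200]))

    # Hypothetical decoder output for a 3-class column.
    logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
    rng = random.Random(0)
    draws = [decode_sample(logits, rng) for _ in range(10_000)]
    print("argmax choice:", decode_argmax(logits))
    print("sampled class frequencies:",
          [round(draws.count(k) / len(draws), 3) for k in range(3)])
```

Under this assumption, argmax decoding would emit only the majority class for these logits, whereas sampling yields minority classes in roughly the proportions the decoder predicts.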
Original language | English |
---|---|
Title of host publication | Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods |
Editors | Modesto Castrillon-Santana, Maria De Marsico, Ana Fred |
Publisher | Science and Technology Publications, Lda |
Pages | 17-26 |
Number of pages | 10 |
Volume | 1 |
ISBN (Print) | 9789897586842 |
Publication status | Published - 2024 |
Event | 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024 - Rome, Italy; Duration: 24 Feb 2024 → 26 Feb 2024; Conference number: 13; https://icpram.scitevents.org/NeroPRAI.aspx?y=2024 |
Conference
Conference | 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024 |
---|---|
Abbreviated title | ICPRAM 2024 |
Country/Territory | Italy |
City | Rome |
Period | 24/02/24 → 26/02/24 |
Internet address | https://icpram.scitevents.org/NeroPRAI.aspx?y=2024 |
Keywords
- GANs
- Generative AI
- Tabular Data Representation
- Variational Autoencoders