Tab-VAE: A Novel VAE for Generating Synthetic Tabular Data

Syed Mahir Tazwar, Max Knobbout, Enrique Hortal Quesada, Mirela Popa

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Variational Autoencoders (VAEs) suffer from the well-known problem of overpruning, or posterior collapse, caused by strong regularization when working in a sufficiently high-dimensional latent space. When VAEs are used to generate tabular data, one-hot encoding of categorical data expands the dimensionality of the feature space dramatically, which makes modeling multi-class categorical data challenging. In this paper, we propose Tab-VAE, a novel VAE-based approach to generating synthetic tabular data that tackles this challenge by introducing a sampling technique at inference for categorical variables. A detailed review of the current state-of-the-art models shows that most tabular data generation approaches draw their methodologies from Generative Adversarial Networks (GANs), while the simpler, more stable VAE method is ignored. Our extensive evaluation of Tab-VAE against other leading generative models shows that Tab-VAE improves significantly on state-of-the-art VAEs. It also shows that Tab-VAE outperforms the best GAN-based tabular data generators, paving the way for a powerful and less computationally expensive tabular data generation model.
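
The abstract does not spell out the sampling technique, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes the decoder emits one block of raw scores (logits) per categorical column, and it contrasts drawing a category from the softmax distribution with simply taking the most likely one. The function names (`softmax`, `one_hot_width`, `decode_categorical`), the example cardinalities, and the logits are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw decoder scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_width(cardinalities):
    """Number of columns produced by one-hot encoding each categorical feature."""
    return sum(cardinalities)


def decode_categorical(logits, categories, rng, sample=True):
    """Pick one category for a single one-hot block of decoder output.

    sample=True draws from the softmax distribution (assumed approach);
    sample=False takes the most probable category (argmax).
    """
    probs = softmax(logits)
    if sample:
        return rng.choices(categories, weights=probs, k=1)[0]
    return categories[probs.index(max(probs))]


# Hypothetical table: three categorical columns with 4, 12 and 50 classes.
print(one_hot_width([4, 12, 50]))  # 66 one-hot columns from 3 original columns

rng = random.Random(0)  # fixed seed so the sketch is repeatable
logits = [2.0, 1.0, 0.1]  # hypothetical decoder scores for one column
categories = ["a", "b", "c"]
draws = [decode_categorical(logits, categories, rng) for _ in range(1000)]
print({c: draws.count(c) for c in categories})
```

Under these assumptions, argmax decoding would return "a" for every row, whereas sampling lets the less likely categories appear roughly in proportion to their predicted probabilities.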
Original language: English
Title of host publication: Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods
Editors: Modesto Castrillon-Santana, Maria De Marsico, Ana Fred
Publisher: Science and Technology Publications, Lda
Pages: 17-26
Number of pages: 10
Volume: 1
ISBN (Print): 9789897586842
Publication status: Published - 2024
Event: 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024 - Rome, Italy
Duration: 24 Feb 2024 to 26 Feb 2024
Conference number: 13
https://icpram.scitevents.org/NeroPRAI.aspx?y=2024

Conference

Conference: 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024
Abbreviated title: ICPRAM 2024
Country/Territory: Italy
City: Rome
Period: 24/02/24 to 26/02/24

Keywords

  • GANs
  • Generative AI
  • Tabular Data Representation
  • Variational Autoencoders
