XPCA Gen: Extended PCA Based Tabular Data Generation Model

Sreekala Kallidil Padinjarekkara, Jessica Alecci, Mirela Popa

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

The proposed method, XPCA Gen, introduces a novel approach to synthetic tabular data generation that exploits relevant patterns present in the data. It does so using principal components obtained through XPCA (a probabilistic interpretation of standard PCA) decomposition of the original data. Since new data points are obtained by synthesizing the principal components, the generated data is an accurate and noise redundant representation of the original data with good diversity among data points. Experimental results on benchmark datasets (e.g. CMC, PID) demonstrate performance in ML utility metrics (accuracy, precision, recall), showing the method's ability to capture inherent patterns in the dataset. Along with the ML utility metrics, a high Hausdorff distance indicates diversity in the generated data without compromising statistical properties. Moreover, unlike complex neural networks, the method is not data-hungry. Overall, XPCA Gen emerges as a promising solution for data privacy preservation and robust model training with diverse samples.
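
The general idea in the abstract (decompose the data, synthesize new component scores, map them back to feature space, then measure how far the synthetic set sits from the original) can be illustrated with a minimal sketch. This is not the authors' XPCA Gen: it assumes standard PCA computed via SVD in place of XPCA, assumes component scores are drawn from independent Gaussians, and handles only continuous columns. The function names `generate_from_pca` and `hausdorff_distance` are hypothetical and introduced purely for illustration.

```python
import numpy as np


def generate_from_pca(X, n_samples, n_components, rng):
    """Sample synthetic rows from a standard-PCA model of X.

    Assumption: plain PCA stands in for XPCA, and scores are drawn from
    independent Gaussians matched to each component's empirical spread.
    """
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project the original rows onto the retained components.
    scores = centered @ components.T
    score_std = scores.std(axis=0, ddof=1)
    # Draw new scores, then map them back to the original feature space.
    new_scores = rng.normal(0.0, score_std, size=(n_samples, n_components))
    return new_scores @ components + mean


def hausdorff_distance(A, B):
    """Symmetric Hausdorff distance between two point sets (rows are points)."""
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a tabular dataset: 200 rows, 4 correlated columns.
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 4))
    synthetic = generate_from_pca(X, n_samples=100, n_components=2, rng=rng)
    print(synthetic.shape, hausdorff_distance(X, synthetic))
```

In this sketch a larger Hausdorff distance means at least one point in one set lies far from every point in the other, which is one way to read the diversity claim; it says nothing on its own about ML utility, which would need a separate train-on-synthetic, test-on-real evaluation.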
Original language: English
Title of host publication: Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods
Editors: Modesto Castrillon-Santana, Maria De Marsico, Ana Fred
Publisher: Science and Technology Publications, Lda
Pages: 141-151
Number of pages: 11
Volume: 1
ISBN (Print): 9789897586842
Publication status: Published - 2024
Event: 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024 - Rome, Italy
Duration: 24 Feb 2024 to 26 Feb 2024
Conference number: 13
https://icpram.scitevents.org/NeroPRAI.aspx?y=2024

Conference

Conference: 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024
Abbreviated title: ICPRAM 2024
Country/Territory: Italy
City: Rome
Period: 24/02/24 to 26/02/24

Keywords

  • ML Utility
  • Privacy Preservation
  • Tabular Data Generation
  • XPCA Decomposition
