How can I generate synthetic data for training a VAE model on imbalanced datasets, specifically for anomaly detection?

Question

Can you tell me how I can generate synthetic data to train a VAE model on imbalanced datasets, specifically for anomaly detection?

score 0 · Answer 1 · Dec 10, 2024

To generate synthetic data for training a VAE model on imbalanced datasets for anomaly detection, you can create a dataset with a large majority of normal samples and a small fraction of anomalous samples.

The setup has four steps:

Normal Data: Generate majority-class samples using a Gaussian distribution.
Anomalous Data: Create minority-class samples with a different range or distribution.
Combine Data: Merge normal and anomalous data with corresponding labels (for example, 0 for normal and 1 for anomalous).
Prepare Dataset: Use tf.data.Dataset for batching and shuffling.
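
The four steps can be sketched as follows. This is a minimal illustration, not the original answer's code: the function names, sample counts, feature dimension, and the uniform range used for anomalies are all assumptions chosen for the example.

```python
import numpy as np


def make_imbalanced_data(n_normal=950, n_anomalous=50, n_features=8, seed=0):
    """Return (features, labels) with label 0 = normal and 1 = anomalous."""
    rng = np.random.default_rng(seed)

    # Normal data: majority class drawn from a standard Gaussian.
    normal = rng.normal(loc=0.0, scale=1.0, size=(n_normal, n_features))

    # Anomalous data: minority class drawn from a different range
    # (uniform on [4, 6) here, an arbitrary illustrative choice).
    anomalous = rng.uniform(low=4.0, high=6.0, size=(n_anomalous, n_features))

    # Combine data: stack both classes and attach matching labels.
    x = np.concatenate([normal, anomalous]).astype("float32")
    y = np.concatenate([np.zeros(n_normal), np.ones(n_anomalous)]).astype("int32")
    return x, y


def to_tf_dataset(x, y, batch_size=32):
    """Prepare dataset: shuffle and batch with tf.data.Dataset."""
    import tensorflow as tf  # imported lazily so the NumPy part runs on its own

    ds = tf.data.Dataset.from_tensor_slices((x, y))
    # A buffer as large as the dataset gives a full shuffle each epoch.
    return ds.shuffle(buffer_size=len(x)).batch(batch_size)
```

Calling `x, y = make_imbalanced_data()` gives a 95/5 class split by default, and `to_tf_dataset(x, y)` wraps it for training. The labels are kept for evaluation; a VAE itself trains on the features only.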

Hence, this setup provides an imbalanced dataset suited to training a VAE that learns to reconstruct normal data, so that anomalies stand out through a higher reconstruction error.
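
One common way to turn reconstructions into anomaly flags is to threshold the per-sample reconstruction error. The sketch below assumes you already have reconstructions from a trained VAE; the helper names and the 99th-percentile cutoff are hypothetical choices for illustration, not part of any library.

```python
import numpy as np


def reconstruction_error(x, x_recon):
    """Mean squared error per sample between inputs and reconstructions."""
    return np.mean((x - x_recon) ** 2, axis=1)


def flag_anomalies(errors, normal_errors, percentile=99.0):
    """Flag samples whose error exceeds a percentile of the normal-data errors."""
    threshold = np.percentile(normal_errors, percentile)
    return errors > threshold, threshold
```

Here `normal_errors` would typically come from held-out normal samples, so the threshold reflects how well the model reconstructs data it is expected to handle.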