
Synthetic Data Vault (SDV)

An open-source library for generating synthetic data for various data types.


Overview

The Synthetic Data Vault (SDV) is an open-source project for generating synthetic data across several data modalities: single tables, relational databases, and time series. It combines multiple generative models with evaluation metrics for checking the quality of the generated data.

✨ Key Features

  • Open-source
  • Support for single table, relational, and time-series data
  • Multiple generative models (e.g., Gaussian Copula, CTGAN, TVAE)
  • Data quality and utility evaluation
  • Extensible and customizable
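
To make the Gaussian Copula entry above more concrete, here is a minimal from-scratch sketch of the idea for two numeric columns, using only the Python standard library. It is not SDV's implementation and does not use SDV's API. The function names and the toy `ages`/`incomes` data are illustrative assumptions. The sketch maps each column to normal scores via ranks, measures the correlation of those scores, samples correlated normals, and maps them back through each column's empirical quantiles.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def to_normal_scores(values):
    """Map each value to a standard-normal score through its empirical rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_values, p):
    """Return the observed value at probability p (simple index lookup)."""
    idx = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(col_a, col_b, num_rows, seed=0):
    """Sample synthetic (a, b) pairs that keep the columns' rank dependence."""
    rng = random.Random(seed)
    rho = pearson(to_normal_scores(col_a), to_normal_scores(col_b))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    rows = []
    for _ in range(num_rows):
        # Draw two standard normals with correlation rho.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + (1 - rho * rho) ** 0.5 * rng.gauss(0, 1)
        # Map each normal back to the column's own scale.
        a = empirical_quantile(sorted_a, STD_NORMAL.cdf(z1))
        b = empirical_quantile(sorted_b, STD_NORMAL.cdf(z2))
        rows.append((a, b))
    return rows


# Toy "real" data (made up for illustration): income loosely follows age.
data_rng = random.Random(42)
ages = [data_rng.uniform(20, 60) for _ in range(500)]
incomes = [1000 * a + data_rng.gauss(0, 5000) for a in ages]

synthetic = sample_gaussian_copula(ages, incomes, num_rows=200)
```

Note that this sketch reuses observed values for the marginals, so it only illustrates how the dependence structure is carried over. A production tool would typically model the marginals rather than replay real values.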

🎯 Key Differentiators

  • Support for multiple data modalities (single table, relational, time-series)
  • Wide range of generative models
  • Comprehensive evaluation framework
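
As a rough illustration of what a per-column quality check can look like, the sketch below computes one minus the two-sample Kolmogorov-Smirnov statistic between a real and a synthetic numeric column. A score of 1.0 means the empirical distributions match and 0.0 means they do not overlap. This listing does not say which metrics SDV's evaluation framework uses, so treat `ks_complement` as a hypothetical stand-in rather than SDV's API.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - KS statistic for two numeric samples (higher is closer)."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # The largest gap between the two empirical CDFs occurs at a data point,
    # so it is enough to evaluate both CDFs at every pooled value.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Toy columns (made up for illustration).
score = ks_complement([1, 2, 3, 4], [1, 2, 3, 5])
print(score)  # 0.75
```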

Unique Value: SDV combines multiple generative models, support for several data modalities, and built-in evaluation in a single open-source library.

🎯 Use Cases

  • Data augmentation
  • Data sharing and collaboration
  • Software testing
  • Machine learning model development
  • Academic research

💡 Check With Vendor

Verify these considerations match your specific requirements:

  • Very large datasets that do not fit in memory
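
One possible workaround when a dataset is too large to load at once is to train on a uniform random sample of its rows. The sketch below uses reservoir sampling (Algorithm R) with the standard library to draw `k` rows in a single streaming pass. Whether a sample is adequate for a given use case, and whether SDV offers its own mechanism for this, is not covered by this listing. The function names and the CSV path parameter are illustrative assumptions.

```python
import csv
import random


def reservoir_sample_rows(rows, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of any size."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


def sample_csv(path, k, seed=0):
    """Stream a CSV file (hypothetical path) and return k sampled rows as dicts."""
    with open(path, newline="") as handle:
        return reservoir_sample_rows(csv.DictReader(handle), k, seed)


# Demonstration on a generator, so the full data never sits in memory.
sample = reservoir_sample_rows(({"id": i} for i in range(100_000)), k=1000)
```

The sampled rows can then be loaded into a table small enough to fit in memory before fitting a model.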

🏆 Alternatives

  • DataSynthesizer
  • Gretel (open-source components)
  • MOSTLY AI (open-source components)

SDV's support for relational and time-series data, along with its extensive library of models and evaluation tools, makes it one of the most comprehensive open-source synthetic data libraries available.

💻 Platforms

Desktop

✅ Offline Mode Available

🔌 Integrations

  • Python
  • Pandas
  • Scikit-learn

💰 Pricing

Open-source and free to use, so a separate free tier does not apply.
Contact the vendor for pricing on any commercial offerings.
