This tool is a guided CLI wizard that generates synthetic tabular data without requiring any ML knowledge or coding. Built on the open-source SDV package, it creates high-quality, realistic data sets for testing, development, and research. Guided prompts make it easy to generate exactly the data you need while preserving privacy and security.
- Download the binary
- Upload it to AWS CloudShell
- Run the following command:
./synth_table
- Follow the prompts to create your synthetic data
- Clone the repository
- Compile and run the code with
cargo run
(this may take some time).
- Follow the prompts to generate your synthetic data.
- Your source table should be stored in an S3 bucket and cataloged using AWS Glue.
- The region where your data is located must have at least one VPC with at least one private subnet.
- To streamline the process, temporarily grant the user running the script the AdministratorAccess permission set.
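
As a rough way to check the private-subnet prerequisite, the sketch below classifies subnets from data shaped like the EC2 `DescribeSubnets` and `DescribeRouteTables` responses for a single VPC. It assumes "private" means the subnet's effective route table has no route to an internet gateway; the tool's own definition is not documented here, so treat this as an illustration only. The function name `private_subnet_ids` and the sample IDs are made up.

```python
def private_subnet_ids(subnets, route_tables):
    """Return IDs of subnets whose effective route table has no internet gateway route.

    Both arguments are assumed to be the "Subnets" and "RouteTables" lists
    from EC2 DescribeSubnets / DescribeRouteTables for ONE VPC.
    """
    def has_igw_route(table):
        # Internet gateway routes carry a GatewayId that starts with "igw-".
        return any((route.get("GatewayId") or "").startswith("igw-")
                   for route in table.get("Routes", []))

    explicit = {}      # subnet ID -> explicitly associated route table
    main_table = None  # fallback for subnets without an explicit association
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table

    result = []
    for subnet in subnets:
        table = explicit.get(subnet["SubnetId"], main_table)
        # Skip subnets whose route table is unknown rather than guessing.
        if table is not None and not has_igw_route(table):
            result.append(subnet["SubnetId"])
    return result


# Hypothetical sample data, not real resource IDs.
subnets = [{"SubnetId": "subnet-public"}, {"SubnetId": "subnet-private"}]
route_tables = [
    {   # main route table: local route only
        "Associations": [{"Main": True}],
        "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}],
    },
    {   # explicitly associated with subnet-public and routed to an internet gateway
        "Associations": [{"Main": False, "SubnetId": "subnet-public"}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123"},
        ],
    },
]
print(private_subnet_ids(subnets, route_tables))
```

With live data, the two lists could come from boto3's `describe_subnets` and `describe_route_tables` calls filtered by VPC ID; the function itself needs no AWS access.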
- Choose the AWS Glue database where your table is located.
- Choose the table for which you want to generate synthetic data. Only tables on S3 will appear in the list.
- Select an Amazon VPC in the same AWS Region as the table data that has at least one private subnet.
- The process selects a subnet and launches an EC2 instance with the minimum privileges required to generate the data.
- You will be updated on the progress throughout the process.
- Once the data is generated, it is cataloged in the AWS Glue Data Catalog in the same database and under the same prefix as the source data, with "_synthetic" appended.
- You can use Amazon Athena to view the data.
- The instance is terminated when the process completes. Log data is available in Amazon CloudWatch Logs under log group SynthTable, log stream table_name. The model used to generate the synthetic tabular data is destroyed when the instance is terminated.
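
The table-selection step lists only tables stored on S3. A minimal sketch of that kind of filter is shown below, assuming input shaped like the AWS Glue `GetTables` response (a `TableList` whose entries carry `StorageDescriptor.Location`). The wizard's actual filtering logic is not shown in this README, and the function name `s3_backed_tables` and the sample tables are hypothetical.

```python
def s3_backed_tables(get_tables_response):
    """Return names of tables whose storage location is an s3:// URI."""
    names = []
    for table in get_tables_response.get("TableList", []):
        # Location can be missing or empty for some table types.
        location = table.get("StorageDescriptor", {}).get("Location") or ""
        if location.startswith("s3://"):
            names.append(table["Name"])
    return names


# Hypothetical response fragment: one S3 table, one non-S3 table, one view-like entry.
sample_response = {
    "TableList": [
        {"Name": "orders",
         "StorageDescriptor": {"Location": "s3://example-bucket/orders/"}},
        {"Name": "customers_jdbc",
         "StorageDescriptor": {"Location": "salesdb.public.customers"}},
        {"Name": "orders_view"},
    ]
}
print(s3_backed_tables(sample_response))
```

With boto3, the input could be the result of `glue.get_tables(DatabaseName=...)`; large databases would also need pagination, which this sketch leaves out.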
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.
SDV is licensed under Business Source License 1.1.