DEV Community

Romaric P.

How to seed your dev Postgres DB from your prod DB with RepliByte

Intro

At my company, we build a platform that helps developers easily deploy their apps on AWS. One major feature is the Preview Environment, which lets any developer create a full replica of the production environment for every pull request. It's convenient, but it meant we had to find a way to clone the apps and the databases, data included. That's why I created RepliByte - an open-source tool written in Rust to synchronize cloud databases and hide sensitive data 🔥

Back up your prod Postgres DB into S3

source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional 
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY

To run the backup

replibyte -c prod-conf.yaml backup run

To list your backups

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Clean sensitive data

RepliByte provides Transformers that clean up the sensitive data in your database.

# Transformers

Here is a list of all the transformers available.

| id              | description                                                                                        | available |
| --------------- | -------------------------------------------------------------------------------------------------- | --------- |
| transient       | Does not modify the value                                                                          | yes       |
| random          | Randomize value but keep the same length (string only). [AAA]->[BBB]                               | yes       |
| first-name      | Replace the string value by a first name                                                           | yes       |
| email           | Replace the string value by an email address                                                       | yes       |
| keep-first-char | Keep only the first char for strings and digit for numbers                                         | yes       |
| phone-number    | Replace the string value by a phone number                                                         | yes       |
| credit-card     | Replace the string value by a credit card number                                                   | yes       |
| redacted        | Obfuscate your sensitive data (>3 characters strings only). [4242 4242 4242 4242]->[424**********] | yes       |
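To make the descriptions in the table more concrete, here is a small Python sketch of what three of these transformers do to a value. It is only an illustration written from the table, not RepliByte's actual Rust implementation: the function names are hypothetical, and the fixed mask of 10 `*` characters after the first 3 visible characters is an assumption based on the `redacted` example shown in the table.

```python
import random
import string


def random_same_length(value: str, rng: random.Random) -> str:
    """Illustrates `random`: replace every character, keep the length."""
    return "".join(rng.choice(string.ascii_letters) for _ in value)


def keep_first_char(value):
    """Illustrates `keep-first-char`: first char of a string, first digit of an integer."""
    if isinstance(value, int):
        sign = -1 if value < 0 else 1
        return sign * int(str(abs(value))[0])
    return value[:1]


def redacted(value: str, visible: int = 3, mask_width: int = 10, mask: str = "*") -> str:
    """Illustrates `redacted`: keep the first characters, mask the rest.

    Strings of 3 characters or fewer are returned unchanged, following the
    ">3 characters strings only" note in the table. The mask width is an
    assumption, not a documented RepliByte default.
    """
    if len(value) <= visible:
        return value
    return value[:visible] + mask * mask_width


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    print(random_same_length("Smith", rng))
    print(keep_first_char("romaric"))
    print(keep_first_char(2022))
    print(redacted("4242 4242 4242 4242"))
```

The point to take away is that each transformer works on a single column value, which is why the configuration maps one transformer to one column.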

To use the Transformers, you need to edit your configuration file and add them:

source:
  connection_uri: $DATABASE_URL
  encryption_key: $MY_PRIVATE_ENC_KEY # optional 
  transformers:
    - database: public
      table: employees
      columns:
        - name: last_name
          transformer_name: random
        - name: birth_date
          transformer_name: random-date
        - name: first_name
          transformer_name: first-name
        - name: email
          transformer_name: email
        - name: username
          transformer_name: keep-first-char
    - database: public
      table: customers
      columns:
        - name: phone
          transformer_name: phone-number
bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY

Then your sensitive data will be hidden while seeding your dev Postgres DB 👌

Seed your dev Postgres DB

To restore a backup, you first need to declare a destination in your YAML config file.

bridge:
  bucket: $BUCKET_NAME
  access_key_id: $ACCESS_KEY_ID
  secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $DATABASE_URL
  decryption_key: $MY_PUBLIC_DEC_KEY # optional

Then run replibyte backup list to list all the available backups:

replibyte -c prod-conf.yaml backup list

type          name                    size    when                    compressed  encrypted
PostgreSQL    backup-1647706359405    154MB   Yesterday at 03:00 am   true        true
PostgreSQL    backup-1647731334517    152MB   2 days ago at 03:00 am  true        true
PostgreSQL    backup-1647734369306    149MB   3 days ago at 03:00 am  true        true

Finally, run replibyte restore to seed your dev database:

replibyte -c prod-conf.yaml restore -v latest

OR 

replibyte -c prod-conf.yaml restore -v backup-1647706359405
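If you want to trigger the restore from a script, for example when a preview environment is created, a minimal Python sketch could look like this. It only wraps the `restore -v` invocation shown above; the function names `restore_command` and `seed_dev_database` are hypothetical and not part of RepliByte, and the script assumes the `replibyte` binary is on your PATH.

```python
import shutil
import subprocess


def restore_command(config_path: str, backup: str = "latest") -> list[str]:
    """Build the argument list for `replibyte -c <config> restore -v <backup>`."""
    return ["replibyte", "-c", config_path, "restore", "-v", backup]


def seed_dev_database(config_path: str, backup: str = "latest") -> int:
    """Run the restore and return the exit code (raises if the command fails)."""
    if shutil.which("replibyte") is None:
        raise FileNotFoundError("replibyte is not installed or not on PATH")
    completed = subprocess.run(restore_command(config_path, backup), check=True)
    return completed.returncode


if __name__ == "__main__":
    # Prints the command instead of running it, so the demo has no side effects.
    print(" ".join(restore_command("prod-conf.yaml")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the config path contains spaces.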

What else?

  • RepliByte is written in Rust and performs all operations on the fly, so no extra disk space is consumed and the risk of a data leak is reduced ⚡️

  • RepliByte also supports MongoDB (thanks to Benny, a contributor) 🔥

  • Complete data synchronization 💪🏼

  • Works with any cloud provider 🌍

  • You can use multiple transformers to hide your sensitive data 🙈

  • Designed to back up terabytes of data 🏆

  • Skip data sync for specific tables 👌

  • On-the-fly data compression/decompression (Zlib) and encryption/decryption (AES-256) 🛡
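To show what "on the fly" means in practice, here is a rough Python sketch of chunked Zlib compression. It is not RepliByte's code (which is written in Rust), only an illustration of the general streaming technique using Python's standard `zlib` module; the function names and the 64 KiB chunk size are assumptions, and the AES-256 step is left out because it needs a third-party library.

```python
import io
import zlib


def compress_stream(source, sink, chunk_size: int = 64 * 1024) -> None:
    """Compress `source` into `sink` chunk by chunk, never holding the whole dump."""
    compressor = zlib.compressobj()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(compressor.compress(chunk))
    sink.write(compressor.flush())  # emit whatever is still buffered


def decompress_stream(source, sink, chunk_size: int = 64 * 1024) -> None:
    """Reverse of compress_stream, also working chunk by chunk."""
    decompressor = zlib.decompressobj()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(decompressor.decompress(chunk))
    sink.write(decompressor.flush())


if __name__ == "__main__":
    # In-memory stand-ins for a database dump stream and an upload stream.
    dump = b"INSERT INTO employees VALUES (1, 'Jane');\n" * 5000
    packed = io.BytesIO()
    compress_stream(io.BytesIO(dump), packed)
    print(len(dump), "bytes in,", len(packed.getvalue()), "bytes out")
```

Because only one chunk is in memory at a time, memory use stays flat whatever the size of the dump, which is the property that matters when the data never touches the local disk.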

Conclusion

RepliByte is a command-line tool that makes database seeding super easy and convenient. I am working on a way to restore a database locally with Docker in one command. More is coming, so stay tuned and feel free to share your feedback.

RepliByte GitHub: https://github.com/Qovery/replibyte
