Synchronize your Oracle databases quickly and easily with Amazon FSx for OpenZFS

Update 4/8/2024: You can find a more recent version of this solution in the blog “Accelerate development refresh cycles and optimize cost with Amazon FSx for NetApp ONTAP.” That post presents an alternative solution to the same use case found in this post, but using Amazon FSx for NetApp ONTAP. This post, which uses Amazon FSx for OpenZFS, will remain as a reference.

Synchronizing a fleet of Oracle database environments can become a major operational burden for organizations of all sizes. In this blog post, I explain how you can reduce operational complexity, improve agility, and reduce costs by using Amazon FSx for OpenZFS to clone your Oracle Database data across multiple environments in seconds.

IT operations teams today face the challenging task of managing an ever-increasing quantity of data while maintaining the highest levels of availability, resiliency, and performance. These challenges are amplified at organizations with enterprise resource planning (ERP) and customer relationship management (CRM) application suites. These systems typically involve very large database footprints distributed across development, testing, staging, and other environments.

To keep this fleet of environments up-to-date, database administrators (DBAs) need to manually synchronize data and code from their production database on an ongoing basis. Even with native tools like Recovery Manager (RMAN) or Data Pump, this process can be time-consuming, error-prone, and inefficient, impairing productivity and reducing business agility. In the worst case, DBAs need to provision additional capacity to store multiple distinct copies of the data, increasing costs by as much as 20 times for large distributed deployments.

With Amazon EC2 and Amazon FSx for OpenZFS, you can clone near petabyte-scale Oracle databases in seconds without consuming any additional capacity, increasing refresh speed and reducing resource consumption across your distributed environments. Powered by OpenZFS space-efficient copy-on-write technology, this capability accelerates database synchronization from hours or days with traditional database deduplication to just a few seconds with FSx for OpenZFS cloning.

In this post, I show you how to combine the power of Amazon FSx for OpenZFS with Oracle Database running on Amazon EC2 to accelerate your database cloning process and deliver a cost-optimized storage foundation for your database fleet.

Solution overview

Amazon FSx for OpenZFS is a fully managed shared file storage service built on the OpenZFS file system, powered by the AWS Graviton family of processors, and accessible via the NFS protocol. It is the latest addition to the Amazon FSx family, which delivers feature-rich, secure, and performant file storage powered by popular file systems like NetApp ONTAP, Windows File Server, and Lustre.

There are several ways that you can use Amazon FSx file systems in your Oracle Database deployments. You can use Amazon FSx to provide highly performant and cost-effective shared storage in a highly available Oracle Real Application Cluster (RAC) deployment. You can also attach FSx file systems to individual Oracle Database hosts (as I do in this blog) to take advantage of the fully managed scalability and rich data management capabilities that they offer.

To take advantage of the rapid cloning capabilities of Amazon FSx for OpenZFS, you need to store the source data on a volume in an FSx for OpenZFS file system. The FSx for OpenZFS features that make space-efficient cloning possible are snapshots and clone volumes.

A snapshot is a read-only image of an FSx for OpenZFS volume at a point in time. They are stored alongside your ﬁle system’s data and consume the ﬁle system’s storage capacity. A clone volume is a writable copy that is initialized using a snapshot and initially contains identical data to the snapshot from which it was created. Multiple clones can be created from the same snapshot; they are created instantly and consume no storage capacity until changes are made to those clone volumes. Combined, these capabilities enable you to synchronize multiple non-production environments from your production database in seconds to a consistent point-in-time and powered by the same underlying data.

The recommended architecture replicates the source data from a Data Guard standby database (which can be synchronized from production or from a preexisting standby) in order to avoid affecting the performance of your production database. For simplicity and efficiency, the downstream non-production databases are configured to run on the same EC2 instance as the Data Guard standby source database. The Data Guard standby database and FSx for OpenZFS file system are located in the same AWS Availability Zone for optimum performance.

Oracle OpenZFS Architecture Diagram

Walkthrough

The following is a detailed walkthrough of the steps required to build an environment like the one illustrated in the previous architecture diagram. By stepping through this procedure, you can create a foundation for a solution that leverages the power of Amazon FSx for OpenZFS to optimize and accelerate your Oracle Database cloning procedures. Later in the blog post, I cover some of the finer details to help you build an even more robust and extensible solution.

Prerequisites

To create a similar deployment in your own environment, you need to meet the following prerequisites:

An AWS account with access to one of the currently supported AWS Regions for FSx for OpenZFS
Oracle 12cR2 Database installed onto 2 Amazon EC2 instances, one in each Availability Zone (see preceding diagram).
- An Oracle 12cR2 Primary Database running on the first EC2 instance.
- An Oracle 12cR2 Standby Database running on the second EC2 instance.
Intermediate-level knowledge of Oracle Database cloning procedures and tools (Data Guard, sqlplus)

Implementation

1. Provision an FSx for OpenZFS file system in the same Availability Zone and subnet as your standby EC2 instance created as part of the prerequisites.

The file system will take a few minutes to provision. Once it is complete, you can view its attributes from the FSx console:

OraZFS Summary

2. Verify that the security group associated with the new FSx for OpenZFS file system allows NFS traffic from the security group associated with the Oracle standby database. You can obtain the security group ID associated with your standby database by selecting its EC2 instance ID from the EC2 console, opening the Instance details page, and then selecting the Security tab.

3. Mount the new OpenZFS file system on the EC2 instance that hosts the standby database. You can obtain the NFS mount command by selecting your volume in the FSx for OpenZFS console and selecting Attach at the top right:

Attach volumes

The mount options displayed are appropriate for general purpose file access via NFS, but Oracle recommends some specific mount options that differ depending on the database version and file type. Please check My Oracle Support Notes 1567137.1 and 359515.1 for explicit NFS mount options required for Oracle data files.Use the mount options recommended in the FSx for OpenZFS volume Attach modal and the My Oracle Support Notes to mount the file system onto the standby database EC2 instance.

4. Create a new Data Guard standby database from the primary database and store its data files on the mounted OpenZFS file system (or cascade it from the existing standby). The most efficient way to create a new ZFS-based standby database to act as the source for downstream database clones is to use the Oracle RMAN Duplicate from Active Database technique. This allows you to copy the live primary database to a standby EC2 instance and write the data files to the FSx for OpenZFS volume with a single step. The key callout for this use case is to ensure that you convert the data file locations to the FSx for OpenZFS file system in the RMAN Duplicate script by using the db_file_name_convert SPFILE parameter.

5. Confirm that the primary database and standby database using ZFS volumes are in sync. This can be verified by querying the Data Guard view V$DATAGUARD_PROCESS using SQLPLUS.

6. For each non-production environment you want to clone:

- Temporarily stop Data Guard managed recovery on the source ZFS-based standby database (SQLPLUS> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL).
- Take a snapshot of the source FSx for OpenZFS volume using the AWS FSx console.
- Re-enable Data Guard managed recovery on the source standby (SQLPLUS> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT).
- Create an FSx for OpenZFS clone volume using the snapshot taken in Step 2:

Create an FSx for OpenZFS Clone Volume

- Mount the new clone volume from Step 4 on the standby EC2 instance (using a unique mount path). The clone volume created in the previous step will be dedicated to the downstream non-production environment you are cloning. So this needs to be mounted onto a mount point specific to the target environment (development, testing, staging). Choose a meaningful name for the mount path on the standby EC2 instance and mount the clone volume using the DNS name of the clone volume from the FSx for OpenZFS console and the same NFS mount parameters used in Step 4 above.
- Rename the database in the new clone volume to the name of the non-production target environment (Dev, Test, Staging). This final step is required to rename the database stored in the clone volume to match the target environment name (for example, development, testing, staging, and so on). This is because the clone volume initially contains an identical copy of the standby data files as they appear on the source (production). This can be achieved by renaming subdirectories in the FSx for OpenZFS file system stored on the clone volume using mv from the Linux shell, and then recreating the database controlfile using CREATE CONTROLFILE SET DATABASE "<Dev|Test|Staging>" RESETLOGS.

Building a robust solution

The steps above will help you lay the foundation for space-efficient database cloning. This section will introduce some additional considerations that will allow you to build on top of that foundation to create a robust and easy to maintain solution.

Clone source

To take advantage of the space-efficient ZFS snapshot and clone technology, the source Oracle database must store its data files on an FSx for OpenZFS volume mounted over NFS. Customers commonly need to refresh their non-production databases from a recent copy of production to capture all of the latest data and code changes and provide a good baseline for testing and staging new products, features, and integrations. Therefore a natural choice is to select the production primary database as the source. Many customers have already deployed Data Guard standby databases for disaster recovery protection and to minimize maintenance impacts on the live system by offloading backups and reporting. In this case, the recommended approach is to use a dedicated Data Guard standby database as the source of all the downstream non-production databases that need to be cloned. This can either be synchronized directly with the primary or cascaded from a preexisting standby. The extra benefit of sourcing from a standby is that the database can be quiesced by temporarily disabling redo apply to provide a consistent point-in-time snapshot that acts as the baseline for cloned volumes. This also simplifies the steps required to rename the cloned database during database refreshes.

ZFS snapshot and clone volume management

The combination of snapshots and clone volumes delivers an optimized data cloning foundation. Creating multiple snapshots and clone volumes from a single source lets you create rapid, space-efficient copies of large datasets with minimal processing overhead and impact. The magic of copy-on-write means that the storage consumption of the snapshot is a function of the rate of change rather than simply a multiple of the source database size.

To achieve high levels of efficiency and speed, clone volumes retain a dependency on the source snapshot, meaning the source snapshot cannot be deleted while any dependent clone volumes are in use. If the non-production database has a long lifetime and undergoes significant changes, you can choose to split its clone volume from its source snapshot and remove the dependency. To do this, you must create a new full-copy volume from that clone, which is an independent writable copy. Full-copy volumes are fully independent and include all of the data from the source snapshot.

The storage capacity consumption will start to increase as changes are made to both the source volume and clone volume(s). For that reason, you should closely monitor the file system’s storage consumption. It’s also a good idea to regularly delete unused snapshots and clone volumes to free up capacity.

Oracle Database rename

The database rename at the end of the cloning process is a destructive operation because it recreates the database control file. If the non-production database is running on the same EC2 instance as the source standby database, then extra care needs to be taken to ensure you are only making changes to the file system on the clone volume and not the file system on the source volume. To introduce extra protection, you can choose to split your standby database source and downstream non-production targets onto separate EC2 instances. The choice to do this will depend on your Oracle license entitlements.

Cleaning up

Once you have finished using the AWS resources deployed to support this walkthrough, you can delete them to avoid incurring any further charges.

If you used the Oracle Database AWS QuickStart to deploy the Oracle-related infrastructure resources, you must first delete any FSx for OpenZFS-related resources that were created inside the VPC created by the QuickStart CloudFormation template. If you instead created the FSx for OpenZFS components in a completely different VPC, then you can preserve the FSx for OpenZFS resources or delete them after deleting the Oracle Database QuickStart resources.

To delete the Oracle Database QuickStart resources, delete the associated CloudFormation stack in the Region it was originally deployed to.

Conclusion

In this blog, I have walked through creating a foundation for a solution that leverages the power of Amazon FSx for OpenZFS to optimize and accelerate Oracle Database cloning procedures and ways to help you build an even more robust and extensible solution. By leveraging the power of space-efficient snapshots and clone volumes available in FSx for OpenZFS, operations teams can mitigate the burden of regularly cloning large numbers of downstream databases from production Oracle databases and satisfy the demands of multiple competing database consumers while keeping storage costs to a minimum.

FSx for OpenZFS is just one option available in the Amazon FSx family of file storage offerings. Customers running Oracle databases can also store their data files on FSx for Netapp ONTAP to gain access to thin provisioning, replication (SnapMirror), and point-in-time cloning (FlexClone). For example, you can also achieve the same space-efficient, rapid cloning with FSx for NetApp ONTAP FlexClones.

While this blog post focused on rapid Oracle database cloning, there are many alternative use cases that could benefit from tapping into the power of FSx for OpenZFS to achieve efficient storage management. Here are a few suggestions for future potential uses of this managed service offering:

Combine FSx for OpenZFS with Oracle Multitenant pluggable database cloning
Rapid SQL Server database cloning using FSx for OpenZFS
Oracle E-Business Suite application tier cloning