AWS Database Blog

Automate Amazon RDS for PostgreSQL horizontal scaling and system integration with Amazon EventBridge and AWS Lambda

You may have a workload where you want to automate scaling, such as a reporting application with unpredictable increases in queries, or an application with database utilization increasing at predictable times like end-of-month reporting. Scaling a database to appropriately handle workload demand is important to help manage cost, operations, performance, security, and reliability. With Amazon Relational Database Service (Amazon RDS) for PostgreSQL, you can scale a database instance vertically or horizontally. You implement vertical scaling by changing the DB instance type or size (for example, from M to R or from xlarge to 2xlarge), and scale horizontally by creating read replicas.

Notification of the scaling event gives you the opportunity to automate system integration. One example is notifying your cost management system about the scaling event in order to provide near real-time cost metrics. Another example is notifying your application so that it can dynamically adjust to use the read replica for queries. A third example is providing metrics to your data warehouse when the scaling event occurs.

In this post, we provide a solution to automate horizontal scaling, and create a mechanism to automate system integration.

Overview of solution

The solution provides horizontal scaling through an event-driven architecture by monitoring your RDS for PostgreSQL database instance, and automating read replica creation based on database workload metrics. The solution creates one read replica when triggered, and the maximum number of read replicas is configurable. You can create up to five read replicas from one source DB instance. As of RDS for PostgreSQL 14.1, you can also create up to three levels of read replica in a chain (cascade) from a source DB instance. For simplicity, this solution limits the number of read replicas to five.

Your application is notified when a read replica is created. Logic built into your application offloads queries to the read replicas. In this solution, Amazon CloudWatch provides monitoring based on database CPU, and Amazon EventBridge watches for a CloudWatch alarm and routes the event to an AWS Lambda function. The Lambda function creates a read replica and notifies your application with Amazon Simple Notification Service (Amazon SNS). You can add code in the Lambda function to automate system integration. When the application is made aware of the read replica, you may offload read queries to the new instance.

The following diagram illustrates the solution architecture.

Architecture

This solution provides the following benefits:

  • You can create RDS for PostgreSQL read replicas based on a configurable CPU threshold.
  • You can add code to automate system integration in the AWS Lambda function.
  • You can notify the application code with Amazon SNS when a read replica is created.
  • You have a configurable maximum number of read replicas.
  • You have a configurable debug level.

Prerequisites

For this solution, the following prerequisites are required:

Deploy the solution

You can deploy the solution by using the CloudFormation template provided as part of this blog post. In this solution, the resources we create in your account are:

  • AWS Lambda function
  • Amazon CloudWatch Logs log group
  • Amazon CloudWatch alarm
  • Amazon EventBridge rule
  • AWS Key Management Service (AWS KMS) customer managed key for SNS
  • AWS Key Management Service (AWS KMS) customer managed key for CloudWatch
  • Amazon SNS topic

Click the Launch Stack button to deploy the CloudFormation template in the us-east-1 region.

Launch Stack

Alternatively, you can manually create the stack:

  1. On the AWS CloudFormation console, choose Create Stack.
  2. On the Create stack screen, for Amazon S3 URL, enter:
    https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/DBBLOG-1871/database_DBBLOG-1871_rds-scale-horizontal.yaml

The template requires application control, threshold, and Lambda configuration input parameters.

The application control configuration is as follows:

  • Deployment ID – The deployment ID provides unique resource names.
  • RDS identifier – The RDS for PostgreSQL database identifier for the database you want to monitor.
  • Maximum number of read replicas – The maximum number of read replicas allowed. An exception is thrown when the limit is exceeded.

The threshold configuration is as follows:

  • CPU threshold – The solution creates a read replica when your database CPU utilization is greater than or equal to this value for the required number of data points within the evaluation periods. A read replica is not created when the maximum number of read replicas defined by MaxNumReadRelicaParameter is reached.
  • CloudWatch period – The period, in seconds, over which the CPU threshold is applied. Valid values are 10, 30, 60, and any multiple of 60.
  • CloudWatch evaluation periods – The number of periods over which data is compared to the CPU threshold.
  • CloudWatch datapoints to alarm – The number of data points that must be breaching to trigger the alarm.
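To make the relationship between these four parameters concrete, the following is a minimal sketch of how they could map onto the arguments of the CloudWatch put_metric_alarm API. In the solution, the alarm is created by the CloudFormation template, not by this code. The function name, the alarm name, and the choice of the Average statistic are assumptions for illustration and may differ from what the template uses.

```python
VALID_SHORT_PERIODS = {10, 30}


def build_cpu_alarm_params(db_instance_id, cpu_threshold, period_seconds,
                           evaluation_periods, datapoints_to_alarm):
    """Return keyword arguments suitable for CloudWatch put_metric_alarm."""
    # Valid periods are 10, 30, 60, and any multiple of 60 seconds.
    if period_seconds not in VALID_SHORT_PERIODS and (
            period_seconds <= 0 or period_seconds % 60 != 0):
        raise ValueError("period must be 10, 30, or a multiple of 60 seconds")
    # The number of breaching data points cannot exceed the evaluation periods.
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError(
            "datapoints to alarm must be between 1 and the evaluation periods")
    return {
        "AlarmName": f"{db_instance_id}-cpu-scale-out",  # hypothetical name
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id},
        ],
        "Statistic": "Average",  # assumption: the template may use another
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
        "Threshold": float(cpu_threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


# Example: 80% CPU, 300-second periods, 2 breaching data points out of 3.
example_params = build_cpu_alarm_params("my-postgres-db", 80, 300, 3, 2)

# To create the alarm yourself (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**example_params)
```

With a 300-second period and three evaluation periods, the alarm looks at a 15-minute window, and two of the three data points must be at or above the threshold before the state changes to ALARM.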

The Lambda configuration is as follows:

  • Lambda execution role – The Lambda execution role for the Lambda function we create. Use the Amazon Resource Name (ARN) of the role you created as a prerequisite.
  • Memory size – The amount of memory available to the function at runtime. Increasing the function memory also increases its CPU allocation. The default value is 128 MB. The value can be between 128 MB and 10,240 MB, in 1 MB increments.
  • Lambda timeout – The amount of time (in seconds) Lambda allows the function to run before stopping it. The maximum allowed value is 900 seconds.
  • Security groups – The security groups for the Lambda function.
  • Subnets – The subnets to deploy the Lambda function in.
  • Python debug level – The debug level for Python logger.

In the following section, we walk through the high-level components of the solution and how to monitor their progress.

Monitor Amazon RDS database CPU utilization via CloudWatch

To view the CloudWatch alarm, complete the following steps:

  1. On the CloudWatch console, choose All Alarms in the navigation pane.
  2. Choose the CloudWatch alarm.
  3. Review the values for Threshold and DBInstanceIdentifier in the details pane.

The alarm state changes from OK to ALARM when the threshold condition is met. For demonstration purposes, the alarm uses an evaluation period of 15 minutes. Adjust the values to meet your needs.

Monitor the CloudWatch alarm and invoke a Lambda function via EventBridge

To view the EventBridge rule, complete the following steps:

  1. On the EventBridge console, choose Rules in the navigation pane.
  2. Choose the rule.
  3. Review the information under Event pattern and Target.

Your EventBridge rule is configured to run the Lambda function when the CloudWatch alarm state changes.
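As a rough illustration of what such a rule matches, the following sketch builds an event pattern for CloudWatch alarm state changes and checks a sample event against it locally. The source and detail-type values are the ones CloudWatch uses for alarm state change events. The helper names, the alarm ARN, and the simplified matcher are assumptions for illustration, and the pattern in the deployed template may differ.

```python
import json


def build_alarm_event_pattern(alarm_arn):
    """Build an EventBridge event pattern for one alarm's state changes."""
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "resources": [alarm_arn],
    }


def matches(pattern, event):
    """Simplified local matcher covering only the three fields used above.

    EventBridge's real matching supports many more operators; this exists
    only so you can reason about the pattern without deploying it.
    """
    return (
        event.get("source") in pattern["source"]
        and event.get("detail-type") in pattern["detail-type"]
        and any(arn in pattern["resources"]
                for arn in event.get("resources", []))
    )


# Hypothetical alarm ARN and a trimmed-down alarm state change event.
ALARM_ARN = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:my-cpu-alarm"
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "resources": [ALARM_ARN],
    "detail": {"alarmName": "my-cpu-alarm", "state": {"value": "ALARM"}},
}

pattern = build_alarm_event_pattern(ALARM_ARN)
print(json.dumps(pattern, indent=2))
print(matches(pattern, sample_event))
```

Because this pattern does not filter on the new state, the target is invoked for every state change of the alarm, and it is up to the Lambda function to act only on ALARM.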

Create an Amazon RDS read replica via Lambda

To view how the Lambda function creates a read replica, complete the following steps:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose the function.
  3. In the Code source editor, review the lambda_handler Python function.
    The lambda_handler Python function checks the alarm state and invokes create_rds_read_replica if the state is ALARM.
  4. In the Code source editor, review the create_rds_read_replica Python function.

The create_rds_read_replica Python function creates a read replica if the number of read replicas is less than your defined maximum number of replicas. The describe_db_instances Amazon RDS API function is used to determine the current number of read replicas for your database instance.
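The logic described above can be sketched as follows. This is not the function deployed by the template, only a minimal approximation of the flow it describes. The describe_db_instances and create_db_instance_read_replica calls are standard boto3 RDS client methods; the environment variable names, the exception class, the handle_alarm_event helper, and the replica naming scheme are assumptions made for this example.

```python
import logging
import os

logger = logging.getLogger(__name__)


class MaxReadReplicasReached(Exception):
    """Raised when the source instance already has the maximum replicas."""


def create_rds_read_replica(rds_client, source_db_id, max_replicas):
    """Create one read replica unless the configured maximum is reached."""
    response = rds_client.describe_db_instances(
        DBInstanceIdentifier=source_db_id)
    existing = response["DBInstances"][0].get(
        "ReadReplicaDBInstanceIdentifiers", [])
    if len(existing) >= max_replicas:
        raise MaxReadReplicasReached(
            f"{source_db_id} already has {len(existing)} read replicas")
    # Hypothetical naming scheme; it can collide if replicas were deleted,
    # so a production version would need a more robust identifier.
    replica_id = f"{source_db_id}-replica-{len(existing) + 1}"
    rds_client.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_db_id,
    )
    return replica_id


def handle_alarm_event(event, rds_client, source_db_id, max_replicas):
    """Act only when the CloudWatch alarm has moved into the ALARM state."""
    state = event.get("detail", {}).get("state", {}).get("value")
    if state != "ALARM":
        logger.info("Alarm state is %s; nothing to do", state)
        return None
    return create_rds_read_replica(rds_client, source_db_id, max_replicas)


def lambda_handler(event, context):
    """Lambda entry point; environment variable names are hypothetical."""
    import boto3  # available in the Lambda Python runtime

    return handle_alarm_event(
        event,
        boto3.client("rds"),
        os.environ["RDS_IDENTIFIER"],
        int(os.environ["MAX_READ_REPLICAS"]),
    )
```

Passing the RDS client in as an argument keeps the scaling logic testable with a stub client, without AWS credentials.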

Send events to an SNS topic via Lambda

To review how the function sends events to Amazon SNS, complete the following steps:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose the function.
  3. In the Code source editor, scroll to the notify_application Python function.

The function publishes an event to the SNS topic created by the CloudFormation template, once when the read replica is created and again when the read replica becomes available. Your application receives these events through the topic.
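A publishing step of this kind might look like the following sketch. It uses the standard boto3 SNS publish call; the function signature, the subject line, and the idea of passing the describe_db_instances response as the message body are assumptions for illustration and may not match the deployed notify_application function.

```python
import json


def notify_application(sns_client, topic_arn, replica_description):
    """Publish the read replica details to the SNS topic as JSON.

    replica_description is assumed to be the dictionary returned by the
    RDS describe_db_instances call for the replica.
    """
    # default=str converts datetime values in the RDS response to strings.
    message = json.dumps(replica_description, default=str)
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="RDS read replica status",  # hypothetical subject line
        Message=message,
    )
    return response.get("MessageId")
```

In the Lambda function you would pass boto3.client("sns") as sns_client. Subscribers then receive the same JSON structure that the RDS API returns, which keeps the application-side parsing straightforward.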

Subscribe to the SNS topic

You must write application code to subscribe to the SNS topic and offload read-only queries to the read replicas. The event includes the read replica instance status and connection endpoint. Your application should check that DBInstanceStatus is available, then use the connection endpoint to connect and run read-only queries. The following is a sample event in which DBInstanceStatus is available:

{
    "DBInstances": [{"DBInstanceIdentifier": "poc-read-replica-3",
    "DBInstanceClass": "db.t3.small",
    "Engine": "postgres",
    "DBInstanceStatus": "available",
    "MasterUsername": "admindba",
    "DBName": "rdspoc",
    "Endpoint": {"Address": "frbpoc-read-replica-3.XXXXXXXXX.us-east-1.rds.amazonaws.com", "Port": 5432, …
}
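On the subscriber side, handling an event of this shape could look like the following minimal sketch. It assumes the SNS message body is the JSON document shown above; the function name and the keys of the returned dictionary are hypothetical, and the host name in the example is a placeholder.

```python
import json


def read_replica_connection_info(message_body):
    """Return host, port, and dbname for the first available replica.

    Returns None if no instance in the event has reached the
    'available' status or if the endpoint is not yet populated.
    """
    payload = json.loads(message_body)
    for instance in payload.get("DBInstances", []):
        if instance.get("DBInstanceStatus") != "available":
            continue
        endpoint = instance.get("Endpoint") or {}
        if "Address" not in endpoint:
            continue
        return {
            "host": endpoint["Address"],
            "port": endpoint.get("Port", 5432),
            "dbname": instance.get("DBName"),
        }
    return None


# Example message with a placeholder endpoint address.
example_message = json.dumps({
    "DBInstances": [{
        "DBInstanceIdentifier": "poc-read-replica-3",
        "DBInstanceStatus": "available",
        "DBName": "rdspoc",
        "Endpoint": {"Address": "replica.example.internal", "Port": 5432},
    }]
})
print(read_replica_connection_info(example_message))
```

The returned dictionary holds the host, port, and database name that your application can hand to its PostgreSQL driver, together with credentials it already manages, to open a read-only connection.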

You can use this code sample for connecting to and querying a PostgreSQL database.

The walk-through is complete; you achieved horizontal scaling by adding a read replica when CPU utilization meets or exceeds the threshold for the configured number of data points within the CloudWatch alarm evaluation period.

System Integration

Modifying the solution to automate system integration requires you to add code in the Lambda function. Within the Lambda source code, the notify_application Python function is a hook where you can automate system integration. The code provided in this blog publishes an Amazon SNS event. You can add code to communicate with systems such as calling an API to manage workflows, or communicate with a software as a service (SaaS) solution.
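As one possible shape for such an integration, the following sketch prepares and sends an HTTP POST to an external API using only the Python standard library. The function names, the URL, and the payload fields are hypothetical; substitute the contract of the system you are integrating with. Because the Lambda function runs in your subnets, it also needs a network route to whatever endpoint you call.

```python
import json
import urllib.request


def build_integration_request(api_url, replica_event):
    """Build a JSON POST request describing the scaling event."""
    body = json.dumps(replica_event, default=str).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_integration_request(request, timeout_seconds=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status


# Hypothetical endpoint and payload, for illustration only.
example_request = build_integration_request(
    "https://workflow.example.com/scaling-events",
    {"source_db": "my-postgres-db", "replica": "my-postgres-db-replica-1"},
)
```

You could call send_integration_request from within notify_application, alongside the SNS publish, and decide whether a failed integration call should fail the whole invocation or only be logged.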

Test scalability

You can use a tool such as pgbench to generate load on your database to test scalability. For an example of generating load, refer to Automate benchmark tests for Amazon Aurora PostgreSQL.
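If you want to script a load test, the following sketch assembles pgbench command lines from Python. The flags used (-i, -s, -h, -p, -U, -c, -j, -T, -S) are standard pgbench options; the helper names, default values, and the host, user, and database names are assumptions for illustration. Supply the password through the PGPASSWORD environment variable or a .pgpass file.

```python
import subprocess


def build_pgbench_init_command(host, dbname, user, scale=50, port=5432):
    """Command that creates and populates the pgbench tables."""
    return ["pgbench", "-i", "-s", str(scale),
            "-h", host, "-p", str(port), "-U", user, dbname]


def build_pgbench_run_command(host, dbname, user, clients=10, threads=2,
                              duration_seconds=300, port=5432,
                              select_only=True):
    """Command that runs the benchmark for a fixed duration."""
    command = ["pgbench", "-h", host, "-p", str(port), "-U", user,
               "-c", str(clients), "-j", str(threads),
               "-T", str(duration_seconds)]
    if select_only:
        command.append("-S")  # built-in select-only (read) workload
    command.append(dbname)
    return command


def run(command):
    """Run a pgbench command; requires pgbench on the PATH."""
    return subprocess.run(command, check=True)


# Placeholder connection details, for illustration only.
init_cmd = build_pgbench_init_command("db.example.internal", "rdspoc", "admindba")
run_cmd = build_pgbench_run_command("db.example.internal", "rdspoc", "admindba")
print(" ".join(init_cmd))
print(" ".join(run_cmd))
```

Running the select-only workload long enough to span the alarm's evaluation window gives the CloudWatch alarm time to reach the ALARM state.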

Monitor your database

A variety of tools are available to monitor your database, including Amazon RDS Performance Insights, Amazon RDS Enhanced Monitoring, Amazon RDS database logs, and Amazon CloudWatch Logs. You can use these tools to help determine the threshold when you want to trigger a scaling event.

Clean up

To avoid incurring future charges, delete the resources you created as part of this post. You can clean up the AWS resources (Lambda function, CloudWatch alarm, EventBridge rule, KMS customer managed keys, and SNS topic) by deleting the CloudFormation stack.

Conclusion

In this post, we provided an automated solution to horizontally scale Amazon RDS for PostgreSQL using an event-driven architecture. CloudWatch monitors your RDS database instance CPU. EventBridge watches for a CloudWatch alarm, and routes the event to a Lambda function. The function creates a read replica and notifies your application via Amazon SNS. You can add code in the Lambda function to automate system integration.

When scaling your database, it’s important to optimize cost and follow best practices. As you look at options to scale your RDS database instance, we encourage you to experiment with scaling vertically and horizontally. The example in this post uses CPU to determine when a read replica is created. You can create a composite alarm using multiple metrics such as CPU and memory to refine the read replica creation criteria. You can also set alarms from Performance Insights metrics. Another consideration is to scale down by deleting the read replicas when they’re no longer needed. Close open connections to the read replica before deleting it. To scale down, you can create a solution similar to the one described in this post, with a CloudWatch alarm based on a CPU threshold.


About the Author

Andrew Love is a Sr. Solutions Architect in the Worldwide Public Sector at AWS. He is passionate about helping customers build well-architected solutions to achieve their business needs. He enjoys spending time with his family, a good game of chess, home improvement projects, and writing code.