AWS Database Blog

Automate benchmark tests for Amazon Aurora PostgreSQL

Optimizing a database is an important activity for new and existing application workloads. You need to take cost, operations, performance, security, and reliability into consideration. Conducting benchmark tests helps with these considerations. With Amazon Aurora PostgreSQL-Compatible Edition, you can run multiple benchmark tests with different transaction characteristics that match your data access patterns. In this post, we provide a solution to automate and scale benchmark tests. The solution supports running multiple workloads across multiple client instances, which gives you the ability to create realistic benchmarks.

Overview of solution

The solution provides the ability to run a benchmark test across Amazon Elastic Compute Cloud (Amazon EC2) instances, and multiple benchmarks may run concurrently. For instance, one benchmark performs select, update, insert, and delete on an Aurora writer node. At the same time, another benchmark performs select statements on an Aurora reader node. The following diagram illustrates this architecture.

[Architecture diagram]

To create the automated solution, we use pgbench as the benchmark tool, an EC2 launch template to scale the number of instances, and AWS Systems Manager Run Command to run the benchmark test on each instance. To implement this solution, we complete the following high-level steps:

  1. Create schema objects.
  2. Configure an EC2 instance.
  3. Create an EC2 Amazon Machine Image (AMI).
  4. Create an EC2 launch template.
  5. Launch two groups of EC2 instances: one for write activity and another for read activity.
  6. Run benchmark tests using Systems Manager Run Command.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An Aurora PostgreSQL cluster with a writer node and a reader node
  • An EC2 instance running Amazon Linux with the psql client (installation is covered later in this post)
  • An IAM instance profile that allows your EC2 instances to be managed by Systems Manager
  • Optionally, an S3 bucket to distribute the benchmark files

License for code samples

// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: MIT-0

Database schema

To demonstrate how to customize the benchmark test to your schema and data access pattern, we use three tables and three sequences instead of the default pgbench TPC-B benchmark.

Because one of the prerequisites is psql, start by connecting to your Aurora cluster. To connect to your Aurora PostgreSQL database with psql in Amazon Linux, use the following format:

psql \
   --host=<DB instance endpoint> \
   --port=<port> \
   --username=<master username> \
   --password \
   --dbname=<database name>

The following code provides the sample database schema used in this post. Create these objects in the Aurora PostgreSQL database you plan to use for the benchmark test.

CREATE SEQUENCE IF NOT EXISTS order_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;

CREATE SEQUENCE IF NOT EXISTS order_item_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;

CREATE SEQUENCE IF NOT EXISTS order_payment_seq
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;

CREATE TABLE customer_order (
order_id bigint NOT NULL,
order_description varchar(256) NOT NULL,
order_date timestamp(0) without time zone NOT NULL,
CONSTRAINT order_pkey PRIMARY KEY (order_id)
);

CREATE TABLE order_item (
order_id bigint NOT NULL references customer_order (order_id),
order_item_id bigint NOT NULL DEFAULT nextval('order_item_seq'),
item_description varchar(2048) NOT NULL,
quantity integer NOT NULL,
price numeric(12,2) NOT NULL,
CONSTRAINT order_item_pkey PRIMARY KEY (order_id,order_item_id)
);

CREATE TABLE order_payment (
order_id bigint NOT NULL references customer_order (order_id),
payment_id bigint NOT NULL DEFAULT nextval('order_payment_seq'),
amount numeric(12,2) NOT NULL,
payment_date timestamp(0) without time zone,
CONSTRAINT order_payment_pkey PRIMARY KEY (order_id,payment_id)
);
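After running the DDL, it can be worth confirming that the objects exist before moving on. The following is a minimal sketch; the commented psql invocation uses the same placeholder connection values as the template above:

```shell
# Catalog query to confirm the three benchmark tables were created.
CHECK_SQL="SELECT count(*) FROM pg_tables WHERE tablename IN ('customer_order','order_item','order_payment');"

# Uncomment to run against your cluster (connection values as in the psql template above):
# psql -h <DB instance endpoint> -p <port> -U <master username> -d <database name> -tAc "$CHECK_SQL"
echo "$CHECK_SQL"
```

A result of 3 means all three tables are in place; a similar query against pg_sequences can confirm the sequences.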

Configure your EC2 instance

Log in to the EC2 instance in your account. We use PostgreSQL 12 for this example, so we need to install the PostgreSQL 12 client. This requires the amazon-linux-extras package on your instance. If it isn’t installed, use yum to install it.

  1. Enter the following code:
    $ which amazon-linux-extras
    $ sudo yum install -y amazon-linux-extras
    $ amazon-linux-extras | grep postgresql
    5 postgresql9.6 available
    6 postgresql10 available [ =10 =stable ]
    41 postgresql11 available [ =11 =stable ]
    58 postgresql12=latest enabled [ =stable ]
    59 postgresql13 available [ =stable ]
    $ sudo amazon-linux-extras enable postgresql12
    Now you can install:
    # yum clean metadata
    # yum install postgresql
    $ sudo yum clean metadata
    $ sudo yum install postgresql
  2. Create a file named benchmark.sh in /home/ec2-user/benchmark/:
    $ mkdir /home/ec2-user/benchmark
    $ cd /home/ec2-user/benchmark
    $ touch benchmark.sh
  3. Add the following code to benchmark.sh:
    #!/bin/bash
    
    # Run pgbench for a specific amount of time in seconds (-T), reading SQL from a custom script (-f).
    # -n (--no-vacuum) skips vacuuming the default pgbench tables, which don't exist in this custom schema.
    pgbench -n -c "$BENCHMARK_CONNECTIONS" -j "$BENCHMARK_THREADS" -T "$BENCHMARK_TIME" -U "$BENCHMARK_USER" -d "$BENCHMARK_DB" -h "$BENCHMARK_HOST" -p "$BENCHMARK_PORT" -f "$BENCHMARK_SQL_FILE"
    
  4. Make the file executable:
    $ chmod +x benchmark.sh
  5. Create a file named transaction_group_1.sql, which is used by pgbench to generate write traffic to the writer node:
    \set order_item_quantity random(1,100)
    \set order_item_price double(random(1,1000))
    \set order_payment double(random(1, :order_item_price))
    \set order_update_id random(1,10000000)
    \set order_delete_id random(1,10000000)
    
    BEGIN;
    
    -- get a unique number for customer order
    SELECT nextval('order_seq') \gset p_
    
    -- insert customer_order
    INSERT INTO customer_order(order_id, order_description, order_date) VALUES (:p_nextval, concat('description for order',:p_nextval), CURRENT_TIMESTAMP);
    
    -- insert order item
    INSERT INTO order_item(order_id, item_description, quantity, price) VALUES (:p_nextval, concat('item count is ',:order_item_quantity) ,:order_item_quantity, :order_item_price);
    
    -- insert order payment
    INSERT INTO order_payment(order_id, amount, payment_date) VALUES (:p_nextval, :order_payment, CURRENT_TIMESTAMP);
    
    -- update random order item
    UPDATE order_item set item_description = concat(item_description,' random update') WHERE order_id = :order_update_id;
    
    -- delete random order item
    DELETE FROM order_item WHERE order_id = :order_delete_id;
    
    END;
  6. Create a file named transaction_group_2.sql, which is used by pgbench to generate read traffic to the reader node:
    \set order_id random(1,10000000) 
    
    BEGIN; 
    -- select random orders 
    SELECT SUM(order_item.PRICE * order_item.quantity) FROM customer_order , order_item WHERE customer_order.order_id = :order_id AND customer_order.order_id = order_item.order_id; 
    
    --select random order payment 
    SELECT customer_order.order_id, order_item.item_description, order_item.quantity, order_payment.amount FROM customer_order , order_item , order_payment WHERE customer_order.order_id = :order_id AND order_item.order_id = customer_order.order_id AND order_payment.order_id = customer_order.order_id; 
    
    END;
  7. Copy the files you created to S3:
    $ aws s3 cp ./benchmark.sh s3://DOC-EXAMPLE-BUCKET/benchmark/
    $ aws s3 cp ./transaction_group_1.sql s3://DOC-EXAMPLE-BUCKET/benchmark/
    $ aws s3 cp ./transaction_group_2.sql s3://DOC-EXAMPLE-BUCKET/benchmark/

    This step is optional and not required to complete the automated benchmark test. However, having the files in Amazon S3 allows you to quickly modify benchmark tests, and distribute SQL changes to your EC2 instances with Systems Manager.
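For example, after editing a SQL file locally and copying it back to Amazon S3, a Run Command shell script along these lines could refresh the files on every tagged instance. This is a sketch; DOC-EXAMPLE-BUCKET remains a placeholder bucket name:

```shell
# Pull the latest benchmark files from Amazon S3 onto an instance.
# DOC-EXAMPLE-BUCKET is a placeholder; substitute your own bucket name.
SYNC_CMD="aws s3 cp s3://DOC-EXAMPLE-BUCKET/benchmark/ /home/ec2-user/benchmark/ --recursive"
echo "$SYNC_CMD"   # run this command (without the echo indirection) via AWS-RunShellScript
```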

Your EC2 instance configuration is finished. Next, you create an EC2 AMI and launch template.
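Before creating the AMI, you can optionally validate the benchmark script with a single client and a single transaction. The following is a hedged sketch; the connection values are placeholders, and -n skips vacuuming the default pgbench tables, which don't exist in this custom schema:

```shell
# Placeholder connection settings; replace with your Aurora writer endpoint and credentials.
BENCHMARK_HOST=${BENCHMARK_HOST:-aurora-writer.example.com}
BENCHMARK_PORT=${BENCHMARK_PORT:-5432}
BENCHMARK_USER=${BENCHMARK_USER:-postgres}
BENCHMARK_DB=${BENCHMARK_DB:-postgres}

# One client (-c 1), one transaction (-t 1) is enough to surface SQL errors in the script.
SMOKE_TEST="pgbench -n -c 1 -t 1 -f transaction_group_1.sql \
-h $BENCHMARK_HOST -p $BENCHMARK_PORT -U $BENCHMARK_USER -d $BENCHMARK_DB"
echo "$SMOKE_TEST"   # replace echo with eval to run it against your cluster
```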

Create an EC2 AMI

To create an EC2 AMI, complete the following steps:

  1. On the Amazon EC2 console, choose Instances in the navigation pane.
  2. Select your EC2 instance.
  3. On the Actions menu, choose Image and Create Image.
  4. Enter an image name and description.
  5. Choose Create image.
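The same AMI can be created from the AWS CLI. The following is a sketch; the instance ID, image name, and description are placeholders:

```shell
# i-0123456789abcdef0 and the image name/description are placeholders; substitute your own.
AMI_CMD="aws ec2 create-image --instance-id i-0123456789abcdef0 \
--name benchmark-client --description 'pgbench benchmark client'"
echo "$AMI_CMD"   # remove the echo indirection to create the image
```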

Create an EC2 launch template

To create an EC2 launch template, complete the following steps:

  1. On the Amazon EC2 console, choose Launch Templates in the navigation pane.
  2. Choose Create launch template.
  3. Enter a name and description.
  4. Choose the AMI you created.
  5. Pick an instance type.
  6. Specify a key pair.
  7. Select Virtual Private Cloud (VPC) as your network platform.
  8. Provide one or more security groups.
  9. Leave the storage and resource tags set to their default values.
  10. In Advanced details, choose the IAM instance profile you created as a prerequisite.

If you leave this field blank, it’s not included in the launch template.

Launch your EC2 instances

To launch your EC2 instances, complete the following steps:

  1. On the Amazon EC2 console, choose Launch Templates in the navigation pane.
  2. Select the launch template you created.
  3. Choose Launch instance from template.
  4. For Number of instances, enter 2 (you can change this depending on your needs).
  5. For Networking platform, select Virtual Private Cloud (VPC).
  6. Choose a subnet.
  7. Choose Add tag to add a resource tag.
  8. For Key, enter benchmark.
  9. For Value, enter transaction_group_1.
  10. Repeat these steps for another group of instances, but this time enter benchmark for Key and transaction_group_2 for Value.

You now have two groups of EC2 instances. Each group is identified by a resource tag:

  • Group 1 is tagged with benchmark:transaction_group_1
  • Group 2 is tagged with benchmark:transaction_group_2
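The launch steps above can also be expressed with the AWS CLI. The following is a sketch under the assumption that your launch template is named benchmark-template (a placeholder):

```shell
# Launch two tagged instances from the launch template; names and values mirror the console steps.
LAUNCH_CMD="aws ec2 run-instances --count 2 \
--launch-template LaunchTemplateName=benchmark-template \
--tag-specifications 'ResourceType=instance,Tags=[{Key=benchmark,Value=transaction_group_1}]'"
echo "$LAUNCH_CMD"   # repeat with Value=transaction_group_2 for the second group
```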

Run a benchmark test on each group

We’re ready to run the benchmark tests. In this step, we run pgbench concurrently on four EC2 instances. On two EC2 instances, write traffic is sent to the Aurora writer node, and from the other two EC2 instances, read traffic is sent to the Aurora reader node.

Group 1 (transaction_group_1)

To test transaction_group_1, complete the following steps:

  1. On the Systems Manager console, choose Node Management in the navigation pane.
  2. Choose Run Command.
  3. On the Run Command page, choose Run command.
  4. In the search bar, enter AWS-RunShellScript.
  5. Select AWS-RunShellScript.
  6. Under Command Parameters, enter the following code (replace the connection environment variables with your specific connection data):
    # postgreSQL connection environment variables
    export BENCHMARK_HOST=AURORA_WRITER_NODE_HOST
    export BENCHMARK_PORT=DATABASE_PORT
    export BENCHMARK_DB=DATABASE_NAME
    export PGPASSWORD=DATABASE_PASSWORD
    export BENCHMARK_USER=DATABASE_USER
    # pgbench control environment variables
    export BENCHMARK_CONNECTIONS=16
    export BENCHMARK_THREADS=8
    export BENCHMARK_SQL_FILE=transaction_group_1.sql
    # time in seconds to run test
    export BENCHMARK_TIME=300
    # run the benchmark test
    cd /home/ec2-user/benchmark/
    nohup /home/ec2-user/benchmark/benchmark.sh &

    We recommend following best practices and the principle of least privilege. See the PostgreSQL documentation for configuring client authentication.

  7. For Targets, select Specify instance tags.
  8. Enter benchmark for the tag key and transaction_group_1 for the value.
  9. Choose Add.
  10. Choose Run.

This runs pgbench on the EC2 instances tagged with benchmark:transaction_group_1.
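The console steps above can also be scripted with the AWS CLI. The following is a hedged sketch; the commands payload is abbreviated to the last two lines of step 6, and in practice you would prepend the export statements:

```shell
# Target instances by the benchmark:transaction_group_1 resource tag.
SSM_CMD='aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:benchmark,Values=transaction_group_1" \
  --parameters commands="cd /home/ec2-user/benchmark/ && nohup ./benchmark.sh &"'
echo "$SSM_CMD"   # prepend the exports from step 6 to the commands payload before running
```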

Group 2 (transaction_group_2)

The group transaction_group_2 generates read activity on the reader node. Repeat the steps followed for transaction_group_1, with the following changes:

  • Change the connection environment variables to point to the reader node
  • Change BENCHMARK_SQL_FILE=transaction_group_1.sql to BENCHMARK_SQL_FILE=transaction_group_2.sql
  • For the resource tag, enter benchmark for the key and transaction_group_2 for the value

The walkthrough is complete; you have one group of EC2 instances sending write traffic to the writer node, and another group of instances sending read traffic to the reader node.

Monitoring

A variety of tools are available to monitor your database while conducting benchmark tests, including Amazon RDS Performance Insights, Amazon RDS Enhanced Monitoring, Amazon RDS database logs, and Amazon CloudWatch.
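For example, you could pull average database connections for the test window from CloudWatch with the AWS CLI. The following is a sketch; the cluster identifier and time range are placeholders:

```shell
# Cluster identifier and time range are placeholders; substitute your own.
CW_CMD="aws cloudwatch get-metric-statistics --namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBClusterIdentifier,Value=my-aurora-cluster \
--start-time 2021-08-01T00:00:00Z --end-time 2021-08-01T01:00:00Z \
--period 60 --statistics Average"
echo "$CW_CMD"   # remove the echo indirection to query CloudWatch
```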

Clean up

To avoid incurring future charges, delete the resources you created while following along with this post.

Conclusion

In this post, we provided a solution to automate and scale database benchmark tests on Aurora PostgreSQL using an EC2 launch template and Systems Manager Run Command. The solution provides the ability to scale your benchmark test across multiple EC2 instances, and have groups of different SQL statements running concurrently.

As you modify your benchmark tests, we encourage you to experiment with Systems Manager Run Command to modify the benchmark files in Amazon S3 and copy them to the EC2 instances. While running benchmark tests against your database, it’s important to optimize cost, follow best practices, and create relevant benchmark tests matching your workload data access patterns.


About the Author


Andrew Love is a Sr. Solutions Architect in Federal Financial Worldwide Public Sector at Amazon Web Services. He is passionate about helping customers build well-architected solutions to achieve their business needs. He enjoys spending time with his family, a good game of chess, home improvement projects, and writing code.