Comparing Graviton (ARM) Performance to Intel and AMD for MySQL

Recently, AWS introduced Graviton, its own ARM-based CPU for servers.

As a result, AWS added instance families with a “g” suffix to its EC2 lineup (e.g., m6g.large, r6g.large). In its reviews and presentations, AWS showed impressive results, with Graviton up to 20 percent faster in some benchmarks. On the other hand, some reviewers said that Graviton does not show any significant gains and, in some cases, performs worse than Intel.

We decided to investigate this ourselves and measure Graviton performance for MySQL specifically, comparing it directly with other CPUs (Intel and AMD).

Disclaimer

The test is designed to be CPU bound only, so we will use a read-only test and make sure there is no I/O activity during the test.
Tests were run on m5.* (Intel), m5a.* (AMD), and m6g.* (Graviton) EC2 instances in the US-EAST-1 region (see the appendix for the full list of EC2 instances).
Monitoring was done with Percona Monitoring and Management (PMM).
OS: Ubuntu 20.04 LTS.
Load tool (sysbench) and target DB (MySQL) installed on the same EC2 instance.
MySQL: 8.0.26-0, installed from official packages.
Load tool: sysbench 1.0.18.
innodb_buffer_pool_size=80% of available RAM.
Each thread count was tested for five minutes, followed by a 90-second cool-down before the next iteration.
Tests were run three times (to smooth out outliers and make results more reproducible), and the results were then averaged for the graphs.
We call a scenario high-concurrency when the number of threads is greater than the number of vCPU on the EC2 instance, and low-concurrency when the number of threads is less than or equal to the number of vCPU.
Scripts to reproduce the results are on our GitHub.
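
The high/low concurrency rule above can be written down as a small helper. This is only an illustrative sketch: the function name classify_concurrency is hypothetical and is not part of sysbench or of the published scripts.

```shell
#!/usr/bin/env bash
# Sketch: label a run as low- or high-concurrency using the rule from the
# list above. classify_concurrency is a hypothetical helper name.

classify_concurrency() {
  local threads=$1   # value passed to sysbench --threads
  local vcpus=$2     # vCPU count of the EC2 instance (e.g. from `nproc`)
  if [ "$threads" -gt "$vcpus" ]; then
    echo "high"      # more threads than vCPU
  else
    echo "low"       # threads less than or equal to vCPU
  fi
}

# Example: label every thread count used in the tests for an 8-vCPU instance.
for t in 1 2 4 8 16 32 64 128; do
  echo "threads=$t vcpus=8 -> $(classify_concurrency "$t" 8)"
done
```

On an 8-vCPU instance, for example, 1 to 8 threads count as low concurrency and 16 to 128 threads as high concurrency.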

Test Case

Prerequisite:

1. Create a database with 10 tables of 10 000 000 rows each

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Load all data into the InnoDB buffer pool (warm-up)

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Test:

On each EC2 instance, run the same scenario in a loop with a different number of threads (THREAD = 1, 2, 4, 8, 16, 32, 64, 128).

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run
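
As a rough sketch, the loop could look like the script below. It reuses the sysbench command above unchanged; the three repetitions and the 90-second pause come from the Disclaimer section. The variable names (RUNS, COOL_DOWN, DRY_RUN), the log file names, and the order of the loops are assumptions for illustration, not the published scripts. DRY_RUN defaults to 1 so the script only prints the commands unless you switch it off.

```shell
#!/usr/bin/env bash
# Sketch of the test loop. RUNS, COOL_DOWN, DRY_RUN and the log file names
# are hypothetical; only the sysbench command itself comes from the article.

THREADS_LIST="1 2 4 8 16 32 64 128"
RUNS=${RUNS:-3}             # tests were repeated three times
COOL_DOWN=${COOL_DOWN:-90}  # seconds to wait between iterations
DRY_RUN=${DRY_RUN:-1}       # 1 = print commands only, 0 = really run sysbench

# Print the sysbench command line for a given thread count.
build_run_cmd() {
  local threads=$1
  echo "sysbench oltp_read_only --time=300 --threads=${threads} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run"
}

# Run (or print) every thread count, RUNS times over.
run_all() {
  local run threads cmd
  for run in $(seq 1 "$RUNS"); do
    for threads in $THREADS_LIST; do
      cmd=$(build_run_cmd "$threads")
      if [ "$DRY_RUN" = "1" ]; then
        echo "[run ${run}] ${cmd}"
      else
        # Word splitting of $cmd is intentional: no argument contains spaces.
        $cmd > "result_run${run}_threads${threads}.log"
        sleep "$COOL_DOWN"
      fi
    done
  done
}

run_all
```

Running it with DRY_RUN=0 on an instance with sysbench and MySQL installed would execute the 24 runs (8 thread counts, 3 repetitions) and write one log file per run.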

Results:

The review of the results is split into three parts:

for “small” EC2 with 2, 4, and 8 vCPU
for “medium” EC2 with 16 and 32 vCPU
for “large” EC2 with 48 and 64 vCPU

The “small”, “medium”, and “large” names are only synthetic labels, based on the number of vCPU per EC2 instance, to make the review easier.
There are four graphs for each test:

Throughput (queries per second) that the EC2 instance delivered in each scenario (number of threads)
95th percentile latency of the EC2 instance in each scenario (number of threads)
Relative (percentage) comparison of Graviton and Intel
Absolute (numeric) comparison of Graviton and Intel
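
To make the last two graph types concrete, here is a minimal sketch of how averaged throughput values could be turned into a relative and an absolute difference. The formulas are an assumption (the article does not spell them out): relative = (Graviton - Intel) / Intel * 100 and absolute = Graviton - Intel. The helper names and the QPS values in the example are made up for illustration and are not measurements from these tests.

```shell
#!/usr/bin/env bash
# Sketch: average repeated runs, then compare Graviton with Intel.
# mean, relative_diff and absolute_diff are hypothetical helper names.

# Average any number of QPS values (one per repeated run).
mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# Relative difference of Graviton over Intel, in percent.
# Positive means Graviton is faster, negative means Intel is faster.
relative_diff() {
  awk -v g="$1" -v i="$2" 'BEGIN { printf "%.2f\n", (g - i) / i * 100 }'
}

# Absolute difference of Graviton over Intel, in queries per second.
absolute_diff() {
  awk -v g="$1" -v i="$2" 'BEGIN { printf "%.2f\n", g - i }'
}

# Example with made-up QPS values from three runs each.
graviton_qps=$(mean 1090 1100 1110)
intel_qps=$(mean 990 1000 1010)
echo "relative: $(relative_diff "$graviton_qps" "$intel_qps") %"
echo "absolute: $(absolute_diff "$graviton_qps" "$intel_qps") QPS"
```

With this sign convention, a positive value on the comparison graphs would mean Graviton is ahead and a negative value would mean Intel is ahead.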

We also used PMM (Percona Monitoring and Management) to validate that all load went to the CPU and not to disk I/O.

pic 0.1 – OS monitoring during all test stages

From pic 0.1, we can see that there was no disk I/O activity during the tests, only CPU activity. The main disk activity occurred during the DB creation stage.

Result for EC2 with 2, 4, and 8 vCPU

plot 1.1. Throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 1.2. Latencies (95 percentile) during the test for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 1.3. Percentage comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 1.4. Absolute comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

AMD has the highest latencies in all scenarios and on all EC2 instances. We won’t repeat this in later overviews, and it is the reason we exclude AMD from the percentage and absolute comparisons with other CPUs (plots 1.3, 1.4, and so on).
On instances with two and four vCPU, Intel shows an advantage of less than 10 percent in all scenarios.
However, on the instance with eight vCPU, Intel shows an advantage only in scenarios where the number of threads is less than or equal to the number of vCPU.
On the EC2 instance with eight vCPU, Graviton starts to show an advantage in scenarios where the number of threads exceeds the number of vCPU. The advantage grows up to 15 percent in high-concurrency scenarios with 64 and 128 threads, which is 8 and 16 times the number of vCPU available.
This pattern appears in all later scenarios: the more the load exceeds the available vCPU, the better Graviton’s result.

Result for EC2 with 16 and 32 vCPU

plot 2.1. Throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 2.2. Latencies (95 percentile) during the test for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 2.3. Percentage comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 2.4. Absolute comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

In scenarios with the same load on EC2 with 16 and 32 vCPU, Graviton continues to have an advantage when the number of threads is greater than the number of vCPU available on the instance.
Graviton shows an advantage of up to 10 percent in high-concurrency scenarios. However, Intel has an advantage of up to 20 percent in low-concurrency scenarios.
In high-concurrency scenarios, Graviton’s lead in (read) transactions per second reaches up to 30 000 TPS.

Result for EC2 with 48 and 64 vCPU

plot 3.1. Throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 3.2. Latencies (95 percentile) during the test for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 3.3. Percentage comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 3.4. Absolute comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

Intel shows a significant advantage in most scenarios where the number of threads is less than or equal to the number of vCPU. Intel appears to handle this kind of task well: when it has spare vCPU, its advantage can reach 35 percent.
However, Graviton shows outstanding results when the number of threads is larger than the number of vCPU, with an advantage of 5 to 14 percent over Intel.
In absolute numbers, Graviton’s advantage over Intel can reach 70 000 transactions per second in high-concurrency scenarios.

Total Result Overview

plot 4.2. Latencies (95 percentile) during the test for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 4.3. Percentage comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

plot 4.4. Absolute comparison of Graviton and Intel CPU throughput (queries per second) for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

Conclusions

ARM CPUs show better results on EC2 with more vCPU and under higher load, especially in high-concurrency scenarios.
On small EC2 instances and under light load, ARM CPUs show less impressive performance, so we can’t see their benefits compared with Intel EC2 instances.
Intel is still the leader in low-concurrency scenarios, and it definitely wins on EC2 with a small number of vCPU.
AMD does not show competitive results in any case.

Final Thoughts

AMD: we have a lot of questions about the AMD-based EC2 instances. It would be a good idea to check what was happening on those instances during the test and to check the general CPU performance on them.
We found that under some specific conditions, Intel and Graviton can compete with each other. The other side of the coin is economics: which is cheaper to use in each situation? The next article will cover that.
It would be a good idea to try EC2 with Graviton for a real high-concurrency database.
Additional scenarios with 256 and 512 threads are needed to check the hypothesis that Graviton works better when there are more threads than vCPU.

Check out part two of our tests!

APPENDIX:

List of EC2 used in research:

CPU type	EC2	EC2 price per hour (USD)	vCPU	RAM
Graviton	m6g.large	0.077	2	8 GB
Graviton	m6g.xlarge	0.154	4	16 GB
Graviton	m6g.2xlarge	0.308	8	32 GB
Graviton	m6g.4xlarge	0.616	16	64 GB
Graviton	m6g.8xlarge	1.232	32	128 GB
Graviton	m6g.12xlarge	1.848	48	192 GB
Graviton	m6g.16xlarge	2.464	64	256 GB
Intel	m5.large	0.096	2	8 GB
Intel	m5.xlarge	0.192	4	16 GB
Intel	m5.2xlarge	0.384	8	32 GB
Intel	m5.4xlarge	0.768	16	64 GB
Intel	m5.8xlarge	1.536	32	128 GB
Intel	m5.12xlarge	2.304	48	192 GB
Intel	m5.16xlarge	3.072	64	256 GB
AMD	m5a.large	0.086	2	8 GB
AMD	m5a.xlarge	0.172	4	16 GB
AMD	m5a.2xlarge	0.344	8	32 GB
AMD	m5a.4xlarge	0.688	16	64 GB
AMD	m5a.8xlarge	1.376	32	128 GB
AMD	m5a.12xlarge	2.064	48	192 GB
AMD	m5a.16xlarge	2.752	64	256 GB

my.cnf

[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id=7

# general
table_open_cache=200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
join_buffer_size=256K
sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
# ${80%_OF_RAM} is a placeholder: substitute 80% of the instance's RAM
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M

default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000
bind_address=0.0.0.0
[client]
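
The innodb_buffer_pool_size value in the config is a placeholder for 80% of the instance's RAM. One possible way to derive a concrete value on Linux is sketched below. How the value was actually computed for these tests is not stated, so treat this as an assumption; buffer_pool_mb is a hypothetical helper name. Note that MemTotal in /proc/meminfo is slightly lower than the nominal RAM of the instance.

```shell
#!/usr/bin/env bash
# Sketch: compute 80% of RAM in megabytes for innodb_buffer_pool_size.
# buffer_pool_mb is a hypothetical helper, not part of MySQL.

buffer_pool_mb() {
  local total_kb=$1                        # total RAM in kB
  echo $(( total_kb * 80 / 100 / 1024 ))   # 80% of RAM, rounded down to MB
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  echo "innodb_buffer_pool_size=$(buffer_pool_mb "$total_kb")M"
fi
```

The printed line can be pasted into the [mysqld] section in place of the placeholder.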


5 Comments

Stefania Liza

2 years ago

Why compare 3 generation old amd system versus newer intel/amazon system?
What’s the point of this?

George

2 years ago

m5a seem to be 1st gen AMD EPYC 7000 so IPC performance probably behind compared to 2nd and 3rd gen AMD EPYC 7002/7003 which is where AMD starting improving IPC https://aws.amazon.com/ec2/amd/#Instances

Might need to compare with EC C5a AMD EPYC 7002 gen2 cpus too?

anaconda

2 years ago

Why you Nik did this kind of comparison? Is this a joke?
Do you know that you are comparing a car from year 2000 (AMD) to 2021 year car (ARM). You could have compared newer AMD EPYCs but you decided not to?

Comparing a zen1 from 2017 to a ARM CPU from 2020 is not good comparison, while there are zen2 2020 and zen3 2021 available.

We understand you are probably paid by AWS, but this just makes your reputation pretty bad.
I have to consider using Perconas products in the future, if Percona is really a company like this.

Nik Krichko

Author

2 years ago

The primary goal of the research was to look into Graviton performance by comparing identical instance classes, only for MySQL usage. In this case we chose the m5/m6* instance classes because they offer comparable vCPU/memory configurations and are general-purpose instances.

Gary

Reply to Nik Krichko

2 years ago

We use the Amazon ARMs, and love them. What is disappointing is that Percona Server does not support them. Why not? When will you have an ARM distro?
