aws-samples/ecs-gpu-scaling

Amazon ECS Auto Scaling for GPU-based Machine Learning Workloads

This repository is intended for engineers looking to horizontally scale GPU-based Machine Learning (ML) workloads on Amazon ECS. This example is for demonstration purposes only and is not intended for production use.
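Horizontal scaling on GPU load boils down to deciding how many tasks to run. The following is a rough illustration only. It assumes the service scales on average GPU utilization and uses the proportional rule popularized by the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × observed / target). ECS service auto scaling target tracking policies behave in a similar spirit, but this is not AWS's published algorithm. The helper desired_task_count and its min/max bounds are hypothetical.

```python
import math


def desired_task_count(
    current_tasks: int,
    current_gpu_util: float,
    target_gpu_util: float,
    min_tasks: int = 1,
    max_tasks: int = 10,
) -> int:
    """Hypothetical proportional scaling rule (not this repository's code).

    Scales the task count by the ratio of observed to target GPU utilization,
    rounding up so the fleet never ends up under-provisioned, then clamps the
    result to [min_tasks, max_tasks]. The default bounds are illustrative.
    """
    if target_gpu_util <= 0:
        raise ValueError("target_gpu_util must be positive")
    raw = math.ceil(current_tasks * current_gpu_util / target_gpu_util)
    # Clamp so a scale-in never drops below min_tasks or a scale-out exceeds max_tasks.
    return max(min_tasks, min(max_tasks, raw))
```

For example, two tasks averaging 90% GPU utilization against a 60% target give desired_task_count(2, 90.0, 60.0) == 3. Rounding up trades some idle capacity for headroom, which is usually the safer side for latency-sensitive inference.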

How it works

Setup

  • Fill in the required values in the .env file.

  • Install the AWS CDK.

  • Use the AWS CDK to deploy the AWS infrastructure.

cdk deploy --require-approval never

  • Build the container image and push it to Amazon ECR.
./build_image.sh

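  • Open two terminal sessions and, in each one, exec into the running ECS task. Set TASK_ARN to the ARN of a task running in the ecs-gpu-demo cluster before running the command.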
TASK_ARN=
aws ecs execute-command \
  --region us-east-1 \
  --cluster ecs-gpu-demo \
  --task ${TASK_ARN} \
  --container gpu \
  --command "/bin/bash" \
  --interactive

  • In one terminal, watch the GPU utilization.
watch -n0.1 nvidia-smi

  • In the other terminal, stress-test the GPU by running the test script.
python3 test.py
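The manual lookup and monitoring steps above can also be scripted. This is a minimal sketch, assuming boto3 is available where you look up the task and nvidia-smi is available inside the container. The helper first_running_task_arn finds a value for TASK_ARN through the ECS ListTasks API. The helper read_gpu_utilization polls the same per-GPU utilization that nvidia-smi displays, using its --query-gpu and --format options. Both function names are hypothetical and are not part of this repository.

```python
import subprocess
from typing import Any, List, Optional

# nvidia-smi query: one integer percentage per GPU, with no CSV header and no units.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_utilization(output: str) -> List[int]:
    """Parse nvidia-smi CSV output (one value per line) into percentages."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_utilization() -> List[int]:
    """Run nvidia-smi (inside the container) and return per-GPU utilization."""
    result = subprocess.run(NVIDIA_SMI_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_utilization(result.stdout)


def first_running_task_arn(ecs_client: Any, cluster: str = "ecs-gpu-demo") -> Optional[str]:
    """Return the ARN of one RUNNING task in the cluster, or None if there is none.

    ecs_client is expected to be a boto3 ECS client. Only the first page of
    list_tasks results is inspected.
    """
    response = ecs_client.list_tasks(cluster=cluster, desiredStatus="RUNNING")
    arns = response.get("taskArns", [])
    return arns[0] if arns else None
```

For example, first_running_task_arn(boto3.client("ecs", region_name="us-east-1")) returns the ARN of a running task, or None when the cluster has none. Reading only the first page of results is enough for a small demo cluster. Polling read_gpu_utilization in a loop gives the same numbers as watch -n0.1 nvidia-smi in a form a script can act on.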

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.