Repository with code for running deep learning inference benchmarks across different AWS instance and service types.
This example demonstrates how to deploy a deep learning model for image inference using ONNX on Amazon ECS/Fargate with AWS Copilot. This project provides an easy-to-follow example and a scalable solution for serving deep learning models in the cloud.
Prerequisites:

- Python 3.6 or later
- Docker
- AWS CLI
- AWS Copilot
Clone the repository and change into the example directory:

```bash
git clone https://github.com/ryfeus/aws-inference-benchmark.git
cd aws-inference-benchmark/copilot/cpu/aws-copilot-inference-service
```
Initialize the environment and deploy the application:

```bash
copilot env init
copilot deploy
```
Make a single prediction:

```bash
curl -X POST -H "Content-Type: image/jpeg" --data-binary "@flower.png" http://<prefix>.us-east-1.elb.amazonaws.com/predict
```
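For orientation, the `/predict` endpoint accepts raw image bytes in the request body and runs them through the ONNX model. A minimal sketch of what such a handler can look like, assuming Flask and onnxruntime; the model path, preprocessing, and response shape are illustrative, not necessarily the repository's actual code:

```python
# Minimal sketch of an image-inference endpoint (illustrative, not the repo's actual code).
import io

import numpy as np
import onnxruntime as ort
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
# Assumes an ONNX image classifier packaged with the container as model.onnx.
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

@app.route("/predict", methods=["POST"])
def predict():
    # Decode the raw image bytes from the request body.
    image = Image.open(io.BytesIO(request.get_data())).convert("RGB")
    # Example preprocessing: resize, scale to [0, 1], convert to NCHW layout.
    array = np.asarray(image.resize((224, 224)), dtype=np.float32) / 255.0
    array = array.transpose(2, 0, 1)[np.newaxis, :]
    logits = session.run(None, {input_name: array})[0]
    return jsonify({"class_id": int(np.argmax(logits))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```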
Benchmark with ApacheBench (10 requests, 10 concurrent):

```bash
ab -n 10 -c 10 -p flower.png -T image/jpeg http://<prefix>.us-east-1.elb.amazonaws.com/predict
```
Build and test the container locally:

```bash
docker build -t image-inference .
docker run --rm -p 8080:8080 image-inference
curl -X POST -H "Content-Type: image/jpeg" --data-binary "@flower.png" http://localhost:8080/predict
```
Install the development dependencies and run the tests:

```bash
pip install -r dev-requirements.txt
pytest -v test_inference.py
```
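`test_inference.py` is the repository's own test file; as a rough, hypothetical sketch of what such an endpoint test can look like (assuming the `requests` package, the service running locally on port 8080, and `flower.png` in the working directory):

```python
# Hypothetical endpoint test (not the repository's actual test_inference.py).
import requests

def test_predict_returns_ok():
    with open("flower.png", "rb") as f:
        resp = requests.post(
            "http://localhost:8080/predict",
            data=f.read(),
            headers={"Content-Type": "image/jpeg"},
        )
    assert resp.status_code == 200
```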
This example demonstrates how to deploy a large language model for text generation using the Hugging Face Transformers library on Amazon ECS/Fargate with AWS Copilot.
Clone the repository and change into the example directory:

```bash
git clone https://github.com/ryfeus/aws-inference-benchmark.git
cd aws-inference-benchmark/copilot/transformers/aws-copilot-inference-service
```
Clone the model from its Hugging Face repository (for example, LaMini-T5-223M):

```bash
git lfs install
git clone https://huggingface.co/MBZUAI/LaMini-T5-223M.git
mv LaMini-T5-223M model
```
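Optionally, sanity-check the downloaded weights before deploying by loading them locally. A minimal sketch, assuming the `transformers` and `torch` packages are installed (LaMini-T5 is a seq2seq model, so the text2text-generation pipeline applies):

```python
# Quick local check that the cloned model loads and generates (illustrative).
from transformers import pipeline

generator = pipeline("text2text-generation", model="./model")
print(generator("Main tourist attractions in Rome?", max_length=256)[0]["generated_text"])
```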
Initialize the environment and deploy the application:

```bash
copilot env init
copilot deploy
```
Make a single prediction:

```bash
curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Main tourist attractions in Rome?"}' http://<prefix>.us-east-1.elb.amazonaws.com/predict
```
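As in the ONNX example, the `/predict` endpoint wraps the model behind a small web app, reading the `instruction` field from the JSON payload. A sketch of such a handler, assuming Flask and the pipeline shown above; names and response shape are illustrative, not necessarily the repository's actual code:

```python
# Sketch of a text-generation endpoint (illustrative, not the repo's actual code).
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)
# The model directory created by the `mv LaMini-T5-223M model` step above.
generator = pipeline("text2text-generation", model="./model")

@app.route("/predict", methods=["POST"])
def predict():
    instruction = request.get_json()["instruction"]
    result = generator(instruction, max_length=256)[0]["generated_text"]
    return jsonify({"result": result})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```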
Build and test the container locally:

```bash
docker build -t llm-inference .
docker run --rm -p 8080:8080 llm-inference
curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Main tourist attractions in Rome?"}' http://localhost:8080/predict
```