In this repository, we show you how to build an internal SaaS service to access foundation models with Amazon Bedrock and Amazon SageMaker in a multi-tenant architecture.
An internal software as a service (SaaS) for foundation models can address governance requirements while providing a simple and consistent interface for end users. An API gateway is a common design pattern that enables consumption of services with standardization and governance. It provides loose coupling between model consumers and the model endpoint service, which gives the flexibility to adapt to changing model versions, architectures, and invocation methods.
Multiple tenants within an enterprise could simply correspond to multiple teams or projects accessing LLMs through REST APIs, just like other SaaS services. IT teams can add further governance and controls over this SaaS layer. In this CDK example, we focus specifically on multiple tenants with different cost centers accessing the service through API Gateway. An internal service is responsible for tracking usage and cost per tenant and aggregating that cost for reporting. The CDK template provided here deploys all the required resources to the AWS account.
The CDK stack deploys the following resources:
- Private networking environment with a VPC, private subnets, and VPC endpoints for Lambda, API Gateway, and Amazon Bedrock
- API Gateway REST API
- API Gateway usage plan
- API Gateway API key
- Lambda functions to list foundation models on Bedrock
- Lambda functions to invoke models on Bedrock and SageMaker
- Lambda functions to invoke models on Bedrock and SageMaker with streaming response
- DynamoDB table for saving streaming responses asynchronously
- Lambda function for usage and cost aggregation
- EventBridge rule to trigger the cost aggregation at a regular frequency
- S3 buckets to store the cost tracking logs
- CloudWatch Logs to collect logs from Lambda invocations
The sample notebook in the notebooks folder can be used to invoke Bedrock as any one of the teams/cost centers. API Gateway routes the request to the Lambda function that invokes Bedrock models or SageMaker-hosted models and logs the usage metrics to CloudWatch. EventBridge triggers the cost tracking Lambda function at a regular frequency to aggregate metrics from the CloudWatch logs and generate aggregated usage and cost metrics at the chosen granularity. The metrics are stored in S3 and can be further visualized with custom reports.
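To make the logging-and-aggregation flow concrete, here is a minimal Python sketch of one structured usage record per invocation. In a Lambda function, anything written to standard output (for example with `print`) is sent to CloudWatch Logs, so emitting one JSON line per call gives a cost tracking function something easy to parse. The field names and the helpers `build_usage_log` and `parse_usage_log` are hypothetical; they are not taken from this repository's Lambda code.

```python
import json
import time
from typing import Optional


# Hypothetical log schema: field names mirror the columns of the cost report
# (team_id, model_id, input_tokens, output_tokens); the real Lambda functions
# in this repository may log different fields.
def build_usage_log(team_id: str, model_id: str, input_tokens: int,
                    output_tokens: int, ts: Optional[float] = None) -> str:
    """Serialize the usage of one invocation as a single JSON log line."""
    record = {
        "team_id": team_id,
        "model_id": model_id,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "timestamp": ts if ts is not None else time.time(),
    }
    return json.dumps(record)


def parse_usage_log(line: str) -> Optional[dict]:
    """Return the usage record in a log line, or None for any other line
    (for example the START/END/REPORT lines Lambda writes to CloudWatch)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "team_id" not in record:
        return None
    return record
```

Inside a handler, `print(build_usage_log(team_id, model_id, in_tokens, out_tokens))` would write the record; the aggregation side keeps only the lines for which `parse_usage_log` returns a dict.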
The solution currently supports both REST invocation and streaming invocation with long polling for Bedrock and SageMaker.
The CDK stack creates a REST API compliant with the OpenAPI specification:
```yaml
openapi: 3.0.1
info:
  title: "<REST_API_NAME>"
  version: '2023-12-13T12:12:15Z'
servers:
  - url: https://<HOST>.execute-api.<REGION>.amazonaws.com/{basePath}
    variables:
      basePath:
        default: prod
paths:
  "/list_foundation_models":
    get:
      responses:
        '401':
          description: 401 response
          headers:
            Access-Control-Allow-Origin:
              schema:
                type: string
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/Error"
      security:
        - api_key: []
  "/invoke_model":
    post:
      parameters:
        - name: model_id
          in: query
          required: true
          schema:
            type: string
          description: Id of the base model to invoke
        - name: model_arn
          in: query
          required: true
          schema:
            type: string
          description: ARN of the custom model in Amazon Bedrock
        - name: requestId
          in: query
          required: false
          schema:
            type: string
          description: Request ID for long-polling functionality. Requires streaming=true
        - name: team_id
          in: header
          required: true
          schema:
            type: string
        - name: messages_api
          in: header
          required: false
          schema:
            type: string
        - name: streaming
          in: header
          required: false
          schema:
            type: string
        - name: type
          in: header
          required: false
          schema:
            type: string
      responses:
        '401':
          description: 401 response
          headers:
            Access-Control-Allow-Origin:
              schema:
                type: string
          content:
            application/json:
              schema:
                "$ref": "#/components/schemas/Error"
      security:
        - api_key: []
components:
  schemas:
    Error:
      title: Error Schema
      type: object
      properties:
        message:
          type: string
  securitySchemes:
    api_key:
      type: apiKey
      name: x-api-key
      in: header
```
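The following Python sketch shows how a client might assemble a call to `POST /invoke_model` from this specification: the `x-api-key` header carries the API key, the `team_id` header identifies the tenant, and `model_id` (or `model_arn`) and `requestId` go in the query string. The request body format is not part of the specification, so `payload` is an assumption, and so is the completion check handed to the polling helper, because the shape of a streaming response is not documented here. It also assumes the `streaming` header takes the value `true`. `build_invoke_call` and `poll_until_done` are hypothetical helper names.

```python
import json
import time
import urllib.parse
from typing import Any, Callable, Dict, Optional, Tuple


def build_invoke_call(base_url: str, api_key: str, team_id: str, model_id: str,
                      payload: Dict[str, Any], streaming: bool = False,
                      request_id: Optional[str] = None,
                      model_arn: Optional[str] = None
                      ) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for POST /invoke_model without sending it."""
    query = {"model_id": model_id}
    if model_arn is not None:
        query["model_arn"] = model_arn  # custom model in Amazon Bedrock
    if request_id is not None:
        # Follow-up call for a streaming request (requires streaming=true).
        query["requestId"] = request_id
    url = f"{base_url.rstrip('/')}/invoke_model?{urllib.parse.urlencode(query)}"
    headers = {
        "x-api-key": api_key,  # API Gateway key (securitySchemes.api_key)
        "team_id": team_id,    # tenant / cost center used for cost tracking
        "Content-Type": "application/json",
    }
    if streaming:
        headers["streaming"] = "true"
    return url, headers, json.dumps(payload)


def poll_until_done(send: Callable[[str], Any], request_id: str,
                    is_done: Callable[[Any], bool], interval_s: float = 1.0,
                    max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Re-issue the request for request_id until is_done(response) is true.

    `send` performs one HTTP call; `sleep` is injectable so tests do not wait.
    """
    for _ in range(max_attempts):
        response = send(request_id)
        if is_done(response):
            return response
        sleep(interval_s)
    raise TimeoutError(f"request {request_id} not finished after {max_attempts} polls")
```

The returned tuple can be sent with any HTTP client, for example `requests.post(url, headers=headers, data=body, timeout=30)` from the third-party `requests` library; for a streaming call, `send` would wrap such a request built with `streaming=True` and the `request_id` being polled.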
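The following table shows an example of the aggregated usage and cost metrics produced by the cost tracking Lambda function:

team_id | model_id | input_tokens | output_tokens | invocations | input_cost | output_cost |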
---|---|---|---|---|---|---|
tenant1 | amazon.titan-tg1-large | 24000 | 2473 | 1000 | 0.0072 | 0.00099 |
tenant1 | anthropic.claude-v2 | 2448 | 4800 | 24 | 0.02698 | 0.15686 |
tenant2 | amazon.titan-tg1-large | 35000 | 52500 | 350 | 0.0105 | 0.021 |
tenant2 | ai21.j2-grande-instruct | 4590 | 9000 | 45 | 0.05738 | 0.1125 |
tenant2 | anthropic.claude-v2 | 1080 | 4400 | 20 | 0.0119 | 0.14379 |
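A report like this one can be derived by grouping per-invocation usage records by `team_id` and `model_id` and pricing the token totals. The sketch below does that with the Python standard library only; the repository's own aggregation function presumably relies on the pandas layer listed in the configuration instead. The helper `aggregate_costs`, the model IDs, and the per-1,000-token prices in `PRICES_PER_1K` are placeholders, not actual Amazon Bedrock prices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical (input, output) prices per 1,000 tokens; look up the current
# pricing for your Region instead of using these placeholder values.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "model-a": (0.0005, 0.0015),
    "model-b": (0.008, 0.024),
}


def aggregate_costs(records: Iterable[dict]) -> List[dict]:
    """Group usage records by (team_id, model_id) and price the token totals."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "invocations": 0})
    for r in records:
        t = totals[(r["team_id"], r["model_id"])]
        t["input_tokens"] += r["input_tokens"]
        t["output_tokens"] += r["output_tokens"]
        t["invocations"] += 1  # one record per model invocation
    rows = []
    for (team_id, model_id), t in sorted(totals.items()):
        in_price, out_price = PRICES_PER_1K[model_id]
        rows.append({
            "team_id": team_id,
            "model_id": model_id,
            **t,
            "input_cost": round(t["input_tokens"] / 1000 * in_price, 5),
            "output_cost": round(t["output_tokens"] / 1000 * out_price, 5),
        })
    return rows
```

Each returned row has the same columns as the table above, so the list could be written to S3 as CSV or JSON for reporting.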
The following examples illustrate the structure of the configuration file. Refer to setup/configs.json for the most up-to-date version of the file.
Edit the global configs used in the CDK stack. For each organizational unit that requires a dedicated multi-tenant SaaS environment, create an entry in setup/configs.json:
```
[
    {
        "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
        "BEDROCK_ENDPOINT": "https://bedrock-runtime.{}.amazonaws.com", # bedrock-runtime endpoint used for invoking Amazon Bedrock
        "BEDROCK_REQUIREMENTS": "boto3>=1.34.62 awscli>=1.32.62 botocore>=1.34.62", # Requirements for Amazon Bedrock
        "LANGCHAIN_REQUIREMENTS": "aws-lambda-powertools langchain==0.1.12 pydantic PyYaml", # Python modules installed in the LangChain layer
        "PANDAS_REQUIREMENTS": "pandas", # Python modules installed in the pandas layer
        "VPC_CIDR": "10.10.0.0/16", # CIDR used for the private VPC environment
        "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
        "API_BURST_RATE": 5000 # Burst limit assigned to the usage plan
    },
    {
        "STACK_PREFIX": "", # unit 2 with dedicated SaaS resources
        "BEDROCK_ENDPOINT": "https://bedrock-runtime.{}.amazonaws.com", # bedrock-runtime endpoint used for invoking Amazon Bedrock
        "BEDROCK_REQUIREMENTS": "boto3>=1.34.62 awscli>=1.32.62 botocore>=1.34.62", # Requirements for Amazon Bedrock
        "LANGCHAIN_REQUIREMENTS": "aws-lambda-powertools langchain==0.1.12 pydantic PyYaml", # Python modules installed in the LangChain layer
        "PANDAS_REQUIREMENTS": "pandas", # Python modules installed in the pandas layer
        "VPC_CIDR": "10.20.0.0/16", # CIDR used for the private VPC environment
        "API_THROTTLING_RATE": 10000,
        "API_BURST_RATE": 5000
    }
]
```
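Before running the deployment, it can help to sanity-check the configuration file. This Python sketch, built around the hypothetical helper `validate_configs`, parses the file as plain JSON (the real file cannot contain the `#` annotations used in the examples above), checks for the keys the full-stack entries above contain, requires a non-empty and unique `STACK_PREFIX`, and rejects overlapping `VPC_CIDR` ranges. The last two rules are assumptions that mirror the example (separate units on 10.10.0.0/16 and 10.20.0.0/16), not checks the stack is known to enforce.

```python
import ipaddress
import json
from typing import List

# Keys present in every full-stack entry of the examples above.
REQUIRED_KEYS = {"STACK_PREFIX", "BEDROCK_ENDPOINT", "VPC_CIDR",
                 "API_THROTTLING_RATE", "API_BURST_RATE"}


def validate_configs(text: str) -> List[dict]:
    """Parse configs.json content and apply basic sanity checks."""
    configs = json.loads(text)  # raises on '#' comments: the file must be plain JSON
    if not isinstance(configs, list):
        raise ValueError("configs.json must contain a JSON array")
    prefixes = set()
    networks = []
    for i, cfg in enumerate(configs):
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
        prefix = cfg["STACK_PREFIX"]
        # Assumption: each unit needs its own non-empty prefix.
        if not prefix or prefix in prefixes:
            raise ValueError(f"entry {i} needs a unique, non-empty STACK_PREFIX")
        prefixes.add(prefix)
        # Assumption: units should not share address space. ip_network is strict,
        # so host bits must be zero (10.10.0.0/16, not 10.10.0.1/16).
        net = ipaddress.ip_network(cfg["VPC_CIDR"])
        for j, other in networks:
            if net.overlaps(other):
                raise ValueError(f"VPC_CIDR of entry {i} overlaps entry {j}")
        networks.append((i, net))
    return configs
```

For example, `validate_configs(open("setup/configs.json").read())` returns the parsed entries or raises `ValueError` naming the first offending entry.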
Execute the following commands:
```bash
chmod +x deploy_stack.sh
./deploy_stack.sh
```
Add FMs hosted on Amazon SageMaker:
We can expose foundation models hosted on Amazon SageMaker by providing their endpoint names in the SAGEMAKER_ENDPOINTS entry, a JSON object encoded as a string, as in the example below:
```
[
    {
        "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
        "BEDROCK_ENDPOINT": "https://bedrock-runtime.{}.amazonaws.com", # bedrock-runtime endpoint used for invoking Amazon Bedrock
        "BEDROCK_REQUIREMENTS": "boto3>=1.34.62 awscli>=1.32.62 botocore>=1.34.62", # Requirements for Amazon Bedrock
        "LANGCHAIN_REQUIREMENTS": "aws-lambda-powertools langchain==0.1.12 pydantic PyYaml", # Python modules installed in the LangChain layer
        "PANDAS_REQUIREMENTS": "pandas", # Python modules installed in the pandas layer
        "VPC_CIDR": "10.10.0.0/16", # CIDR used for the private VPC environment
        "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
        "API_BURST_RATE": 5000, # Burst limit assigned to the usage plan
        "SAGEMAKER_ENDPOINTS": "{\"Mixtral 8x7B\": \"Mixtral-SM-Endpoint\"}" # JSON-encoded map of model name to SageMaker endpoint name
    }
]
```
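Because `SAGEMAKER_ENDPOINTS` is a JSON object serialized into a string, code that reads the configuration has to decode it a second time. A minimal Python sketch, using the hypothetical helper `sagemaker_endpoints` and assuming each key is the name exposed to tenants and each value is a SageMaker endpoint name:

```python
import json
from typing import Dict


def sagemaker_endpoints(config: dict) -> Dict[str, str]:
    """Decode SAGEMAKER_ENDPOINTS (a JSON object stored as a string)."""
    raw = config.get("SAGEMAKER_ENDPOINTS", "{}")  # no SageMaker models configured
    endpoints = json.loads(raw)
    if not isinstance(endpoints, dict) or not all(isinstance(v, str) for v in endpoints.values()):
        raise ValueError("SAGEMAKER_ENDPOINTS must encode an object of name -> endpoint name")
    return endpoints


config = {"SAGEMAKER_ENDPOINTS": "{\"Mixtral 8x7B\": \"Mixtral-SM-Endpoint\"}"}
print(sagemaker_endpoints(config))  # {'Mixtral 8x7B': 'Mixtral-SM-Endpoint'}
```

The decoded endpoint name is the value you would pass as `EndpointName` to `invoke_endpoint` on a boto3 `sagemaker-runtime` client.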
Edit the global configs used in the CDK stack. For each organizational unit that requires a dedicated API key associated with an already created API Gateway REST API, create an entry in setup/configs.json:
```
[
    {
        "STACK_PREFIX": "", # unit 1 with dedicated SaaS resources
        "API_GATEWAY_ID": "", # REST API ID
        "API_GATEWAY_RESOURCE_ID": "", # Resource ID of the REST API
        "API_THROTTLING_RATE": 10000, # Throttling limit assigned to the usage plan
        "API_BURST_RATE": 5000 # Burst limit assigned to the usage plan
    }
]
```
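`API_THROTTLING_RATE` and `API_BURST_RATE` correspond to the two throttle settings of an API Gateway usage plan: the steady-state request rate per second and the burst capacity. The CDK stack configures these through CDK; the sketch below shows, as an assumption, how one configuration entry could map onto the keyword arguments of the boto3 `apigateway` client's `create_usage_plan` call. The helper `usage_plan_kwargs`, its naming scheme, and the default stage `prod` (taken from the OpenAPI `basePath` default) are hypothetical choices.

```python
from typing import Any, Dict


def usage_plan_kwargs(cfg: Dict[str, Any], stage: str = "prod") -> Dict[str, Any]:
    """Map one configs.json entry to create_usage_plan keyword arguments."""
    return {
        "name": f"{cfg['STACK_PREFIX']}-usage-plan",  # hypothetical naming scheme
        "apiStages": [{"apiId": cfg["API_GATEWAY_ID"], "stage": stage}],
        "throttle": {
            "rateLimit": float(cfg["API_THROTTLING_RATE"]),  # steady-state requests per second
            "burstLimit": int(cfg["API_BURST_RATE"]),        # maximum burst capacity
        },
    }
```

With AWS credentials configured, you could pass the result to `boto3.client("apigateway").create_usage_plan(**kwargs)`, create a key with `create_api_key(name=..., enabled=True)`, and attach it with `create_usage_plan_key(usagePlanId=..., keyId=..., keyType="API_KEY")`.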
Execute the following commands:
```bash
chmod +x deploy_stack.sh
./deploy_stack.sh
```
For additional reading, refer to: