AWS Cloud Operations & Migrations Blog

Using Prometheus Adapter to autoscale applications running on Amazon EKS

Automated scaling adjusts workload capacity up or down based on resource usage. In Kubernetes, the Horizontal Pod Autoscaler (HPA) can scale pods based on observed CPU utilization and memory usage. In more complex scenarios, other metrics should inform the scaling decision. For example, most web and mobile backends must scale on requests per second to handle traffic bursts, while an ETL app might scale when its job queue length exceeds a threshold. Instrumenting your applications with Prometheus and exposing the right metrics for autoscaling lets you fine-tune your apps to handle bursts better and ensure high availability.

Prometheus is an open-source monitoring and alerting toolkit that collects and stores its metrics as time series data. In other words, each metric is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels. Prometheus Adapter queries the custom metrics collected by Prometheus and exposes them through a Kubernetes API service, where they can be consumed directly by a Horizontal Pod Autoscaler object to make scaling decisions.

Managing long-term Prometheus storage infrastructure is challenging. To remove this heavy lifting, AWS launched Amazon Managed Service for Prometheus, a Prometheus-compatible monitoring service for container infrastructure and application metrics that makes it easy to securely monitor container environments at scale. Amazon Managed Service for Prometheus automatically scales the ingestion, storage, alerting, and querying of operational metrics as workloads scale up and down.

This post will show how to use Prometheus Adapter to autoscale Amazon EKS pods running an AWS App Mesh workload. AWS App Mesh is a service mesh that makes it easy to monitor and control services. A service mesh is an infrastructure layer dedicated to handling service-to-service communication, usually through an array of lightweight network proxies deployed alongside the application code. We will register the custom metric via a Kubernetes API service that HPA will eventually use to make scaling decisions.

Prerequisites

You will need the following to complete the steps in this blog post:

- An AWS account and the AWS CLI configured with permissions to create the resources in this post
- The eksctl, kubectl, and Helm CLI tools
- jq and openssl installed locally

Step 1: Create an Amazon EKS cluster

The architecture diagram shows AWS Distro for OpenTelemetry (ADOT) scraping metrics from an App Mesh-enabled application pod and sending them to Amazon Managed Service for Prometheus (AMP). Prometheus Adapter, deployed in the cluster, queries AMP and registers the custom metrics with a custom metrics API server, which the HPA then uses to make its scaling decisions.

Figure 1: Architecture diagram

We will create a custom metric from the counter envoy_cluster_upstream_rq exposed by Envoy. The same approach can be extended to any custom metric that your application emits.

First, create an Amazon EKS cluster enabled with AWS App Mesh for running the sample application. The eksctl CLI tool will deploy the cluster using the eks-cluster-config.yaml file:

export AMP_EKS_CLUSTER=AMP-EKS-CLUSTER
export AMP_ACCOUNT_ID=<Your Account id>
export AWS_REGION=<Your Region>


cat << EOF > eks-cluster-config.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: $AMP_EKS_CLUSTER
  region: $AWS_REGION
  version: '1.18'
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: appmesh-controller
      namespace: appmesh-system
      labels: {aws-usage: "application"}
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/AWSAppMeshFullAccess"
managedNodeGroups:
- name: default-ng
  minSize: 1
  maxSize: 3
  desiredCapacity: 2
  labels: {role: mngworker}
  iam:
    withAddonPolicies:
      certManager: true
      cloudWatch: true
      appMesh: true
cloudWatch:
  clusterLogging:
    enableTypes: ["*"]
EOF

Execute the following command to create the EKS cluster:

eksctl create cluster -f eks-cluster-config.yaml

This creates an Amazon EKS cluster named AMP-EKS-CLUSTER and a service account named appmesh-controller that the AWS App Mesh controller will use for EKS.
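
Before proceeding, you can verify that the nodes are ready and that eksctl created the service account:

kubectl get nodes ## all nodes should report Ready
kubectl get sa appmesh-controller -n appmesh-system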

Next, install the App Mesh controller with Helm. This also puts the required Custom Resource Definitions (CRDs) in place:

helm repo add eks https://aws.github.io/eks-charts 
helm upgrade -i appmesh-controller eks/appmesh-controller \
   --namespace appmesh-system \
   --set region=${AWS_REGION} \
   --set serviceAccount.create=false \
   --set serviceAccount.name=appmesh-controller

Step 2: Deploy sample application and enable AWS App Mesh

To install an application and inject an Envoy container, use the AWS App Mesh controller for Kubernetes that you installed earlier. AWS App Mesh Controller for K8s manages App Mesh resources in your Kubernetes clusters. The controller is accompanied by CRDs that allow you to define AWS App Mesh components, such as meshes and virtual nodes, via the Kubernetes API, just as you define native Kubernetes objects such as deployments and services. These custom resources map to AWS App Mesh API objects that the controller manages for you. The controller watches these custom resources for changes and reflects those changes into the AWS App Mesh API.
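
Before meshifying the application, you can confirm that the controller and its CRDs are in place (a quick check; the exact CRD list varies by controller version):

kubectl get crds | grep appmesh ## expect entries such as meshes.appmesh.k8s.aws and virtualnodes.appmesh.k8s.aws
kubectl -n appmesh-system get pods ## the appmesh-controller pod should be Running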

## Install the base application
git clone https://github.com/aws/aws-app-mesh-examples.git 
kubectl apply -f aws-app-mesh-examples/examples/apps/djapp/1_base_application

kubectl get all -n prod ## check the pod status and make sure it is running
## Now install the App Mesh controller and meshify the deployment
kubectl apply -f aws-app-mesh-examples/examples/apps/djapp/2_meshed_application/
kubectl rollout restart deployment -n prod dj jazz-v1 metal-v1
kubectl get all -n prod ## Now we should see two containers running in each pod

Step 3: Create an Amazon Managed Service for Prometheus workspace

The Amazon Managed Service for Prometheus workspace ingests the Prometheus metrics collected from envoy. A workspace is a logical and isolated Prometheus server dedicated to Prometheus resources such as metrics. A workspace supports fine-grained access control for authorizing its management, such as update, list, describe, and delete, as well as ingesting and querying metrics.

aws amp create-workspace --alias AMP-APPMESH --region $AWS_REGION
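
You can confirm that the workspace was created and note its workspace ID; the --alias filter narrows the output:

aws amp list-workspaces --alias AMP-APPMESH --region $AWS_REGION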

Next, optionally create an interface VPC endpoint to securely access the managed service from resources deployed within your VPC; this ensures that data ingested by the managed service won't leave your VPC. (An Amazon Managed Service for Prometheus public endpoint is also available.) Use the AWS CLI as shown here, replacing the placeholder strings, such as VPC_ID, SECURITY_GROUP_IDS, and SUBNET_IDS, with your values.

export VPC_ID=<Your EKS Cluster VPC Id>
aws ec2 create-vpc-endpoint \
--vpc-id $VPC_ID \
--service-name com.amazonaws.$AWS_REGION.aps-workspaces \
--security-group-ids <SECURITY_GROUP_IDS> \
--vpc-endpoint-type Interface \
--subnet-ids <SUBNET_IDS>

Step 4: Scrape the metrics using AWS Distro for OpenTelemetry

Amazon Managed Service for Prometheus does not directly scrape operational metrics from containerized workloads in a Kubernetes cluster. You must deploy and manage a Prometheus server or an OpenTelemetry agent, such as the AWS Distro for OpenTelemetry (ADOT) Collector or the Grafana Agent, to perform this task. This post walks you through configuring ADOT to scrape the envoy metrics. The ADOT-AMP pipeline lets us use the ADOT Collector to scrape a Prometheus-instrumented application, and then send the scraped metrics to Amazon Managed Service for Prometheus.

This post also walks you through configuring an IAM role to send Prometheus metrics to Amazon Managed Service for Prometheus. We will install the ADOT collector on the Amazon EKS cluster and forward the metrics it scrapes to Amazon Managed Service for Prometheus.

Configure permissions

We will deploy the ADOT collector to run under the identity of a Kubernetes service account named amp-iamproxy-service-account. With IAM roles for service accounts (IRSA), you can attach the AmazonPrometheusRemoteWriteAccess policy to a Kubernetes service account, thereby granting any pod that uses the service account the IAM permissions needed to ingest metrics into Amazon Managed Service for Prometheus.

You need kubectl and eksctl CLI tools in order to run the script. They must be configured to access your Amazon EKS cluster.

kubectl create namespace prometheus
eksctl create iamserviceaccount --name amp-iamproxy-service-account --namespace prometheus --cluster $AMP_EKS_CLUSTER --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess --approve
export WORKSPACE=$(aws amp list-workspaces | jq -r '.workspaces[] | select(.alias=="AMP-APPMESH").workspaceId')
export REGION=$AWS_REGION
export REMOTE_WRITE_URL="https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WORKSPACE/api/v1/remote_write"
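
To verify that IRSA is wired up, check that the service account carries the role ARN annotation added by eksctl:

kubectl describe sa amp-iamproxy-service-account -n prometheus ## look for the eks.amazonaws.com/role-arn annotation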

Now create a manifest file, amp-eks-adot-prometheus-daemonset.yaml, with the scrape configuration needed to extract the envoy metrics, and deploy the ADOT collector. This example deploys a DaemonSet named adot-collector that collects metrics from pods on the cluster.

cat > amp-eks-adot-prometheus-daemonset.yaml <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adot-collector-conf
  namespace: prometheus
  labels:
    app: aws-adot
    component: adot-collector-conf
data:
  adot-collector-config: |
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: 15s
            scrape_timeout: 10s

          scrape_configs:
          - job_name: 'appmesh-envoy'
            metrics_path: /stats/prometheus
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - source_labels: [__meta_kubernetes_pod_container_name]
              action: keep
              regex: '^envoy$'
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: \${1}:9901 # the backslash keeps the shell heredoc from expanding this
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - source_labels: [__meta_kubernetes_namespace]
              action: replace
              target_label: namespace
            - source_labels: ['app']
              action: replace
              target_label: service 
            - source_labels: [__meta_kubernetes_pod_name]
              action: replace
              target_label: kubernetes_pod_name  
          - job_name: 'kubernetes-service-endpoints'

            kubernetes_sd_configs:
            - role: endpoints

            tls_config:
              ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              insecure_skip_verify: true
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

            relabel_configs:
            - source_labels: [__meta_kubernetes_service_annotation_scrape]
              action: keep
              regex: true

    exporters:
      awsprometheusremotewrite:
        # replace this with your endpoint
        endpoint: "$REMOTE_WRITE_URL"
        # replace this with your region
        aws_auth:
          region: "$REGION"
          service: "aps"
      logging:
        loglevel: info

    extensions:
      health_check:
      pprof:
        endpoint: :1888
      zpages:
        endpoint: :55679

    service:
      extensions: [pprof, zpages, health_check]
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [logging, awsprometheusremotewrite]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: adotcol-admin-role
rules:
  - apiGroups: [""]
    resources:
    - nodes
    - nodes/proxy
    - services
    - endpoints
    - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
    - extensions
    resources:
    - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: adotcol-admin-role-binding
subjects:
  - kind: ServiceAccount
    name: amp-iamproxy-service-account
    namespace: prometheus
roleRef:
  kind: ClusterRole
  name: adotcol-admin-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: Service
metadata:
  name: adot-collector
  namespace: prometheus
  labels:
    app: aws-adot
    component: adot-collector
spec:
  ports:
  - name: metrics # Default endpoint for querying metrics.
    port: 8888
  selector:
    component: adot-collector
  type: NodePort
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: adot-collector
  namespace: prometheus
  labels:
    app: aws-adot
    component: adot-collector
spec:
  selector:
    matchLabels:
      app: aws-adot
      component: adot-collector
  minReadySeconds: 5
  template:
    metadata:
      labels:
        app: aws-adot
        component: adot-collector
    spec:
      serviceAccountName: amp-iamproxy-service-account
      containers:
      - command:
          - "/awscollector"
          - "--config=/conf/adot-collector-config.yaml"
        image: public.ecr.aws/aws-observability/aws-otel-collector:latest
        name: adot-collector
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 400Mi
        ports:
        - containerPort: 8888  # Default endpoint for querying metrics.
        volumeMounts:
        - name: adot-collector-config-vol
          mountPath: /conf
        livenessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
        readinessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: adot-collector-conf
            items:
              - key: adot-collector-config
                path: adot-collector-config.yaml
          name: adot-collector-config-vol
---
EOF

kubectl apply -f amp-eks-adot-prometheus-daemonset.yaml

After the ADOT collector is deployed, it will collect the metrics and ingest them into the specified Amazon Managed Service for Prometheus workspace. The scrape configuration above is similar to that of a standalone Prometheus server, with the appmesh-envoy job providing the configuration needed to scrape the envoy metrics.
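
To confirm that the envoy metrics are reaching the workspace, you can query the Amazon Managed Service for Prometheus query API. The following sketch assumes you have the awscurl tool installed to SigV4-sign the request:

## A non-empty result confirms that the envoy counter is being ingested
awscurl --service aps --region $AWS_REGION "https://aps-workspaces.$AWS_REGION.amazonaws.com/workspaces/$WORKSPACE/api/v1/query?query=envoy_cluster_upstream_rq" | jq .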

Step 5: Deploy the Prometheus Adapter to register the custom metric

We will create a service account named monitoring to run the Prometheus adapter, and assign it the AmazonPrometheusQueryAccess policy using IRSA.

kubectl create namespace monitoring
eksctl create iamserviceaccount --name monitoring --namespace monitoring --cluster $AMP_EKS_CLUSTER --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess --approve --override-existing-serviceaccounts

cat > pma-cm.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'envoy_cluster_upstream_rq{namespace!="",kubernetes_pod_name!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          kubernetes_pod_name: {resource: "pod"}
      name:
        matches: "envoy_cluster_upstream_rq " 
        as: "appmesh_requests_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)' 
EOF

kubectl apply -f pma-cm.yaml

Next, generate a self-signed certificate for the adapter's secure API endpoint and store it in a secret that the adapter deployment will mount:

openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes -out serving.crt -keyout serving.key -subj "/C=CN/CN=custom-metrics-apiserver.monitoring.svc.cluster.local"
kubectl create secret generic -n monitoring cm-adapter-serving-certs --from-file=serving.key=./serving.key --from-file=serving.crt=./serving.crt

The Envoy sidecar injected by AWS App Mesh exposes the counter envoy_cluster_upstream_rq. The adapter configuration above transforms this counter into a requests-per-second rate and exposes it as appmesh_requests_per_second. The adapter connects to the Amazon Managed Service for Prometheus query endpoint through a SigV4 proxy sidecar.
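
With this rule, a request for the pod-scoped metric causes the adapter to run a PromQL query of roughly the following shape against the workspace (an illustrative expansion of the metricsQuery template above, with hypothetical label values):

sum(rate(envoy_cluster_upstream_rq{namespace="prod",kubernetes_pod_name=~"jazz-v1-.*"}[1m])) by (kubernetes_pod_name)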

We will now deploy the Prometheus adapter to create the custom metric:

cat > prometheus-adapter.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-reader
subjects:
- kind: ServiceAccount
  name: monitoring
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: monitoring
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: monitoring
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: custom-metrics-apiserver
  name: custom-metrics-apiserver
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      labels:
        app: custom-metrics-apiserver
      name: custom-metrics-apiserver
    spec:
      serviceAccountName: monitoring
      containers:
      - name: custom-metrics-apiserver
        image: directxman12/k8s-prometheus-adapter-amd64
        args:
        - /adapter
        - --secure-port=6443
        - --tls-cert-file=/var/run/serving-cert/serving.crt
        - --tls-private-key-file=/var/run/serving-cert/serving.key
        - --logtostderr=true
        - --prometheus-url=http://localhost:8080/workspaces/$WORKSPACE
        - --metrics-relist-interval=30s
        - --v=10
        - --config=/etc/adapter/config.yaml
        ports:
        - containerPort: 6443 
        volumeMounts:
        - mountPath: /var/run/serving-cert
          name: volume-serving-cert
          readOnly: true
        - mountPath: /etc/adapter/
          name: config
          readOnly: true
      - name: aws-iamproxy
        # The SigV4 proxy signs the adapter's queries to the AMP query endpoint
        image: public.ecr.aws/aws-observability/aws-sigv4-proxy:1.0
        args:
          - --name
          - aps
          - --region
          - $AWS_REGION
          - --host
          - aps-workspaces.$AWS_REGION.amazonaws.com
        ports:
          - containerPort: 8080      
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs
      - name: config
        configMap:
          name: adapter-config
---
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: monitoring
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: monitoring
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100          
EOF
kubectl apply -f prometheus-adapter.yaml

The manifest above also registers an APIService so that the Prometheus adapter is reachable through the Kubernetes API, which lets the Horizontal Pod Autoscaler fetch the custom metrics. We can query the custom metrics API to confirm that the metric has been registered:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/appmesh_requests_per_second",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/appmesh_requests_per_second",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
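
You can also fetch the current value of the new metric for the pods in the prod namespace through the same API:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/prod/pods/*/appmesh_requests_per_second" | jq .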

Now you can reference the appmesh_requests_per_second metric in an HPA definition:

cat > hpa.yaml <<EOF
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: envoy-hpa
  namespace: prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jazz-v1
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metricName: appmesh_requests_per_second
        targetAverageValue: 10m
EOF
kubectl apply -f hpa.yaml

Now the HPA will scale out the pods whenever the average value of the appmesh_requests_per_second metric per pod exceeds the 10m target set above.

Let us add some load to experience the autoscaling actions:

dj_pod=$(kubectl get pod -n prod --no-headers -l app=dj -o jsonpath='{.items[*].metadata.name}')
loop_counter=0
while [ $loop_counter -le 300 ]; do
  kubectl exec -n prod -it $dj_pod -c dj -- curl jazz.prod.svc.cluster.local:9080
  echo
  loop_counter=$((loop_counter+1))
done
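
While the load loop runs, you can watch the HPA react from a second terminal:

kubectl get hpa envoy-hpa -n prod --watch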

Describing the HPA will show the scaling actions resulting from the load we introduced.

kubectl describe hpa -n prod

Name:                                     envoy-hpa
Namespace:                                prod
Labels:                                   <none>
Annotations:                              <none>
CreationTimestamp:                        Mon, 06 Sep 2021 04:19:37 +0000
Reference:                                Deployment/jazz-v1
Metrics:                                  ( current / target )
  "appmesh_requests_per_second" on pods:  622m / 10m
Min replicas:                             1
Max replicas:                             10
Deployment pods:                          8 current / 10 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 10
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric appmesh_requests_per_second
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
  
Events:
  Type     Reason               Age                 From                       Message
  ----     ------               ----                ----                       -------
  Warning  FailedGetPodsMetric  58m (x44 over 69m)  horizontal-pod-autoscaler  unable to get metric appmesh_requests_per_second: unable to fetch metrics from custom metrics API: the server could not find the metric appmesh_requests_per_second for pods
  Normal   SuccessfulRescale    41s                 horizontal-pod-autoscaler  New size: 4; reason: pods metric appmesh_requests_per_second above target
  Normal   SuccessfulRescale    26s                 horizontal-pod-autoscaler  New size: 8; reason: pods metric appmesh_requests_per_second above target
  Normal   SuccessfulRescale    11s                 horizontal-pod-autoscaler  New size: 10; reason: pods metric appmesh_requests_per_second above target

Clean-up

Use the following commands to delete resources created during this post:
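
If you created the optional interface VPC endpoint, delete it first, using the endpoint ID returned by the earlier create-vpc-endpoint call:

aws ec2 delete-vpc-endpoints --vpc-endpoint-ids <VPC_ENDPOINT_ID>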

aws amp delete-workspace --workspace-id $WORKSPACE
eksctl delete cluster --name $AMP_EKS_CLUSTER

Conclusion

This blog demonstrated how to use Prometheus Adapter to autoscale deployments based on a custom metric. For the sake of simplicity, we fetched only one metric from Amazon Managed Service for Prometheus (AMP), but the adapter ConfigMap can be extended to fetch some or all of the available metrics and use them for autoscaling.
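
As an illustration, a second rule following the same pattern could be appended to the adapter ConfigMap to expose another envoy counter; the rule below is a hypothetical sketch (the appmesh_connections_per_second name is our own), not part of the original setup:

    - seriesQuery: 'envoy_cluster_upstream_cx_total{namespace!="",kubernetes_pod_name!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          kubernetes_pod_name: {resource: "pod"}
      name:
        matches: "envoy_cluster_upstream_cx_total"
        as: "appmesh_connections_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)'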


About the author

Vikram Venkataraman

Vikram Venkataraman is a Senior Technical Account Manager at Amazon Web Services and a container enthusiast. He helps organizations with best practices for running workloads on AWS. In his spare time, he loves to play with his two kids and follows cricket.