AWS Open Source Blog

Using AWS Distro for OpenTelemetry Collector for cross-account metrics collection on Amazon ECS

In November 2020, we announced OpenTelemetry support on AWS with AWS Distro for OpenTelemetry (ADOT), a secure, production-ready, AWS-supported distribution of the Cloud Native Computing Foundation (CNCF) OpenTelemetry project. With ADOT, you can instrument applications to send correlated metrics and traces to multiple AWS solutions, such as our Amazon Managed Service for Prometheus (AMP) and Partner monitoring solutions.

Many customers run their applications in separate AWS accounts, and even separate AWS Regions, and would like a central place for observability. In a previous article, we explained how to collect metrics across multiple accounts with Amazon Elastic Kubernetes Service (Amazon EKS). The scenario here is similar, except that we use the ADOT agent to collect application and platform metrics for workloads running on Amazon Elastic Container Service (Amazon ECS), our native container orchestration platform, and send them to an AMP workspace.

Setup overview

To address this challenge, we will follow these steps.

On the workload accounts:

  • Create an IAM role to be used by Amazon ECS tasks.

On the central monitoring account:

  • Create an AMP workspace.
  • Create an IAM role that allows cross-account access to AMP.

On the workload accounts:

  • Grant the Amazon ECS task role permission to assume the cross-account IAM role.
  • Set up the application and the AWS Distro for OpenTelemetry agent.
  • Create an Amazon ECS cluster and run the application.

On the central monitoring account:

  • Visualize the metrics with Amazon Managed Grafana.

The entire architecture looks like the following:

entire architecture illustrated

Workload account: ECS role setup

Logged into the workload account, we create an IAM role that Amazon ECS tasks will use later. The central monitoring account will trust this role, and we will later grant it permission to assume the cross-account role.

cat > task-assume-role.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role --role-name ecs-xaccount-task-role \
  --assume-role-policy-document file://task-assume-role.json \
  --region eu-west-1

Monitoring account setup

Logged into the central monitoring account, we create an AMP workspace with the following AWS CLI command:

aws amp create-workspace --alias ecs-xaccount-metrics-demo --region eu-west-1

Alternatively, we can use the AWS console and navigate to the AMP service.

View of AWS console navigated to the AMP service

We can now create an IAM role with write permissions to the AMP workspace. To grant access to multiple accounts, populate the "AWS" array with the appropriate IAM role ARNs:

WORKLOAD_ACCOUNT_ID=

cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::$WORKLOAD_ACCOUNT_ID:role/ecs-xaccount-task-role"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}
EOF
# Note: You might encounter an error if the ecs-xaccount-task-role
# does not exist in the workload account.
aws iam create-role \
  --role-name ECS-AMP-Central-Role \
  --assume-role-policy-document file://policy.json \
  --query 'Role.RoleName' \
  --output text
aws iam attach-role-policy --role-name ECS-AMP-Central-Role \
    --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
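
When several workload accounts need access, writing the "AWS" array by hand is error-prone. As a minimal sketch, the hypothetical helper below (principal_arns is our own name, not part of the AWS CLI) prints a JSON array of task role ARNs from a list of account IDs. It assumes every workload account uses the ecs-xaccount-task-role name from this post; the account IDs shown are placeholders.

```shell
# Hypothetical helper: print a JSON array with one task role ARN per
# workload account ID passed as an argument.
# The function body runs in a subshell so its variables do not leak.
principal_arns() (
  sep=""
  printf '['
  for account_id in "$@"; do
    printf '%s"arn:aws:iam::%s:role/ecs-xaccount-task-role"' "$sep" "$account_id"
    sep=","
  done
  printf ']\n'
)

# Placeholder account IDs, for illustration only.
principal_arns 111111111111 222222222222
```

Inside the unquoted heredoc that writes policy.json, the hand-written list could then be replaced with "AWS": $(principal_arns 111111111111 222222222222).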

Workload account

Note: You can repeat instructions in this section for as many workload accounts as needed.

Logged into the workload account, we allow the task role created previously to call sts:AssumeRole on the central role:

# Set the central account id
CENTRAL_ACCOUNT_ID=

cat > policy.json <<EOF
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "sts:AssumeRole"
         ],
         "Resource":"arn:aws:iam::${CENTRAL_ACCOUNT_ID}:role/ECS-AMP-Central-Role"
      }
   ]
}
EOF
POLICY_ARN=$(aws iam create-policy --policy-name xaccount-amp-write \
    --policy-document file://policy.json | jq -r '.Policy.Arn')
aws iam attach-role-policy --role-name ecs-xaccount-task-role \
    --policy-arn $POLICY_ARN 
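
The scripts in this post expand variables such as WORKLOAD_ACCOUNT_ID and CENTRAL_ACCOUNT_ID inside heredocs, so a variable left empty silently produces an invalid ARN. One possible guard, sketched here with a hypothetical is_account_id helper, checks that a value looks like a 12-digit AWS account ID before any file is generated. The account ID shown is a placeholder.

```shell
# Hypothetical helper (not part of the AWS CLI): succeed only if the
# argument is exactly 12 decimal digits, the format of an AWS account ID.
is_account_id() {
  case "$1" in
    "" | *[!0-9]*) return 1 ;;  # empty, or contains a non-digit character
  esac
  [ "${#1}" -eq 12 ]
}

# Placeholder value; replace with the real central account ID.
CENTRAL_ACCOUNT_ID=123456789012

if is_account_id "$CENTRAL_ACCOUNT_ID"; then
  echo "CENTRAL_ACCOUNT_ID looks valid"
else
  echo "CENTRAL_ACCOUNT_ID must be a 12-digit account ID" >&2
fi
```

Running the same check on WORKLOAD_ACCOUNT_ID before each heredoc catches the most common copy-and-paste mistake early.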

Workload configuration

Next, we will:

  • Set up a sample application that exposes Prometheus metrics.
  • Configure the aws-otel-collector to scrape the application and ECS metrics.
  • Build Docker images and host them on Amazon Elastic Container Registry (Amazon ECR).
  • Create an Amazon ECS cluster and run everything using ecs-cli.

The layout should be organized as follows:

├── aws-otel-collector
│   ├── Dockerfile
│   └── config.yaml
├── demo-app
│   ├── Dockerfile
│   └── main.go
├── docker-compose.yml
└── ecs-params.yml

To set up Amazon ECS, we need Docker and ecs-cli as requirements. On Linux, ecs-cli can be installed like this:

sudo curl -Lo /usr/local/bin/ecs-cli https://amazon-ecs-cli.s3.amazonaws.com/ecs-cli-linux-amd64-latest
sudo chmod +x /usr/local/bin/ecs-cli

Now, let’s create the sample application that exposes a /metrics Prometheus endpoint:

mkdir demo-app
cd demo-app/

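cat > main.go <<EOF
package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Expose the default Prometheus registry on /metrics.
    http.Handle("/metrics", promhttp.Handler())
    // ListenAndServe only returns on failure, so log the error and exit.
    log.Fatal(http.ListenAndServe(":8000", nil))
}
EOF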

Next, create a Dockerfile for the application:

cat > Dockerfile <<EOF
FROM golang:1.18 as builder
WORKDIR /go/src/app
COPY . .
RUN go mod init demo
RUN go get .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
WORKDIR /app
RUN apk --no-cache add ca-certificates
COPY --from=builder /go/src/app/app .
EXPOSE 8000
CMD ["./app"]
EOF

And finally, the following script will create an ECR repository, build the application image, and push the image to Amazon ECR:

APP_REPOSITORY=$(aws ecr create-repository --repository-name demo-app --query repository.repositoryUri --output text)
docker build . -t demo-app
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin $APP_REPOSITORY
docker tag demo-app:latest $APP_REPOSITORY
docker push $APP_REPOSITORY
cd -

Now, let’s configure the AWS Distro for OpenTelemetry Collector. We will create a custom configuration, called a pipeline, to collect data. A pipeline defines the path that data follows in the collector: it is received by receivers, optionally processed or modified by processors, and finally leaves the collector through exporters.

We will scrape the application's /metrics endpoint and use the awsecscontainermetrics receiver to collect ECS task metrics from the ECS task metadata endpoint. Visit the documentation to learn more about this receiver and other configuration options.

We will export the collected metrics to the AMP workspace created in the monitoring account using the prometheusremotewrite exporter, authenticated through the sigv4auth extension. We provide both the AMP remote_write endpoint and the IAM role to assume, in our case ECS-AMP-Central-Role.
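
The remote_write endpoint is built from the workspace's Region and ID, following the same pattern that appears in the collector configuration in this post. As a small sketch, the hypothetical remote_write_url helper below assembles it; the workspace ID shown is a made-up placeholder.

```shell
# Hypothetical helper: build the AMP remote_write URL from a Region ($1)
# and a workspace ID ($2), following the endpoint pattern used in this post.
remote_write_url() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write\n' "$1" "$2"
}

# Placeholder workspace ID; use the ID returned by "aws amp create-workspace".
remote_write_url eu-west-1 ws-EXAMPLE
```

Keeping the Region as a parameter makes it harder for the endpoint and the workspace to drift apart when the setup is copied to another Region.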

Edit the WORKSPACE_ID and CENTRAL_ACCOUNT_ID variables and run the following script to create the pipeline:

WORKSPACE_ID=
CENTRAL_ACCOUNT_ID=

mkdir aws-otel-collector
cd aws-otel-collector

cat > config.yaml <<EOF
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
      - job_name: "prometheus-demo-app"
        static_configs:
        - targets: [ 0.0.0.0:8000 ]
  awsecscontainermetrics:
    collection_interval: 20s

processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes

exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: debug

extensions:
  sigv4auth:
    service: "aps"
    assume_role:
      arn: arn:aws:iam::$CENTRAL_ACCOUNT_ID:role/ECS-AMP-Central-Role
      sts_region: eu-west-1

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]
EOF

From the latest version of the aws-otel-collector, create a custom image on Amazon ECR with our custom configuration:

cat > Dockerfile <<EOF
FROM public.ecr.aws/aws-observability/aws-otel-collector:latest
COPY config.yaml /etc/ecs/otel-config.yaml
CMD ["--config=/etc/ecs/otel-config.yaml"]
EOF

Finally, build and push the image:

COLLECTOR_REPOSITORY=$(aws ecr create-repository --repository-name aws-otel-collector --query repository.repositoryUri --output text)
docker build . -t aws-otel-collector
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin $COLLECTOR_REPOSITORY
docker tag aws-otel-collector:latest $COLLECTOR_REPOSITORY
docker push $COLLECTOR_REPOSITORY
cd -

Run application: Set up Amazon ECS

Amazon ECS needs an execution role—a set of permissions to run our tasks. Run the following script to create it:

cat > task-execution-assume-role.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role --role-name ecs-xaccount-task-execution-role \
    --assume-role-policy-document file://task-execution-assume-role.json \
    --region eu-west-1
aws iam --region eu-west-1 attach-role-policy --role-name ecs-xaccount-task-execution-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

Set the WORKLOAD_ACCOUNT_ID variable and run the following script to create a docker-compose file:

WORKLOAD_ACCOUNT_ID=

cat > docker-compose.yml <<EOF
version: "3"
services:
  aws-otel-collector:
    image: $WORKLOAD_ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/aws-otel-collector:latest
    environment:
      - AWS_REGION=eu-west-1
    logging:
      driver: awslogs
      options: 
        awslogs-group: ecs-xaccount-metrics-demo
        awslogs-region: eu-west-1
        awslogs-stream-prefix: aws-otel-collector

  prometheus-demo-app:
    image: $WORKLOAD_ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/demo-app
    ports:
      - "8000:8000"
    depends_on:
      - aws-otel-collector
    logging:
      driver: awslogs
      options: 
        awslogs-group: ecs-xaccount-metrics-demo
        awslogs-region: eu-west-1
        awslogs-stream-prefix: demo-app
EOF

Using ecs-cli, we will create an Amazon ECS cluster:

ecs-cli configure --cluster ecs-xaccount-metrics-demo \
    --default-launch-type FARGATE \
    --config-name ecs-xaccount-metrics-demo \
    --region eu-west-1
ecs-cli up --cluster-config ecs-xaccount-metrics-demo

After a few minutes, the cluster should be created with all the necessary associated resources. Copy the VPC ID from the output of the preceding command into VPC_ID, note the subnet IDs, and get the default security group associated with the VPC:

VPC_ID=

aws ec2 describe-security-groups --filters Name=vpc-id,Values=$VPC_ID \
  --region eu-west-1 \
  --query 'SecurityGroups[0].GroupId' \
  --output text
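
Copying IDs by hand from terminal output is easy to get wrong. As a sketch, and assuming ecs-cli up prints lines of the form "VPC created: ..." and "Subnet created: ..." (the sample text and IDs below are made up for illustration), the values can be extracted with awk. The names extract_vpc_id and extract_subnet_ids are hypothetical helpers, not ecs-cli features.

```shell
# Sample text in the shape we assume ecs-cli up prints; the IDs are made up.
sample_output='VPC created: vpc-0123456789abcdef0
Subnet created: subnet-0aaaaaaaaaaaaaaa1
Subnet created: subnet-0bbbbbbbbbbbbbbb2
Cluster creation succeeded.'

# Hypothetical helpers: read command output on stdin and print the IDs,
# which are the third whitespace-separated field of the matching lines.
extract_vpc_id() { awk '/^VPC created:/ {print $3}'; }
extract_subnet_ids() { awk '/^Subnet created:/ {print $3}'; }

VPC_ID=$(printf '%s\n' "$sample_output" | extract_vpc_id)
SUBNET_IDS=$(printf '%s\n' "$sample_output" | extract_subnet_ids)

echo "VPC: $VPC_ID"
echo "Subnets:"
echo "$SUBNET_IDS"
```

In practice, the output of ecs-cli up could be saved to a file with tee and passed through the same helpers, provided the line format matches this assumption.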

Create the ecs-params.yml file needed by ecs-cli with the following content, replacing the subnet IDs and the security group with the values from the previous outputs:

version: 1
task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: ecs-xaccount-task-role
  task_execution_role: ecs-xaccount-task-execution-role
  task_size:
    mem_limit: 0.5GB
    cpu_limit: 256
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-"
        - "subnet-"
      security_groups:
        - "sg-"
      assign_public_ip: ENABLED

Finally, run the following script to deploy the application:

ecs-cli compose --project-name ecs-xaccount-metrics-demo \
  service up \
  --cluster-config ecs-xaccount-metrics-demo \
  --create-log-groups

After a few minutes, the Amazon ECS service should be up and running. You can check the logs of the aws-otel-collector on the Amazon CloudWatch Logs console, in the log group ecs-xaccount-metrics-demo.

Monitoring account: Visualize metrics

Back in the monitoring account, let’s visualize our metrics using an Amazon Managed Grafana workspace. Refer to the documentation to set up Amazon Managed Grafana.

We can view metrics coming from the application endpoint:

metrics coming from the application endpoint

And the Amazon ECS cluster metrics:

Amazon ECS cluster metrics

Clean up

Workload account

WORKLOAD_ACCOUNT_ID=

# stop and delete the ECS service
ecs-cli compose --project-name ecs-xaccount-metrics-demo service down --cluster-config ecs-xaccount-metrics-demo

# delete ecs cluster
ecs-cli down --cluster-config ecs-xaccount-metrics-demo

# delete task role
aws iam detach-role-policy --role-name ecs-xaccount-task-role --policy-arn arn:aws:iam::$WORKLOAD_ACCOUNT_ID:policy/xaccount-amp-write
aws iam delete-policy --policy-arn arn:aws:iam::$WORKLOAD_ACCOUNT_ID:policy/xaccount-amp-write
aws iam delete-role --role-name ecs-xaccount-task-role

# delete task execution role
aws iam detach-role-policy --role-name ecs-xaccount-task-execution-role --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam delete-role --role-name ecs-xaccount-task-execution-role

Central account

WORKSPACE_ID=

# delete role
aws iam detach-role-policy --role-name ECS-AMP-Central-Role --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam delete-role --role-name ECS-AMP-Central-Role
# delete workspace
aws amp delete-workspace --workspace-id $WORKSPACE_ID

Conclusion

In this post, we explained how to use the AWS Distro for OpenTelemetry (ADOT) agent to collect application and platform metrics for workloads running on Amazon ECS.

You can use ADOT on other platforms, such as Amazon EKS, Amazon Elastic Compute Cloud (Amazon EC2), or on-premises. Additionally, you can use ADOT to collect distributed tracing data and have multiple heterogeneous workload accounts send metrics centrally to AMP and other platforms. You can also set up private connectivity with VPC endpoints and VPC peering, according to your needs.

Visit the ADOT, AMP, and Amazon Managed Grafana sites to learn more.

Rodrigue Koffi

Rodrigue is a Specialist Solutions Architect at Amazon Web Services for Observability. He is passionate about observability, distributed systems, and machine learning. He has a strong DevOps and software development background and loves programming with Go. Find him on LinkedIn at /grkoffi

Rafael Pereyra

Rafael Pereyra is a Principal Security Architect at AWS Professional Services, where he helps customers securely deploy, monitor, and operate solutions in the cloud. Rafael's interests include containerized applications, improving observability, monitoring and logging of solutions, IaC, and automation in general. In his spare time, he enjoys cooking with family and friends.