AWS Big Data Blog

Getting started with Trace Analytics in Amazon Elasticsearch Service

 Updated May 11, 2021. See the update notes below for more details.

Trace Analytics is now available for Amazon Elasticsearch Service (Amazon ES) domains running version 7.9 or later. Developers and IT Ops teams can use this feature to troubleshoot performance and availability issues in their distributed applications. It provides end-to-end insights that aren’t possible with traditional methods of collecting logs and metrics from each component and service individually.

This feature provides a mechanism to ingest OpenTelemetry-standard trace data to be visualized and explored in Kibana. Trace Analytics introduces two new components that fit into the OpenTelemetry and Amazon ES ecosystems:

  • Data Prepper – A server-side application that collects telemetry data and transforms it for Amazon ES.
  • Trace Analytics Kibana plugin – A plugin that provides at-a-glance visibility into your application performance and the ability to drill down on individual traces. The plugin relies on trace data collected and transformed by Data Prepper.

The following diagram illustrates a component overview.

Figure 1: Component overview

Applications are instrumented with OpenTelemetry, which emits trace data to OpenTelemetry Collectors. Collectors can run as agents on Amazon Elastic Compute Cloud (Amazon EC2), as sidecars for Amazon Elastic Container Service (Amazon ECS), or as sidecars or DaemonSets for Amazon Elastic Kubernetes Service (Amazon EKS). They’re configured to export traces to Data Prepper, which transforms the data and writes it to Amazon ES. You can then use the Trace Analytics Kibana plugin to visualize and detect problems in your distributed applications.
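
Data Prepper’s part of this flow is defined by a pipeline configuration that describes where trace data comes from and where it is written. The CloudFormation template used later in this post takes care of that configuration for you; the following sketch, loosely adapted from the trace analytics examples in the Data Prepper repository, only shows the general shape of such a pipeline. The endpoint and Region values are placeholders, and plugin names and options vary by Data Prepper version.

entry-pipeline:
  # Receives spans over OTLP from the OpenTelemetry Collector
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"

raw-pipeline:
  # Indexes individual spans for the Dashboard and Traces views
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - otel_trace_raw_prepper:
  sink:
    - elasticsearch:
        hosts: ["https://<amazon-es-endpoint>"]
        aws_sigv4: true
        aws_region: "<amazon-es-region>"
        trace_analytics_raw: true

service-map-pipeline:
  # Builds the service map shown in the Services view
  source:
    pipeline:
      name: "entry-pipeline"
  prepper:
    - service_map_stateful:
  sink:
    - elasticsearch:
        hosts: ["https://<amazon-es-endpoint>"]
        aws_sigv4: true
        aws_region: "<amazon-es-region>"
        trace_analytics_service_map: true

The entry pipeline receives spans over OTLP and fans them out to one pipeline that indexes raw spans and another that builds the service map used by the Kibana plugin.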

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project that aims to define an open standard for the collection of telemetry data. Using an OpenTelemetry Collector in your service environment allows you to ingest trace data from projects such as Jaeger and Zipkin.

In this post, we cover the following topics:

  • Launching Data Prepper to send trace data to your Amazon ES domain
  • Configuring an OpenTelemetry Collector to send trace data to Data Prepper
  • Exploring the Kibana Trace Analytics plugin using a sample application

Prerequisites

To get started, you need the following:

  • An Amazon ES domain running version 7.9 or later
  • An EC2 key pair and an IAM role for the Data Prepper instance to use (these map to the KeyName and IAMRole parameters in the next section)

Deploy to Amazon EC2 with AWS CloudFormation

Use the CloudFormation template to deploy Data Prepper to Amazon EC2.

  1. On the AWS CloudFormation console, choose Create stack.
  2. In Specify template, choose Upload a template file, and then upload the CloudFormation template.
  3. All fields on the Specify stack details page are required. Although you can use the defaults for most fields, enter your values for the following:
    • AmazonEsEndpoint
    • AmazonEsRegion
    • AmazonEsSubnetId (if your Amazon ES domain is in a VPC)
    • IAMRole
    • KeyName

The InstanceType parameter lets you specify the size of the EC2 instance to create. For recommendations on instance sizing by workload, see Right Sizing: Provisioning Instances to Match Workloads and the Scaling and Tuning guide in the Data Prepper repository.
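
If you prefer the AWS CLI to the console, you can create the same stack with a command along these lines. The stack name, template file name, and parameter values are placeholders for your own, and depending on what the template provisions you may also need to pass --capabilities.

aws cloudformation create-stack \
  --stack-name data-prepper-trace-analytics \
  --template-body file://data-prepper-ec2.template.yaml \
  --parameters \
    ParameterKey=AmazonEsEndpoint,ParameterValue=https://my-domain.us-east-1.es.amazonaws.com \
    ParameterKey=AmazonEsRegion,ParameterValue=us-east-1 \
    ParameterKey=IAMRole,ParameterValue=my-data-prepper-role \
    ParameterKey=KeyName,ParameterValue=my-key-pair \
    ParameterKey=InstanceType,ParameterValue=m5.large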

It should take about 3 minutes to provision the stack. Data Prepper starts during the CloudFormation stack deployment. To view output logs, use SSH to connect to the EC2 host and then inspect the /var/log/data-prepper.out file.
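
For example, assuming an Amazon Linux AMI (where the default user is ec2-user), you could follow the log like this, substituting your own key file and instance address:

# Connect to the Data Prepper host
ssh -i my-key-pair.pem ec2-user@<ec2-public-dns>

# Follow Data Prepper's output log
tail -f /var/log/data-prepper.out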

Configure an OpenTelemetry Collector

Now that Data Prepper is running on an EC2 instance, you can send trace data to it by running an OpenTelemetry Collector in your service environment. For information about installation, see Getting Started in the OpenTelemetry documentation. Make sure that the Collector is configured with an exporter that points to the address of the Data Prepper host. The following otel-collector-config.yaml example receives data from various sources and exports it to Data Prepper:

receivers:
  jaeger:
    protocols:
      grpc:
  otlp:
    protocols:
      grpc:
  zipkin:

exporters:
  otlp/data-prepper:
    endpoint: <data-prepper-address>:21890
    insecure: true

service:
  pipelines:
    traces:
      receivers: [jaeger, otlp, zipkin]
      exporters: [otlp/data-prepper]
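
One way to run the Collector with this configuration is the official Docker image. The image tag and published ports below are assumptions based on the receivers’ default ports (Jaeger gRPC on 14250, OTLP gRPC on 4317, Zipkin on 9411) and may differ for your Collector version:

docker run -d --name otel-collector \
  -v "$(pwd)/otel-collector-config.yaml":/etc/otel-collector-config.yaml \
  -p 14250:14250 -p 4317:4317 -p 9411:9411 \
  otel/opentelemetry-collector \
  --config /etc/otel-collector-config.yaml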

Be sure to allow traffic to port 21890 on the EC2 instance. You can do this by adding an inbound rule to the instance’s security group.
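
For example, with the AWS CLI (the security group ID and source CIDR are placeholders; scope the source range to the hosts running your Collectors rather than opening the port broadly):

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 21890 \
  --cidr 10.0.0.0/16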

Explore the Trace Analytics Kibana plugin by using a sample application

If you don’t have an OpenTelemetry Collector running and want to send sample data to your Data Prepper instance to try out the Trace Analytics dashboard, you can quickly set up the Jaeger Hot R.O.D. application on the EC2 instance with Docker Compose. Our setup script creates three containers on the EC2 instance:

  • Jaeger Hot R.O.D. – The example application to generate trace data
  • Jaeger Agent – A network daemon that batches trace spans and sends them to the Collector
  • OpenTelemetry Collector – A vendor-agnostic executable capable of receiving, processing, and exporting telemetry data

Although your application, the OpenTelemetry Collector, and Data Prepper typically wouldn’t reside on the same host in a production environment, we use a single EC2 instance here for simplicity and cost.
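
For reference, the Docker Compose file that the setup script uses looks roughly like the following. Treat this as an illustrative sketch rather than the exact file: the image tags, environment variables, and port wiring are assumptions based on the upstream projects’ defaults.

version: "3"
services:
  jaeger-hot-rod:
    image: jaegertracing/example-hotrod:latest
    command: ["all"]
    environment:
      # Send spans to the Jaeger Agent container (assumed env var support)
      - JAEGER_AGENT_HOST=jaeger-agent
      - JAEGER_AGENT_PORT=6831
    ports:
      # The UI listens on 8080 and is only reachable from the host
      - "127.0.0.1:8080:8080"
    depends_on:
      - jaeger-agent
  jaeger-agent:
    image: jaegertracing/jaeger-agent:latest
    # Batch spans and forward them to the Collector's Jaeger gRPC receiver
    command: ["--reporter.grpc.host-port=otel-collector:14250"]
    depends_on:
      - otel-collector
  otel-collector:
    image: otel/opentelemetry-collector:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      # The config's otlp/data-prepper exporter should point at the Data Prepper
      # endpoint on the host (port 21890), as in the earlier example
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml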

To start the sample application, complete the following steps:

  1. Use SSH to connect to the EC2 instance using the private key specified in the CloudFormation stack.
    1. When connecting, add a tunnel to port 8080 (the Hot R.O.D. container accepts connections from localhost only). You can do this by adding -L 8080:localhost:8080 to your SSH command.
  2. Download the setup script by running the following code:
    wget https://raw.githubusercontent.com/opendistro-for-elasticsearch/data-prepper/master/examples/aws/jaeger-hotrod-on-ec2/setup-jaeger-hotrod.sh
  3. Run the script with sh setup-jaeger-hotrod.sh.
  4. Visit http://localhost:8080/ to access the Hot R.O.D. dashboard and start sending trace data.
    Figure 2: Hot R.O.D. Rides on Demand
  5. After you generate sample data with the Hot R.O.D. application, navigate to your Kibana dashboard and in the navigation pane, choose Trace Analytics.

The Dashboard view groups traces by HTTP method and path so that you can see the average latency, error rate, and trends associated with an operation.

Figure 3: Dashboard page

  6. For a more focused view, choose Traces to drill down into a specific trace.
    Figure 4: Traces page
  7. Choose Services to view all services in the application and an interactive map that shows how the various services connect to each other.
    Figure 5: Services page

Conclusion

Trace Analytics adds to the existing log analytics capabilities of Amazon ES, enabling developers to isolate sources of performance problems and diagnose root causes in their distributed applications. We encourage you to start sending your trace data to Amazon ES so you can benefit from Trace Analytics today.


March 25, 2021 update

Data Prepper version 0.8.0-beta has been released and provides new features not covered in this post. Some highlights include:

  • You can now deploy horizontally scaling clusters by using the new Peer Forwarder plugin. Refer to the GitHub repository for new container-based deployment strategies.
  • You can now scrape Prometheus-friendly metrics via a new /metrics API (see the example following this list).
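
For example, if the Data Prepper server is listening on its default port of 4900 (an assumption here; the port and exact metric paths depend on your version and your data-prepper-config.yaml), you could scrape the Prometheus-formatted metrics with:

curl http://localhost:4900/metrics/prometheus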

May 11, 2021 update

Data Prepper version 1.0.0 has been released and now builds on v1.0.0 of the OpenTelemetry tracing specification. This Data Prepper release also fixes a compatibility issue and is required for Amazon ES domains running service software version R20210426 or later.

See the project’s Releases page for more information about the 1.0.0 release and future releases.


About the authors

Jeff Wright is a Software Development Engineer at Amazon Web Services where he works on the Search Services team. His interests are designing and building robust, scalable distributed applications. Jeff is a contributor to Open Distro for Elasticsearch.

Kowshik Nagarajaan is a Software Development Engineer at Amazon Web Services where he works on the Search Services team. His interests are building and automating distributed analytics applications. Kowshik is a contributor to Open Distro for Elasticsearch.

Anush Krishnamurthy is an Engineering Manager working on the Search Services team at Amazon Web Services.