AWS Open Source Blog

Gathering insights on Kubernetes applications, services, and network traffic with Pixie

We often hear from our Amazon Elastic Kubernetes Service (Amazon EKS) users that adopting an open source observability stack is a top priority for their organizations. That’s why we are excited about Pixie, an open source observability platform for Kubernetes powered by the Extended Berkeley Packet Filter (eBPF). New Relic is in the process of contributing Pixie to the Cloud Native Computing Foundation (CNCF). We are particularly enthusiastic about the programmability of the Pixie platform, as well as Pixie’s use of eBPF to provide rich, automatic visibility of application events. Pixie stores collected data directly on the user’s Kubernetes cluster.

Pixie makes observability easily accessible to developers. At Amazon Web Services (AWS), we share that vision, to provide every developer access to high-quality observability data with minimal effort. That’s why we’ve decided to partner with New Relic and contribute to the Pixie project. Jaana Dogan, AWS Principal Engineer, will be joining Pixie’s board. AWS is excited to collaborate with New Relic, a worldwide leader in observability, on this open source project.

Image of Pixie Web and Mobile Console

What is Pixie?

Pixie is an open source project providing a Kubernetes observability platform designed to help developers debug their production systems with minimal friction, driven by three major technical differentiators.

Auto-instrumentation

When a developer deploys Pixie, it begins automatically collecting a variety of rich data within seconds: networking (HTTP, HTTP2, gRPC, TLS, TCP), database client diagnostics (MySQL, PostgreSQL, Cassandra, Redis), application profiles, and more. Developers can extend this collection programmatically by writing scripts. None of it requires manual instrumentation; Pixie’s use of eBPF provides this experience “out of the box.”

eBPF is a kernel technology (starting in Linux 4.x) that enables programs to run in the kernel itself, without having to change kernel source code or add additional kernel modules. Think of it as a lightweight, fully-sandboxed virtual machine (VM) inside the Linux kernel. eBPF programs are event based, and are executed on a specific hook, such as network events, system calls, function entries, and kernel tracepoints. Check out the AWS re:Invent 2019 talk with Brendan Gregg to dive deeper.

Programmatic data access

Every view in Pixie is powered by a PxL script. PxL is Pixie’s Python-based language for querying data, inspired by the popular data tool Pandas. Because all data access in Pixie is programmatic, users can build fully customized views of their systems. PxL scripts work across Pixie’s UI, CLI, and API, so the same script can also be run programmatically through the Pixie API. This makes tasks such as exporting Pixie data to another tool or writing a Slackbot alert straightforward.
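
As a rough sketch of what querying Pixie programmatically could look like, the Python snippet below embeds a small PxL script and hands it to Pixie’s Python client. Treat it as an illustration under assumptions rather than a reference: it assumes the `pxapi` package is installed, that the `mysql_events` table exposes `time_`, `req_body`, and `latency` columns, and that you supply your own API token and cluster ID. The helper name `fetch_recent_mysql_queries` and the output table name `mysql_table` are made up for this example; check the Pixie API documentation for the current client signatures.

```python
# A small PxL script: read recent MySQL events and display a few columns.
# Table and column names are assumptions; adjust them to your cluster's schema.
PXL_SCRIPT = """
import px
df = px.DataFrame(table='mysql_events', start_time='-5m')
df = df[['time_', 'req_body', 'latency']]
df = df.head(10)
px.display(df, 'mysql_table')
"""


def fetch_recent_mysql_queries(api_token, cluster_id):
    """Run PXL_SCRIPT on a cluster and return (query, latency) tuples.

    Hypothetical helper: needs the `pxapi` package and valid credentials.
    """
    # Imported lazily so the script text can be reused without the client.
    import pxapi

    client = pxapi.Client(token=api_token)
    conn = client.connect_to_cluster(cluster_id)
    script = conn.prepare_script(PXL_SCRIPT)
    # Each row is read by column name from the table named in px.display().
    return [
        (row["req_body"], row["latency"])
        for row in script.results("mysql_table")
    ]
```

The tuples returned by a helper like this could then be forwarded to another tool or used to trigger an alert.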

Kubernetes-native edge compute

Pixie runs entirely inside Kubernetes as a distributed machine data system, meaning you don’t need to transfer any data outside the cluster. Pixie’s architecture gives you a secure, cost-effective, and scalable way to access unlimited data, deploy AI/ML models at source, and set up streaming telemetry pipelines.

The rest of this blog post shows how to get started with Pixie and, as an example, how to view slow SQL queries. Check out the Pixie EKS Workshop to dive deeper.

Pixie in action: Finding slow SQL queries

Install Pixie’s CLI tool using the install script:

$ bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"
  • Press Enter to accept the Terms & Conditions.
  • Press Enter to accept the default install path.
  • Visit the provided URL to sign up for a new Pixie account or sign in to an existing one.
  • Copy and paste the auth token generated in the browser into the CLI.

Deploy Pixie to your EKS Cluster using the px CLI:

$ px deploy --cluster_name <CLUSTER_NAME> --pem_memory_limit=1Gi

Now we navigate to the Pixie Console UI and select our EKS cluster in the drop-down menu.

Shows how to select your Kubernetes cluster in the Pixie console

Then in the script drop-down, we select the px/mysql_data script.

Shows how to change the PxL script in the Pixie console

This script shows us all the MySQL queries originating from our cluster to Amazon Relational Database Service (Amazon RDS), Amazon Aurora, or self-managed MySQL, without adding any MySQL-specific instrumentation in our pod or service.

Highlights Pixie providing insight to SQL Queries originating from a Kubernetes Deployment

Switching the script to px/mysql_stats, we can view key latency stats on our SQL queries.

Highlights Pixie getting latency statistics on SQL Queries originating from a Kubernetes deployment
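
To make the idea of per-query latency statistics concrete, here is a minimal Python sketch that groups query events and computes nearest-rank percentiles. It is not how the px/mysql_stats script is implemented; it assumes the events have already been exported as `(query, latency_ms)` pairs, and the sample data and function names are invented for illustration.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty, ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_stats(events):
    """Group (query, latency_ms) pairs by query and summarize latency."""
    by_query = defaultdict(list)
    for query, latency_ms in events:
        by_query[query].append(latency_ms)

    stats = {}
    for query, latencies in by_query.items():
        latencies.sort()
        stats[query] = {
            "count": len(latencies),
            "p50": percentile(latencies, 50),
            "p99": percentile(latencies, 99),
        }
    return stats


# Made-up sample events, standing in for exported query data.
SAMPLE_EVENTS = [
    ("SELECT * FROM orders", 12.0),
    ("SELECT * FROM orders", 15.0),
    ("SELECT * FROM orders", 240.0),
    ("SELECT * FROM users", 3.0),
    ("SELECT * FROM users", 4.0),
]

if __name__ == "__main__":
    for query, summary in latency_stats(SAMPLE_EVENTS).items():
        print(query, summary)
```

A large gap between p50 and p99 for a query, as with the `orders` query in the sample data, is the kind of signal that points to an occasional slow execution worth investigating.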

This is just one of the many use cases where Pixie assists SREs, DevOps engineers, and developers. Pixie provides insights into Kubernetes networking (HTTP, HTTP2, gRPC, TLS, TCP), database client diagnostics (MySQL, PostgreSQL, Cassandra, Redis), HTTP events, database events, network statistics, application profiles, and much more.

Dive deeper

Colin Bookman

Colin Bookman is an ISV Sr. Solutions Architect at AWS based in Silicon Valley. He works with AWS ISVs and customers to help them build secure, high-performing, resilient, and cost-efficient infrastructure. He brings years of experience, including childhood software and hardware projects, a Bachelor of Science in Computer Engineering from Georgia Tech, and past experience architecting and building services that handle millions of QPS. When not working, Colin enjoys taking his two dogs on long walks on the beach.

Mark Carter

Mark Carter is an entrepreneur, software executive, and industry and open source thought leader. In his 25-year career, Mark has held senior engineering and product leadership positions at Google, Microsoft, Amazon, and PayPal, among others. Mark co-founded three successful startups and has led transformative projects in Big Data, distributed systems, IoT, machine learning, Cloud, SRE & Observability. His work in security and compliance across Microsoft, Gemalto, and most recently as the CISO of Tesla had a profound impact on industry direction. As an open source contributor and team lead, he launched and drove projects including the Istio service mesh, the OpenMetrics Prometheus collection standard, OpenStack, and gRPC. Mark is currently the General Manager managing several services for AWS. He can be reached on Twitter at @markcartertm.