AWS Cloud Operations & Migrations Blog

Collect, aggregate, and analyze Rancher Kubernetes Cluster logs with Amazon CloudWatch

Rancher is a popular open-source container management tool, used by many organizations, that provides an intuitive user interface for managing and deploying Kubernetes clusters on Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Compute Cloud (Amazon EC2). When Rancher deploys Kubernetes onto nodes in Amazon EC2, it uses Rancher Kubernetes Engine (RKE), Rancher’s lightweight Kubernetes installer.

While Rancher eases the creation of Kubernetes clusters, a production-ready cluster on Amazon EC2 takes more consideration and planning. Observability on a single platform across applications and infrastructure is critical to operational excellence. Amazon CloudWatch lets you centralize the logs from all of your systems, applications, and AWS services in a single, highly scalable service.

This post demonstrates how to send logs from your Rancher Kubernetes environment on Amazon EC2 to Amazon CloudWatch Logs. We’ll also explore Amazon CloudWatch Container Insights and Amazon CloudWatch Logs Insights to analyze container performance and log data from your Kubernetes cluster. This post is for people already running, or planning to run, the Rancher platform to manage Kubernetes clusters on EC2 who want to collect, aggregate, and analyze logs with CloudWatch.

Walkthrough

This blog guides you through configuring and starting log collection in Amazon CloudWatch, as well as exploring, analyzing, and visualizing your container logs. It is broken down into the following sections:

  • Reviewing container log categories
  • Selecting log processor and forwarder
  • Enabling and configuring log collection
  • Searching and analyzing log data in Amazon CloudWatch Logs

Container Log Categories

CloudWatch Logs lets you store, access, monitor, and analyze the log files collected from your Rancher-managed Kubernetes cluster running on EC2. These are the log types you can send to CloudWatch Logs:

  • Application Logs – Source: /var/log/containers. All container-based application logs written to stdout or stderr, excluding non-application logs such as kube-proxy and aws-node logs. Kubernetes add-on logs, such as CoreDNS, are also included.
  • Data Plane Logs – Source: /var/log/journal for kubelet.service, kubeproxy.service, and docker.service. Logs generated by the data plane components, which run on every node and are responsible for maintaining running pods.
  • Host Logs – Source: /var/log/dmesg, /var/log/secure, and /var/log/messages. Global system and boot-time messages, and security-related messages including authentication logs.

Having these logs available in CloudWatch Logs is critical for observability and for monitoring the health of the Rancher cluster. It also helps you troubleshoot and debug cluster issues without needing to log in to cluster nodes, and lets you analyze container logs in more standardized and predefined ways.

Selecting Log Processor and Forwarder

To collect, unify, and send logs from your Rancher clusters to CloudWatch Logs, use Fluent Bit or Fluentd.

Fluent Bit is an open-source, multi-platform log processor and forwarder that allows you to collect logs from different sources, unify them, and send them to CloudWatch Logs efficiently and reliably. Due to its lightweight footprint, efficient use of CPU and memory, and significant performance gains, Fluent Bit is now the default log solution for Container Insights and the recommended log forwarder to CloudWatch Logs. Moreover, the AWS for Fluent Bit Docker image is developed and maintained by AWS, which allows for faster adoption of new features and faster responses to issues.

Two configuration options are supported for Fluent Bit:

  • Optimized for Fluent Bit – follows Fluent Bit best practices, focuses on optimizing Fluent Bit to process and stream logs at large scale in a resource-efficient way, and provides a native Fluent Bit experience. Unless you have specific requirements or dependencies related to Fluentd, we recommend the optimized configuration option for general-purpose implementations.
  • Compatible with Fluentd – preserves the Fluentd experience and focuses on minimizing the changes required to migrate from Fluentd. This option is generally recommended when you are migrating from an existing Fluentd environment and depend on Fluentd’s logging structure or attributes in CloudWatch Logs.

Here we will focus on solutions using the Fluent Bit optimized configuration. To learn more about Fluent Bit and Fluentd performance, refer to the detailed performance comparison.

Figure 1: Log collection flow for Rancher Kubernetes cluster

Prerequisites

This guide assumes that you already have the following:

  1. An AWS account with proper permissions to create resources for deploying Rancher and Kubernetes on Amazon EC2, and to configure and access Amazon CloudWatch.
  2. A Rancher Server deployed on Amazon EC2 with at least a single node Kubernetes cluster attached. For assistance with initial Rancher server deployment on AWS, follow the quick start guide here.
  3. An application deployed on your Kubernetes cluster. See the quick start guide here.

Once the above prerequisites are implemented, you’re ready to enable and configure log collection to CloudWatch.

Enabling and Configuring Log Collection

  1. To get started, open the Rancher console. If your login landing page is set to Cluster Manager (the default), you should see the list of your Rancher-managed Kubernetes clusters. If your login landing page is set to Cluster Explorer, go to Step 2.

Figure 2: Open Cluster Explorer dashboard

Click the Explorer button for the cluster where you want to enable and configure log collection. This will open the Cluster Explorer dashboard for the selected cluster.

  2. First, you will need to create a namespace for CloudWatch, e.g., ‘amazon-cloudwatch’. Clicking the ‘Namespaces’ link in the left navigation menu under the Cluster section opens the Namespaces management screen.

Figure 3: Open Namespaces management screen

Now, press the Create button located in the top-right corner. This will open the Namespace:Create screen. Create a new namespace by entering the required information and pressing the Create button.

Figure 4: Create a new namespace

ALTERNATIVE OPTION: create the namespace from the command line (CLI) by running a kubectl command.

To access the shell window from the Cluster Explorer screen, press the >_ button in the top menu.

Figure 5: Access to the shell window from Cluster Explorer screen

To access the shell window from the Cluster Manager screen, select the drop-down menu at the top-left corner, select your cluster, and then press the >_ Launch kubectl button.

Figure 6: Access to the shell window from Cluster Manager screen

In the shell window, create or download your Namespace manifest file, e.g., fluent-bit-cluster-info-namespace.yaml, and apply it with the kubectl command:

Figure 7: Execute kubectl command to create Namespace

Feel free to download the sample Namespace manifest file from GitHub.
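For reference, a minimal Namespace manifest might look like the following (a sketch; the sample file on GitHub may differ):

apiVersion: v1
kind: Namespace
metadata:
  name: amazon-cloudwatch
  labels:
    name: amazon-cloudwatch

Apply it from the shell window with:

kubectl apply -f fluent-bit-cluster-info-namespace.yaml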

  3. Next, you will create a new ConfigMap, e.g., ‘fluent-bit-cluster-info’, in the previously created ‘amazon-cloudwatch’ namespace to specify where logs are sent and how they are captured. Clicking the ‘ConfigMaps’ link in the left navigation menu under the Storage section opens the ConfigMaps management screen.

Figure 8: Open ConfigMaps management screen

Now, press the Create button located in the top-right corner. This will open the ConfigMap:Create screen. Create a new ConfigMap by providing:

  • cluster.name – the cluster name;
  • logs.region – the cluster Region;
  • http.server – turns the built-in HTTP server for monitoring plugin metrics on or off;
  • http.port – the Fluent Bit HTTP port;
  • read.head – whether to collect all logs already in the file system;
  • read.tail – whether to collect only logs written after Fluent Bit deployment.

Figure 9: Create a new ConfigMap

ALTERNATIVE OPTION: create the ConfigMap from the command line (CLI) by running a kubectl command.

Open the shell window as specified in Step 2, create or download your ConfigMap manifest file, e.g., fluent-bit-cluster-info-configmap.yaml, and apply it with the kubectl command:

Figure 10: Execute kubectl command to create ConfigMap

Feel free to download the sample ConfigMap manifest file from GitHub.
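For reference, a ConfigMap manifest covering the parameters above might look like this (a sketch; the cluster name and Region shown match this demo and should be replaced with your own values):

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-cluster-info
  namespace: amazon-cloudwatch
data:
  cluster.name: rancher-aws-demo
  logs.region: us-east-1
  http.server: "On"
  http.port: "2020"
  read.head: "Off"
  read.tail: "On"

Apply it with:

kubectl apply -f fluent-bit-cluster-info-configmap.yaml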

  4. Now you are ready to deploy the Fluent Bit daemonset and its dependent resources to the cluster, again in the previously created ‘amazon-cloudwatch’ namespace. For demonstration purposes, we will use the sample Fluent Bit daemonset manifest file.

The Fluent Bit data pipeline shows how data flows from log sources to the destination:

Figure 11: Fluent Bit data pipeline

You can learn more about the various configurable parameters defined within the Fluent Bit configuration ConfigMap here.

Open the shell window as specified in Step 2, create or download your Fluent Bit daemonset manifest file, e.g., fluent-bit-cluster-info-daemonset.yaml, and apply it with kubectl.
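For example (a sketch using the sample filename above):

kubectl apply -f fluent-bit-cluster-info-daemonset.yaml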

Figure 12: Create Fluent Bit daemonset and dependent resources

Once applied, the manifest will create the following resources:

  • a service account in the ‘amazon-cloudwatch’ namespace used to run the Fluent Bit daemonset;
  • a cluster role with get, list, and watch permissions on pod logs, together with a cluster role binding that grants those permissions to the service account;
  • the Fluent Bit configuration ConfigMap defining which logs to capture, how to parse and filter them, and the desired CloudWatch Logs output;
  • the Fluent Bit daemonset.
  5. Go back to the shell window and run a kubectl command to validate that each node has one pod named fluent-bit-*, e.g., fluent-bit-xfk2w.
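You can run this check with a single command (pod names and counts will differ in your cluster):

kubectl get pods -n amazon-cloudwatch -o wide

Each node should report one Running fluent-bit pod.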

Figure 13: Validate Fluent Bit pod

Now go to the CloudWatch console and validate that the following log groups have been created by Fluent Bit:
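With the Fluent Bit optimized configuration used in this demo, the log group names follow this pattern, where rancher-aws-demo is the demo cluster name:

/aws/containerinsights/rancher-aws-demo/application
/aws/containerinsights/rancher-aws-demo/dataplane
/aws/containerinsights/rancher-aws-demo/host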

Figure 14: Validate log groups

At this point, you should be able to go into one of these log groups and see recent events in the log streams, indicating that logs are being streamed from the Rancher Kubernetes cluster:

Figure 15: Recent events from CloudWatch log streams

Aggregating and forwarding the logs from multiple input streams at large scale and grouping them logically makes it possible to achieve a unified logging and analysis experience for your Rancher Kubernetes clusters on AWS.

Searching and Analyzing Log Data in Amazon CloudWatch Logs

Now that your Rancher Kubernetes logs are available in CloudWatch, you can search and filter the log data by creating metric filters on the Log events screen. Metric filters define the terms and patterns to look for in log data.
Let’s say you are looking for errors specific to ulimit in your demo application app3. You can use the following term as your metric filter pattern: error ulimit.
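If you prefer the AWS CLI, an equivalent metric filter could be created along these lines (a sketch; the filter name, metric name, and metric namespace are illustrative):

aws logs put-metric-filter \
  --log-group-name /aws/containerinsights/rancher-aws-demo/application \
  --filter-name app3-ulimit-errors \
  --filter-pattern "error ulimit" \
  --metric-transformations metricName=UlimitErrorCount,metricNamespace=RancherDemo,metricValue=1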

Figure 16: Search and filter log event data

To interactively search and analyze your log data, including application logs, you can use CloudWatch Logs Insights. Logs Insights allows you to perform queries to find the data points, patterns, and trends necessary for understanding how your applications and AWS resources are behaving, helps you respond to operational issues, and helps you identify areas for improvement. With Logs Insights, you can query any type of log and save your queries to re-run later.

You can access Logs Insights from the left menu in the CloudWatch console or, if you are in the Log Groups screen, by selecting your log groups (up to 20) and pressing the View in Logs Insights button:

Figure 17: Open Logs Insights

  1. Let’s say you suspect that one of your nodes was recently updated. You can run a Logs Insights query to see if, where, and when the update was executed. Run the following query against the /aws/containerinsights/rancher-aws-demo/host log group:

fields @timestamp, @message, ec2_instance_id
| filter message like 'COMMAND=yum update' or message like '/bin/yum update'
| sort @timestamp desc

Figure 18: Log Insights query results – Example 1

  2. In the same way, you can query the /aws/containerinsights/rancher-aws-demo/dataplane log group to look for Docker issues in a specific Availability Zone:

fields @timestamp, @message, ec2_instance_id
| filter message like 'level=warning' or message like 'level=error'
| filter az="us-east-1f"
| sort @timestamp desc
| limit 10

Figure 19: Log Insights query results – Example 2

These are just a few examples of how you can use the CloudWatch Logs Insights query language to run queries on your log groups and quickly troubleshoot your Rancher Kubernetes cluster. If an issue occurs, you can use Logs Insights to identify potential causes and validate deployed fixes.

Cleanup

To avoid ongoing charges to your AWS account, remove the resources you created once you no longer need them.

  • Delete the CloudWatch log groups created as part of this demo.
  • Use the kubectl delete -f command with the manifests used earlier in this demo to delete the Fluent Bit daemonset resources, the ConfigMap, and the namespace (example commands below).
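For example (a sketch, assuming the manifest filenames and cluster name used earlier in this demo):

kubectl delete -f fluent-bit-cluster-info-daemonset.yaml
kubectl delete -f fluent-bit-cluster-info-configmap.yaml
kubectl delete -f fluent-bit-cluster-info-namespace.yaml

aws logs delete-log-group --log-group-name /aws/containerinsights/rancher-aws-demo/application
aws logs delete-log-group --log-group-name /aws/containerinsights/rancher-aws-demo/dataplane
aws logs delete-log-group --log-group-name /aws/containerinsights/rancher-aws-demo/host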

Conclusion

In this blog post, we showed you how to collect, aggregate, and analyze Rancher Kubernetes logs with CloudWatch Logs and CloudWatch Logs Insights.

Using Fluent Bit with CloudWatch gives you a simple but powerful way to send logs from your containers to CloudWatch Logs and to take advantage of CloudWatch’s integration with other AWS services. Having the fully managed CloudWatch Logs Insights service available when you need it, with no maintenance required, enhances your visibility into application health and provides better debugging capabilities.

About the authors

Darius Januskis

Darius Januskis is a Senior Solutions Architect at AWS helping global financial services customers in their journey to the cloud. He is a passionate technology enthusiast who enjoys working with customers and helping them build well-architected solutions. His core interests include security, DevOps, automation, and serverless technologies.

Keerti Shah

Keerti Shah is a Global Account Solutions Architect with Amazon Web Services. She is passionate about helping customers architect and design scalable, secure applications in the cloud to drive innovation and modernization.