
KAAR: AI-Powered Kubernetes Cluster Analysis and Remediation
KAAR (Kubernetes AI-powered Analysis and Remediation) is a tool that automates the detection and resolution of Kubernetes Pod issues using k8sgpt and AWS Bedrock.
- ImagePullBackOff: Pods fail to start due to invalid or inaccessible container images.
- CrashLoopBackOff: Containers crash repeatedly from misconfigured commands or errors like “executable file not found.”
- OOMKilled: Pods are terminated for exceeding memory limits.
- Pending: Pods can’t schedule due to resource constraints or node affinity issues.
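
To make these four states concrete, here is a minimal sketch of how they can be recognized in a Pod's status as printed by `kubectl get pod -o json`. This is not how KAAR detects issues (it relies on k8sgpt and Bedrock); the function name `detect_pod_issue` is hypothetical, and the sketch assumes the standard `status.phase` and `status.containerStatuses` fields.

```python
from typing import Optional

# Waiting reasons that indicate an image pull problem.
IMAGE_PULL_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def detect_pod_issue(pod: dict) -> Optional[str]:
    """Return one of the four issue names above, or None if none is detected.

    `detect_pod_issue` is a hypothetical helper, not part of KAAR or k8sgpt.
    """
    status = pod.get("status", {})
    for cs in status.get("containerStatuses", []):
        # An OOM kill appears in the current or the previous terminated state,
        # so check it before the waiting reason (which may be CrashLoopBackOff).
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated", {})
            if terminated.get("reason") == "OOMKilled":
                return "OOMKilled"
        reason = cs.get("state", {}).get("waiting", {}).get("reason")
        if reason in IMAGE_PULL_REASONS:
            return "ImagePullBackOff"
        if reason == "CrashLoopBackOff":
            return "CrashLoopBackOff"
    # No container-level signal: a Pod still in Pending has not been scheduled
    # or has not started its containers yet.
    if status.get("phase") == "Pending":
        return "Pending"
    return None
```

Container-level reasons are checked before the Pod phase because a Pod with an image pull error also reports `Pending` as its phase.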

Debugging these issues by hand with `kubectl describe` and `kubectl logs` is time-consuming, especially in large clusters.

- k8sgpt: A diagnostic tool that scans Kubernetes clusters for Pod issues.
- AWS Bedrock: Uses the Claude v2 model to classify issues and recommend fixes.
- AWS SNS and CloudWatch: Sends notifications and logs results for team visibility.
- kubectl: Applies automated fixes to get Pods back on track.
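
As an illustration of the Bedrock piece, the sketch below builds a classification request for Claude v2 and sends it with boto3. The prompt wording, the helper names `build_request_body` and `classify`, the token limit, and the default region are all assumptions made for this example, not KAAR's actual code.

```python
import json

MODEL_ID = "anthropic.claude-v2"  # Bedrock model ID for Claude v2
ISSUE_TYPES = ["ImagePullBackOff", "CrashLoopBackOff", "OOMKilled", "Pending"]


def build_request_body(finding: str) -> str:
    """Build the JSON body for a Claude v2 text-completion request.

    The prompt text is an assumption; only the Human/Assistant framing and
    the `max_tokens_to_sample` field are required by the model.
    """
    prompt = (
        "\n\nHuman: Classify this Kubernetes Pod finding as one of "
        f"{', '.join(ISSUE_TYPES)} and suggest a fix.\n\n"
        f"Finding: {finding}\n\nAssistant:"
    )
    return json.dumps(
        {"prompt": prompt, "max_tokens_to_sample": 300, "temperature": 0}
    )


def classify(finding: str, region: str = "us-east-1") -> str:
    """Send the finding to Bedrock and return the model's raw completion."""
    import boto3  # imported lazily so the module loads without AWS libraries

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_request_body(finding),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["completion"]
```

Keeping the request builder separate from the network call makes the prompt easy to test without AWS credentials.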
- Cluster Scanning: KAAR uses k8sgpt to analyze your cluster and identify Pod issues.
- Issue Classification: Findings are processed by AWS Bedrock to determine issue type.
- Remediation: Applies fixes via `kubectl`, like updating images or adjusting limits.
- Verification: Ensures Pods are Running and healthy.
- Notification: Logs results to CloudWatch and sends SNS notifications.
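
The five steps above can be sketched as a small pipeline in which each step is a plain callable. The `Finding` type, the `run_pipeline` function, and the "Unhealthy" label are hypothetical names chosen for this example; only the message format is taken from the alerts KAAR sends.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Finding:
    """One Pod issue reported by the scan step (hypothetical type)."""
    namespace: str
    pod: str
    detail: str


def run_pipeline(
    scan: Callable[[], Iterable[Finding]],
    classify: Callable[[Finding], str],
    remediate: Callable[[Finding, str], None],
    verify: Callable[[Finding], bool],
    notify: Callable[[str], None],
) -> List[str]:
    """Run scan -> classify -> remediate -> verify -> notify for each finding."""
    messages = []
    for finding in scan():
        issue = classify(finding)
        remediate(finding, issue)
        # "Unhealthy" is an assumed label for a failed verification.
        state = "Healthy" if verify(finding) else "Unhealthy"
        message = f"Pod {finding.pod} in {finding.namespace} is {state}."
        notify(message)
        messages.append(message)
    return messages


# Usage with stub steps in place of k8sgpt, Bedrock, kubectl, and SNS.
sent = []
run_pipeline(
    scan=lambda: [Finding("default", "nginx-pod", "invalid command")],
    classify=lambda finding: "CrashLoopBackOff",
    remediate=lambda finding, issue: None,
    verify=lambda finding: True,
    notify=sent.append,
)
```

In a real run, `scan` could wrap `k8sgpt analyze --filter=Pod --output=json` and `notify` could wrap the boto3 SNS `publish` call; passing the steps in as arguments keeps the control flow testable without a cluster.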

`nginx-pod` is stuck due to an invalid command. KAAR:

- Detects the issue with k8sgpt.
- Uses Bedrock to classify it as CrashLoopBackOff.
- Updates the Pod’s command.
- Verifies it is Running.
- Sends an SNS alert:

  > “Pod nginx-pod in default is Healthy.”
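
One way the "update the command" step could work is sketched below. Because a running Pod's `command` cannot be edited in place, the sketch prepares a corrected copy of the manifest that can be re-created, for example by piping it as JSON to `kubectl replace --force -f -`. The helper name `with_new_command` and the replacement command in the usage are assumptions, not KAAR's actual fix.

```python
import copy

# Fields set by the API server that should not be sent back on re-creation.
SERVER_METADATA = ("resourceVersion", "uid", "creationTimestamp", "managedFields")


def with_new_command(pod: dict, container: str, command: list) -> dict:
    """Return a copy of the Pod manifest with one container's command replaced.

    `with_new_command` is a hypothetical helper; the input is the dict form of
    `kubectl get pod <name> -o json`.
    """
    fixed = copy.deepcopy(pod)
    fixed.pop("status", None)  # status is reported by the cluster, not set by us
    for field in SERVER_METADATA:
        fixed.get("metadata", {}).pop(field, None)
    for c in fixed["spec"]["containers"]:
        if c["name"] == container:
            c["command"] = list(command)
            return fixed
    raise ValueError(f"container {container!r} not found")


# Usage: the broken and corrected commands here are illustrative only.
broken = {
    "metadata": {"name": "nginx-pod", "namespace": "default", "uid": "abc"},
    "spec": {"containers": [{"name": "nginx", "image": "nginx", "command": ["nginxx"]}]},
    "status": {"phase": "Running"},
}
fixed = with_new_command(broken, "nginx", ["nginx", "-g", "daemon off;"])
```

The original dict is left untouched, so the previous manifest stays available if the re-created Pod fails verification.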

`memory-hog` is terminated for exceeding its memory limit. KAAR:

- Identifies the OOMKilled issue.
- Increases the memory limit.
- Confirms stability.
- Sends an alert:

  > “Pod memory-hog in default is Healthy.”
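
The "increase the memory limit" step needs a rule for choosing the new value. The sketch below doubles the current limit while keeping its unit. The doubling factor and the helper name `bump_memory` are assumptions for illustration, and only whole-number quantities with the `Ki`, `Mi`, or `Gi` suffix are handled.

```python
import re


def bump_memory(limit: str, factor: int = 2) -> str:
    """Scale a memory quantity such as "128Mi" by an integer factor.

    `bump_memory` is a hypothetical helper; the default factor of 2 is an
    assumption, not a value taken from KAAR.
    """
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", limit)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {limit!r}")
    value, unit = int(match.group(1)), match.group(2)
    return f"{value * factor}{unit}"


# Usage: compute a new limit for a container that was OOMKilled at 128Mi.
new_limit = bump_memory("128Mi")
```

Rejecting unrecognized quantities is a deliberate choice here: guessing at a unit could silently lower a limit instead of raising it.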
- Time Savings: Automates remediation.
- Accuracy: AI-driven classification and suggestions.
- AWS Integration: Works with Bedrock, SNS, and CloudWatch.
- Future-Ready: Support for Services and Deployments is planned.
- Service Remediation: Support for issues like SelectorMismatch and LoadBalancerPending.
- Deployment Support: Fixes for Deployment misconfigurations.
- EventBridge Integration: Scheduled runs for proactive monitoring, inspired by my KAAR project.
- Local LLMs: Support for Ollama as an alternative to Bedrock.