garutilorenzo/k8s-aws-terraform-cluster

Deploy Kubernetes on Amazon AWS

Deploy a highly available Kubernetes cluster on Amazon AWS in a few minutes, using a mix of on-demand and spot instances.

Please note that this is only an example of how to deploy a Kubernetes cluster. For a production environment you should use EKS or ECS.

The goal of this repo is to show all the AWS components needed to deploy a highly available K8s cluster.

Requirements

  • Terraform - Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files.
  • Amazon AWS Account - Amazon AWS account with billing enabled
  • kubectl - The Kubernetes command-line tool (optional)
  • aws cli - The AWS command-line tool (optional)

You also need:

  • one VPC with private and public subnets
  • one SSH key already uploaded to your AWS account

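For the VPC you can refer to the private-vpc module in github.com/garutilorenzo/aws-terraform-examples, the same module used in the main.tf example in this README.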

Before you start

Note that this tutorial uses AWS resources that are outside the AWS free tier, so be careful!

Project setup

Clone this repo and change into the example/ directory:

git clone https://github.com/garutilorenzo/k8s-aws-terraform-cluster
cd k8s-aws-terraform-cluster/example/

Now edit the main.tf file and create the terraform.tfvars file. For more details, see AWS provider setup and Pre flight checklist.

Alternatively, create a new empty directory in your workspace and add these three files:

  • terraform.tfvars
  • main.tf
  • provider.tf

The main.tf file will look like:

variable "AWS_ACCESS_KEY" {

}

variable "AWS_SECRET_KEY" {

}

variable "environment" {
  default = "staging"
}

variable "AWS_REGION" {
  default = "<CHANGE_ME>"
}

variable "my_public_ip_cidr" {
  default = "<CHANGE_ME>"
}

variable "vpc_cidr_block" {
  default = "<CHANGE_ME>"
}

variable "certmanager_email_address" {
  default = "<CHANGE_ME>"
}

variable "ssk_key_pair_name" {
  default = "<CHANGE_ME>"
}

module "private-vpc" {
  region            = var.AWS_REGION
  my_public_ip_cidr = var.my_public_ip_cidr
  vpc_cidr_block    = var.vpc_cidr_block
  environment       = var.environment
  source            = "github.com/garutilorenzo/aws-terraform-examples/private-vpc"
}

output "private_subnets_ids" {
  value = module.private-vpc.private_subnet_ids
}

output "public_subnets_ids" {
  value = module.private-vpc.public_subnet_ids
}

output "vpc_id" {
  value = module.private-vpc.vpc_id
}

module "k8s-cluster" {
  ssk_key_pair_name         = var.ssk_key_pair_name
  environment               = var.environment
  vpc_id                    = module.private-vpc.vpc_id
  vpc_private_subnets       = module.private-vpc.private_subnet_ids
  vpc_public_subnets        = module.private-vpc.public_subnet_ids
  vpc_subnet_cidr           = var.vpc_cidr_block
  my_public_ip_cidr         = var.my_public_ip_cidr
  create_extlb              = true
  install_nginx_ingress     = true
  efs_persistent_storage    = true
  expose_kubeapi            = true
  install_certmanager       = true
  certmanager_email_address = var.certmanager_email_address
  source                    = "github.com/garutilorenzo/k8s-aws-terraform-cluster"
}

output "k8s_dns_name" {
  value = module.k8s-cluster.k8s_dns_name
}

output "k8s_server_private_ips" {
  value = module.k8s-cluster.k8s_server_private_ips
}

output "k8s_workers_private_ips" {
  value = module.k8s-cluster.k8s_workers_private_ips
}

For all the available variables, see the Pre flight checklist.

The provider.tf will look like:

provider "aws" {
  region     = var.AWS_REGION
  access_key = var.AWS_ACCESS_KEY
  secret_key = var.AWS_SECRET_KEY
}
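
The terraform init output shown later in this README reports "Finding latest version" for each provider, which means no version constraint is declared. A minimal sketch of pinning the AWS provider follows. The file name and the constraint value are assumptions, based only on the v4.9.0 that appears in that sample output, so adjust them to whatever version works with the module for you:

```hcl
# versions.tf (hypothetical file name; any .tf file in the same directory works)
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: stay on the 4.x series, at or above the v4.9.0
      # seen in the sample "terraform init" output.
      version = "~> 4.9"
    }
  }
}
```

The .terraform.lock.hcl file created by terraform init then records the exact version selected within that range.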

The terraform.tfvars will look like:

AWS_ACCESS_KEY = "xxxxxxxxxxxxxxxxx"
AWS_SECRET_KEY = "xxxxxxxxxxxxxxxxx"
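
Since the other settings in main.tf are ordinary input variables, their values can also be supplied in terraform.tfvars instead of editing the <CHANGE_ME> defaults. A minimal sketch is shown here; every value is a placeholder picked for illustration, not a recommendation:

```hcl
# terraform.tfvars (all values below are hypothetical placeholders)
AWS_ACCESS_KEY = "xxxxxxxxxxxxxxxxx"
AWS_SECRET_KEY = "xxxxxxxxxxxxxxxxx"

# Values set here take precedence over the "default" in each variable block.
AWS_REGION                = "eu-west-1"         # assumed region
environment               = "staging"
my_public_ip_cidr         = "203.0.113.10/32"   # documentation-range address
vpc_cidr_block            = "10.0.0.0/16"       # assumed VPC range
certmanager_email_address = "admin@example.com"
ssk_key_pair_name         = "my-key-pair"       # must match an existing EC2 key pair
```

Keeping all environment-specific values in one file leaves main.tf identical across environments.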

Now we can initialize Terraform with:

terraform init

Initializing modules...
- k8s-cluster in ..

Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/template...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
- Installing hashicorp/aws v4.9.0...
- Installed hashicorp/aws v4.9.0 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

AWS provider setup

Follow the prerequisites step on this link. Then, in your workspace folder or in the example/ directory of this repo, create a file named terraform.tfvars:

AWS_ACCESS_KEY = "xxxxxxxxxxxxxxxxx"
AWS_SECRET_KEY = "xxxxxxxxxxxxxxxxx"
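
Because these two values are credentials, it may be worth declaring the matching variables as sensitive so that Terraform redacts them in plan and apply output. A minimal sketch, assuming Terraform 0.14 or newer and meant to replace the two empty variable blocks at the top of main.tf (the description strings are illustrative):

```hcl
# Replaces the empty AWS_ACCESS_KEY / AWS_SECRET_KEY declarations in main.tf.
variable "AWS_ACCESS_KEY" {
  type        = string
  description = "AWS access key ID, supplied via terraform.tfvars"
  sensitive   = true # redacted in plan/apply output
}

variable "AWS_SECRET_KEY" {
  type        = string
  description = "AWS secret access key, supplied via terraform.tfvars"
  sensitive   = true
}
```

The values are still stored in plain text in the state file, so the state and terraform.tfvars should both stay out of version control.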

Pre flight checklist

Once you have created the terraform.tfvars file, edit the main.tf file (also in the example/ directory) and set the following variables:

| Var | Required | Desc |
| --- | --- | --- |
| region | yes | Set the correct AWS region based on your needs |
| environment | yes | Current work environment (Example: staging/dev/prod). This value is used to tag all the deployed resources |
| ssk_key_pair_name | yes | Name of the SSH key to use |
| my_public_ip_cidr | yes | Your public IP in CIDR format (Example: 195.102.xxx.xxx/32) |
| vpc_id | yes | ID of the VPC to use. You can find your vpc_id in your AWS console (Example: vpc-xxxxx) |
| vpc_private_subnets | yes | List of private subnets to use. These subnets are used for the EC2 instances and the private LB. You can find the list of your VPC subnets in your AWS console (Example: subnet-xxxxxx) |
| vpc_public_subnets | yes | List of public subnets to use. These subnets are used for the public LB. You can find the list of your VPC subnets in your AWS console (Example: subnet-xxxxxx) |
| vpc_subnet_cidr | yes | Your VPC subnet CIDR. You can find it in your AWS console (Example: 172.31.0.0/16) |
| common_prefix | no | Prefix used in all resource names/tags. Default: k8s |
| ec2_associate_public_ip_address | no | Whether to assign a public IP to the EC2 instances. Default: false |
| instance_profile_name | no | Instance profile name. Default: K8sInstanceProfile |
| ami | no | AMI ID. Default: ami-0a2616929f1e63d91 (Ubuntu 20.04) |
| default_instance_type | no | Default instance type used by the launch template. Default: t3.large |
| instance_types | no | Map of instance types used by the ASG. Default: { asg_instance_type_1 = "t3.large", asg_instance_type_3 = "m4.large", asg_instance_type_4 = "t3a.large" } |
| k8s_version | no | Kubernetes version to install |
| k8s_pod_subnet | no | Kubernetes pod subnet managed by the CNI (Flannel). Default: 10.244.0.0/16 |
| k8s_service_subnet | no | Kubernetes service subnet. Default: 10.96.0.0/12 |
| k8s_dns_domain | no | Internal Kubernetes DNS domain. Default: cluster.local |
| kube_api_port | no | Kubernetes API port. Default: 6443 |
| k8s_server_desired_capacity | no | Desired number of k8s servers. Default: 3 |
| k8s_server_min_capacity | no | Min number of k8s servers. Default: 3 |
| k8s_server_max_capacity | no | Max number of k8s servers. Default: 4 |
| k8s_worker_desired_capacity | no | Desired number of k8s workers. Default: 3 |
| k8s_worker_min_capacity | no | Min number of k8s workers. Default: 3 |
| k8s_worker_max_capacity | no | Max number of k8s workers. Default: 4 |
| cluster_name | no | Kubernetes cluster name. Default: k8s-cluster |
| install_nginx_ingress | no | Whether to install the Nginx ingress controller. Default: false |
| nginx_ingress_release | no | Nginx ingress release to install. Default: v1.8.1 |
| install_certmanager | no | Boolean value. Install cert-manager ("Cloud native certificate management"). Default: true |
| certmanager_email_address | no | Email address used for signing HTTPS certificates. Default: changeme@example.com |
| certmanager_release | no | cert-manager release. Default: v1.12.2 |
| efs_persistent_storage | no | Deploy EFS for persistent storage |
| efs_csi_driver_release | no | EFS CSI driver release. Default: v1.5.8 |
| extlb_listener_http_port | no | HTTP NodePort where the Nginx ingress controller will listen. Default: 30080 |
| extlb_listener_https_port | no | HTTPS NodePort where the Nginx ingress controller will listen. Default: 30443 |
| extlb_http_port | no | External LB HTTP listen port. Default: 80 |
| extlb_https_port | no | External LB HTTPS listen port. Default: 443 |
| expose_kubeapi | no | Boolean value. Whether to expose the kube API server to the internet. Access is granted only from my_public_ip_cidr for security reasons. Default: false |
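
As an illustration of how the optional variables combine with the required ones, the module call from main.tf could be extended roughly as follows. This is a sketch, not a tested configuration: the override values are made up, and it assumes the capacity variables are numbers and the install/expose flags are booleans, as the table suggests.

```hcl
module "k8s-cluster" {
  source = "github.com/garutilorenzo/k8s-aws-terraform-cluster"

  # Required inputs, wired the same way as in the main.tf example.
  ssk_key_pair_name   = var.ssk_key_pair_name
  environment         = var.environment
  vpc_id              = module.private-vpc.vpc_id
  vpc_private_subnets = module.private-vpc.private_subnet_ids
  vpc_public_subnets  = module.private-vpc.public_subnet_ids
  vpc_subnet_cidr     = var.vpc_cidr_block
  my_public_ip_cidr   = var.my_public_ip_cidr

  # Optional overrides (illustrative values only).
  common_prefix               = "demo"
  cluster_name                = "demo-cluster"
  default_instance_type       = "t3.large"
  k8s_worker_desired_capacity = 3
  k8s_worker_min_capacity     = 3 # keep min <= desired <= max
  k8s_worker_max_capacity     = 6
  install_nginx_ingress       = true
  expose_kubeapi              = false
}
```

Any variable left out of the block keeps the default listed in the table.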

Infrastructure overview

The final infrastructure consists of:

  • two Auto Scaling groups (ASGs), one for the Kubernetes master nodes and one for the worker nodes
  • two launch templates, used by the ASGs
  • one internal load balancer (L4) that will route traffic to Kubernetes servers
  • one external load balancer (L4) that will route traffic to Kubernetes workers
  • one security group that will allow traffic from the VPC subnet CIDR on all the k8s ports (kube api, nginx ingress node port etc)
  • one security group that will allow traffic from the whole internet into the public load balancer (L4) on ports 80 and 443
  • four secrets that will store k8s join tokens

Optional resources:

  • EFS storage to persist data
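
To give an idea of how an Auto Scaling group can mix on-demand and spot instances, here is a minimal sketch using the AWS provider's mixed_instances_policy block. It is not the module's actual code: the resource names, variables, sizes and the on-demand/spot split are all assumptions chosen for illustration, and only the three instance types come from the instance_types default in the Pre flight checklist.

```hcl
# Hypothetical inputs for this standalone sketch.
variable "private_subnet_ids" {
  type = list(string)
}

variable "ami_id" {
  type = string
}

resource "aws_launch_template" "worker" {
  name_prefix   = "example-k8s-worker-"
  image_id      = var.ami_id
  instance_type = "t3.large"
}

resource "aws_autoscaling_group" "workers" {
  name_prefix         = "example-k8s-workers-"
  vpc_zone_identifier = var.private_subnet_ids
  min_size            = 3
  max_size            = 4
  desired_capacity    = 3

  mixed_instances_policy {
    instances_distribution {
      # Assumed split: 20% on-demand, the rest spot.
      on_demand_base_capacity                  = 0
      on_demand_percentage_above_base_capacity = 20
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.worker.id
        version            = "$Latest"
      }

      # Instance types the ASG may choose from.
      override {
        instance_type = "t3.large"
      }
      override {
        instance_type = "m4.large"
      }
      override {
        instance_type = "t3a.large"
      }
    }
  }
}
```

Offering several instance types makes it more likely that spot capacity is available in at least one pool.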

Kubernetes setup

K8s is installed with kubeadm. In this installation containerd is used as the CRI and Flannel as the CNI.

You can optionally install the Nginx ingress controller.

To install Nginx ingress, set the variable install_nginx_ingress to true (default: false).

Nginx ingress controller

The Nginx ingress controller is optional. To enable its deployment, set the install_nginx_ingress variable to true.

The bare-metal installation is used, so the ingress controller is exposed via a NodePort Service:

---
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  ports:
  - appProtocol: http
    name: http
    port: 80
    protocol: TCP
    targetPort: http
    nodePort: ${extlb_listener_http_port}
  - appProtocol: https
    name: https
    port: 443
    protocol: TCP
    targetPort: https
    nodePort: ${extlb_listener_https_port}
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: NodePort
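
On the AWS side, the external L4 load balancer has to forward its public ports to these NodePorts. The sketch below shows one way this mapping could look for HTTP, using the defaults from the Pre flight checklist (80 for extlb_http_port, 30080 for extlb_listener_http_port). The resource and variable names are hypothetical and this is not the module's own code:

```hcl
# Hypothetical inputs for this standalone sketch.
variable "vpc_id" {
  type = string
}

variable "external_lb_arn" {
  type = string
}

# Target group pointing at the ingress controller's HTTP NodePort.
resource "aws_lb_target_group" "ingress_http" {
  name     = "example-ingress-http"
  port     = 30080 # extlb_listener_http_port default
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

# Public listener on port 80 forwarding to the NodePort target group.
resource "aws_lb_listener" "http" {
  load_balancer_arn = var.external_lb_arn
  port              = 80 # extlb_http_port default
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ingress_http.arn
  }
}
```

The HTTPS path would follow the same pattern with ports 443 and 30443.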

To get the real IP address of clients behind a public L4 load balancer, we need to enable the proxy protocol feature of the Nginx ingress controller:

---
apiVersion: v1
data:
  allow-snippet-annotations: "true"
  enable-real-ip: "true"
  proxy-real-ip-cidr: "0.0.0.0/0"
  proxy-body-size: "20m"
  use-proxy-protocol: "true"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: ${nginx_ingress_release}
  name: ingress-nginx-controller
  namespace: ingress-nginx
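
With use-proxy-protocol set to "true", nginx expects every connection to start with a PROXY protocol header, so the load balancer has to send one. On an AWS Network Load Balancer this is a target group setting. A minimal sketch, with hypothetical resource and variable names and not taken from the module's code:

```hcl
# Hypothetical input for this standalone sketch.
variable "vpc_id" {
  type = string
}

resource "aws_lb_target_group" "ingress_https" {
  name     = "example-ingress-https"
  port     = 30443 # extlb_listener_https_port default
  protocol = "TCP"
  vpc_id   = var.vpc_id

  # Prepend a PROXY protocol v2 header carrying the client's
  # source address to each connection sent to the targets.
  proxy_protocol_v2 = true

  health_check {
    protocol = "TCP"
  }
}
```

Both sides must agree: if the header is enabled on only one of them, requests through the load balancer fail.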

Cert-manager

cert-manager is used to issue certificates from a variety of supported sources.

Deploy

We are now ready to deploy our infrastructure. First we ask Terraform to plan the execution with:

terraform plan

...
...
Plan: 73 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + k8s_dns_name            = [
      + (known after apply),
    ]
  ~ k8s_server_private_ips  = [
      - [],
      + (known after apply),
    ]
  ~ k8s_workers_private_ips = [
      - [],
      + (known after apply),
    ]
  + private_subnets_ids     = [
      + (known after apply),
      + (known after apply),
      + (known after apply),
    ]
  + public_subnets_ids      = [
      + (known after apply),
      + (known after apply),
      + (known after apply),
    ]
  + vpc_id                  = (known after apply)

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.

Now we can deploy our resources with:

terraform apply

...

Plan: 73 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + k8s_dns_name            = [
      + (known after apply),
    ]
  ~ k8s_server_private_ips  = [
      - [],
      + (known after apply),
    ]
  ~ k8s_workers_private_ips = [
      - [],
      + (known after apply),
    ]
  + private_subnets_ids     = [
      + (known after apply),
      + (known after apply),
      + (known after apply),
    ]
  + public_subnets_ids      = [
      + (known after apply),
      + (known after apply),
      + (known after apply),
    ]
  + vpc_id                  = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

...
...

Apply complete! Resources: 73 added, 0 changed, 0 destroyed.

Outputs:

k8s_dns_name = "k8s-ext-<REDACTED>.elb.amazonaws.com"
k8s_server_private_ips = [
  tolist([
    "172.x.x.x",
    "172.x.x.x",
    "172.x.x.x",
  ]),
]
k8s_workers_private_ips = [
  tolist([
    "172.x.x.x",
    "172.x.x.x",
    "172.x.x.x",
  ]),
]
private_subnets_ids = [
  "subnet-xxxxxxxxxxxxxxxxx",
  "subnet-xxxxxxxxxxxxxxxxx",
  "subnet-xxxxxxxxxxxxxxxxx",
]
public_subnets_ids = [
  "subnet-xxxxxxxxxxxxxxxxx",
  "subnet-xxxxxxxxxxxxxxxxx",
  "subnet-xxxxxxxxxxxxxxxxx",
]
vpc_id = "vpc-xxxxxxxxxxxxxxxxx"

Now, on one of the master nodes (connect via AWS SSM), you can check the status of the cluster with:

ubuntu@i-04d089ed896cfafe1:~$ sudo su -

root@i-04d089ed896cfafe1:~# kubectl get nodes
NAME                  STATUS   ROLES                  AGE     VERSION
i-0033b408f7a1d55f3   Ready    control-plane,master   3m33s   v1.23.5
i-0121c2149821379cc   Ready    <none>                 4m16s   v1.23.5
i-04d089ed896cfafe1   Ready    control-plane,master   4m53s   v1.23.5
i-072bf7de2e94e6f2d   Ready    <none>                 4m15s   v1.23.5
i-09b23242f40eabcca   Ready    control-plane,master   3m56s   v1.23.5
i-0cb1e2e7784768b22   Ready    <none>                 3m57s   v1.23.5

root@i-04d089ed896cfafe1:~# kubectl get ns
NAME              STATUS   AGE
cert-manager      Active   85s
default           Active   4m55s
ingress-nginx     Active   87s # <- ingress controller ns
kube-flannel      Active   4m32s
kube-node-lease   Active   4m55s
kube-public       Active   4m56s
kube-system       Active   4m56s

root@i-04d089ed896cfafe1:~# kubectl get pods --all-namespaces
NAMESPACE       NAME                                          READY   STATUS      RESTARTS        AGE
cert-manager    cert-manager-66d9545484-h4d9h                 1/1     Running     0               47s
cert-manager    cert-manager-cainjector-7d8b6bd6fb-zl7sg      1/1     Running     0               47s
cert-manager    cert-manager-webhook-669b96dcfd-b5pgk         1/1     Running     0               47s
ingress-nginx   ingress-nginx-admission-create-g62rk          0/1     Completed   0               50s
ingress-nginx   ingress-nginx-admission-patch-n9tc5           0/1     Completed   0               50s
ingress-nginx   ingress-nginx-controller-5c778bffff-bmk2c     1/1     Running     0               50s
kube-flannel    kube-flannel-ds-5fvx9                         1/1     Running     0               3m45s
kube-flannel    kube-flannel-ds-bvqkc                         1/1     Running     1 (3m13s ago)   3m35s
kube-flannel    kube-flannel-ds-hgxtn                         1/1     Running     1 (111s ago)    2m40s
kube-flannel    kube-flannel-ds-kp6tl                         1/1     Running     0               3m27s
kube-flannel    kube-flannel-ds-nvbbg                         1/1     Running     0               3m55s
kube-flannel    kube-flannel-ds-rhsqq                         1/1     Running     0               2m42s
kube-system     aws-node-termination-handler-478lj            1/1     Running     0               26s
kube-system     aws-node-termination-handler-5bk96            1/1     Running     0               26s
kube-system     aws-node-termination-handler-bkzrf            1/1     Running     0               26s
kube-system     aws-node-termination-handler-cx5ps            1/1     Running     0               26s
kube-system     aws-node-termination-handler-dfr44            1/1     Running     0               26s
kube-system     aws-node-termination-handler-vcq7z            1/1     Running     0               26s
kube-system     coredns-5d78c9869d-n7jcq                      1/1     Running     0               4m1s
kube-system     coredns-5d78c9869d-w9k5j                      1/1     Running     0               4m1s
kube-system     efs-csi-controller-74695cd876-66bw5           3/3     Running     0               28s
kube-system     efs-csi-controller-74695cd876-hl9g7           3/3     Running     0               28s
kube-system     efs-csi-node-7wgff                            3/3     Running     0               27s
kube-system     efs-csi-node-9v4nv                            3/3     Running     0               27s
kube-system     efs-csi-node-mjz2r                            3/3     Running     0               27s
kube-system     efs-csi-node-n4npv                            3/3     Running     0               27s
kube-system     efs-csi-node-pmpnc                            3/3     Running     0               27s
kube-system     efs-csi-node-s4prq                            3/3     Running     0               27s
kube-system     etcd-i-012c258d537d5ec2f                      1/1     Running     0               4m4s
kube-system     etcd-i-018fb1214f9fe07fe                      1/1     Running     0               3m7s
kube-system     etcd-i-0f73570d6dddb6d0b                      1/1     Running     0               3m27s
kube-system     kube-apiserver-i-012c258d537d5ec2f            1/1     Running     0               4m6s
kube-system     kube-apiserver-i-018fb1214f9fe07fe            1/1     Running     1 (3m4s ago)    3m4s
kube-system     kube-apiserver-i-0f73570d6dddb6d0b            1/1     Running     0               3m26s
kube-system     kube-controller-manager-i-012c258d537d5ec2f   1/1     Running     1 (3m15s ago)   4m7s
kube-system     kube-controller-manager-i-018fb1214f9fe07fe   1/1     Running     0               2m9s
kube-system     kube-controller-manager-i-0f73570d6dddb6d0b   1/1     Running     0               3m26s
kube-system     kube-proxy-4lwgv                              1/1     Running     0               2m40s
kube-system     kube-proxy-9hgtr                              1/1     Running     0               3m27s
kube-system     kube-proxy-d6zzp                              1/1     Running     0               4m1s
kube-system     kube-proxy-jwb8x                              1/1     Running     0               3m35s
kube-system     kube-proxy-q2ctc                              1/1     Running     0               2m42s
kube-system     kube-proxy-sgn7r                              1/1     Running     0               3m45s
kube-system     kube-scheduler-i-012c258d537d5ec2f            1/1     Running     1 (3m12s ago)   4m6s
kube-system     kube-scheduler-i-018fb1214f9fe07fe            1/1     Running     0               3m1s
kube-system     kube-scheduler-i-0f73570d6dddb6d0b            1/1     Running     0               3m26s

Public LB check

We can now test the public load balancer, nginx ingress controller and the security group ingress rules. On your local PC run:

curl -k -v https://k8s-ext-<REDACTED>.elb.amazonaws.com/
*   Trying 34.x.x.x:443...
* TCP_NODELAY set
* Connected to k8s-ext-<REDACTED>.elb.amazonaws.com (34.x.x.x) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=IT; ST=Italy; L=Brescia; O=GL Ltd; OU=IT; CN=testlb.domainexample.com; emailAddress=email@you.com
*  start date: Apr 11 08:20:12 2022 GMT
*  expire date: Apr 11 08:20:12 2023 GMT
*  issuer: C=IT; ST=Italy; L=Brescia; O=GL Ltd; OU=IT; CN=testlb.domainexample.com; emailAddress=email@you.com
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55c6560cde10)
> GET / HTTP/2
> Host: k8s-ext-<REDACTED>.elb.amazonaws.com
> user-agent: curl/7.68.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 404 
< date: Tue, 12 Apr 2022 10:08:18 GMT
< content-type: text/html
< content-length: 146
< strict-transport-security: max-age=15724800; includeSubDomains
< 
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host k8s-ext-<REDACTED>.elb.amazonaws.com left intact

A 404 is the expected response, since the cluster is empty.

Deploy a sample stack

Deploy ECK on Kubernetes

Clean up

terraform destroy