
aws-samples/aws-edge-native-nomad

Container in 🤖


Table of contents

  1. Description
  2. Getting Started
  3. Testing the platform
  4. Architecture
  5. Deep dive at the edge
  6. Troubleshooting
  7. Security
  8. License

Description

The aim of this repository is to deploy a containerized edge-native application. The focus here is on an architecture where:

  • edge devices are powerful enough to run containers (such as a Raspberry Pi 3 or 4)
  • the application needs to interact with the AWS Cloud

The proposed architecture relies on three main components:

  • Orchestration infrastructure (Cloud) based on HashiCorp Nomad. The Nomad control plane is deployed on AWS.

  • Private Certificate Authority based on AWS Private CA, in order to support the different security requirements.

  • Credential Provider based on AWS IAM Roles Anywhere, in order to allow the edge-native application to interact with the AWS Cloud.

Getting Started

This section is divided into three topics.

The first topic covers the prerequisites for running the whole section.

The second topic focuses on the deployment of the cloud part of the sample: the Nomad control plane, with all services related to connectivity and security.

The last topic focuses on the edge part of the sample: how to connect your edge device to the cloud.

Warning: Deploying these stacks will create resources that are not covered by the free tier, in particular a private CA using AWS Private CA. The pricing can be found here.

Prerequisites

  • The tutorial should be executed on a local workstation, not on a remote workspace such as Cloud9.
  • It is recommended to deploy resources in the Ireland region (eu-west-1).
  • jq
  • open (for macOS), xdg-open (for Linux)
  • CDK
  • nomad
  • docker
  • Ansible : a tool that automates installing applications on your infrastructure at scale.
  • the AWS CLI v2
  • python (at least version 3.7)
  • an AWS account
    • AWS credentials with sufficient permission
  • the account and the region should be bootstrapped for CDK following the official documentation
  • at least one single-board computer (SBC) such as a Raspberry Pi 3 or 4, or an NVIDIA Jetson.
    • your laptop should be able to reach your SBC by IP address or host name.
    • you should be able to connect to your SBC over SSH without a password (for example, with an SSH key).
    • openssh-sftp-server should be installed (a sketch follows this list).
      • In /etc/ssh/sshd_config add Subsystem sftp internal-sftp and restart sshd
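
For example, on a Debian-based SBC you could enable the SFTP subsystem as follows (a sketch; verify the paths and the service name on your OS):

    # Append the SFTP subsystem directive and restart the SSH daemon.
    echo 'Subsystem sftp internal-sftp' | sudo tee -a /etc/ssh/sshd_config
    sudo systemctl restart ssh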

Deploy the cloud infrastructure

Please open a terminal and go to the root of the project.

  0. Set up the environment variables below. They are required so CDK can customize the number of Availability Zones based on the region it is deployed to.

     export CDK_DEFAULT_REGION=<your region>
     export CDK_DEFAULT_ACCOUNT=<your account>

  1. Modify the input variables in deployment.json as described below, especially the authentication settings with your email address and a custom prefix. The email must be valid, as you are going to receive a password from Cognito. The prefix must be unique and made of lowercase alphanumeric characters.

  2. Install the required npm packages

     npm install
  3. Synthesize the CDK app (translates CDK resources into an AWS CloudFormation template)

     cdk synth

    You should see the following output:

    Successfully synthesized to container-in-robot/cdk.out
    Supply a stack id (AcmPcaStack, NomadInfrastructureStack, IamAnyStack) to display its template.
    
  4. Before deploying the CDK stacks, we need to authenticate to the Amazon ECR Public registry, since some public images are used in the build.

    aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

    You should see the following output:

     Login Succeeded
    
  5. Deploy the AcmPcaStack. Note that if you already have a PKI infrastructure, this step is optional and you can make modifications to integrate it with the rest of the infrastructure.

     cdk deploy AcmPcaStack
  6. Deploy the IamAnyStack.

     cdk deploy IamAnyStack
     
  7. Deploy the NomadInfrastructureStack.

     cdk deploy NomadInfrastructureStack
  8. To interact with the cluster using the Nomad CLI, initialize the required environment variables (a sketch of what the script does follows the list).

    source ./scripts/set-nomad-env.sh
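
A minimal sketch of what such a script typically does, assuming the SSM parameter names listed under Output variables below (the actual script in the repository may differ):

    # Sketch: resolve the cluster endpoint and bootstrap token from SSM Parameter Store.
    export NOMAD_ADDR="https://$(aws ssm get-parameter \
      --name /infrastructure/nomad/endpoint \
      --query 'Parameter.Value' --output text)"
    export NOMAD_TOKEN="$(aws ssm get-parameter \
      --name /infrastructure/nomad/token/bootstrap \
      --with-decryption --query 'Parameter.Value' --output text)"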

Input variables

All the input variables are defined in the file deployment.json. Below is a detailed explanation; an illustrative example follows the list:

  • authentication
    • username : the name of the Nomad user, default is admin.
    • email : the email address associated with the Nomad user. It has to be a valid address because the password will be sent to it.
    • domain_prefix : a unique prefix used to host the Cognito authentication UI. The prefix should follow RFC 952 requirements.
  • cluster
    • datacenter_name : the name of the data center associated with the nomad server, default is control-plane.
    • instance_count : the number of nomad server in the cluster, default is 3.
    • aws_region : the AWS region where to deploy the cluster.
    • discovery_tag_key : the key of the EC2 instance tag used to discover nomad server in a cluster, default is nomad-type.
    • discovery_tag_value : the value of the EC2 instance tag used to discover nomad server in a cluster, default is server.
  • certificates : certificate properties used for the mTLS configuration
    • ca_cert_secret_name : the name of the secret storing the certificate of the private CA.
    • server_cert_secret_name : the name of the secret storing the certificate of the Nomad server.
    • server_private_key_secret_name : the name of the secret storing the private key of the Nomad server.
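
For illustration, a deployment.json that follows this schema might look like the example below (the email, prefix, and secret names are placeholders, not values shipped with the repository):

    {
      "authentication": {
        "username": "admin",
        "email": "you@example.com",
        "domain_prefix": "mynomadprefix"
      },
      "cluster": {
        "datacenter_name": "control-plane",
        "instance_count": 3,
        "aws_region": "eu-west-1",
        "discovery_tag_key": "nomad-type",
        "discovery_tag_value": "server"
      },
      "certificates": {
        "ca_cert_secret_name": "nomad/ca-cert",
        "server_cert_secret_name": "nomad/server-cert",
        "server_private_key_secret_name": "nomad/server-key"
      }
    }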

Output variables

Output variables are stored in AWS SSM Parameter Store:

  • /infrastructure/nomad/endpoint : DNS name of the load balancer exposing the Nomad cluster
  • /infrastructure/nomad/token/bootstrap : token associated with the bootstrap policy of the Nomad cluster
  • /infrastructure/pca/arn : the ARN of the private CA
  • /infrastructure/profile/arn : the ARN of the profile used with IAM Roles Anywhere
  • /infrastructure/role/arn : the ARN of the role used with IAM Roles Anywhere
  • /infrastructure/trust_anchor/arn : the ARN of the trust anchor used with IAM Roles Anywhere

Connect an edge device to the cloud

Please note that this provisioning script has currently only been tested on a Raspberry Pi 4 running Raspberry Pi OS Lite (64-bit). If you have a different target, you can modify the Ansible scripts to meet your needs.

From the root of this git repository, please run:

  1. Update the file ./inventory/inventory.yml with the SBC SSH connection settings (see the example after this list).
  2. Change directory to the nomad-client folder
     cd nomad-client
  3. Create a python virtual environment
     python3 -m venv ./venv
  4. Activate virtual environment:
    source ./venv/bin/activate  
  5. Install dependencies for python script provisioner.py:
    pip install -r requirements.txt
  6. Double-check the host names and SSH keys in inventory.yml.
  7. Use Ansible's ping module to check that the instances are reachable:
    ansible -i ./inventory/inventory.yml -m ping all
  8. Review and edit the aws_region and profile in playbook.yml to match your AWS credentials. Note that the region must match the one you deployed the CDK stacks to, since the Ansible tasks fetch parameters from AWS SSM Parameter Store to configure the Nomad agent.
  9. Run the playbook. This will install all the necessary dependencies and provision certificates to the device:
    ansible-playbook -i ./inventory/inventory.yml playbook.yml 
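
As an illustration, a minimal inventory.yml might look like this (host name, address, user, and key path are placeholders):

    all:
      hosts:
        edge-device-1:
          ansible_host: 192.168.1.50
          ansible_user: pi
          ansible_ssh_private_key_file: ~/.ssh/id_ed25519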

If the playbook completes successfully, you should now see clients connected to your Nomad cluster in the ready state.

(Screenshot: Nomad clients in the ready state)

Testing the platform

Deploying a Nomad Job

The Nomad job specification defines the schema for Nomad jobs: https://developer.hashicorp.com/nomad/docs/job-specification

Either the Nomad CLI or the Nomad UI can be used to create jobs.

IoT Core publish job

  1. Make sure you have the nomad CLI installed on your machine: https://developer.hashicorp.com/nomad/docs/install
  2. Authenticate the Nomad CLI
    source scripts/set-nomad-env.sh
  3. Build the Docker container locally and push it to your ECR repository. Substitute your region and account in the command:
    ./scripts/build-and-publish.sh AWS_REGION AWS_ACCOUNT_ID
  4. Run the nomad job:
    nomad job run -check-index 0 scripts/generated/iot-publish.nomad.hcl
  5. Log in to the Nomad UI and check the job status and logs. (Screenshot: Nomad Jobs)
  6. Log in to the AWS IoT console and use the MQTT test client to view messages on the hello/world topic. (Screenshot: AWS IoT)

Note how we mount the following volumes in the job. This is required for boto3 to use the certificate to obtain temporary credentials and make AWS Signature Version 4 requests.

        volumes = [
            "/root/aws:/root/.aws",
            "/root/certificates:/root/certificates",
            "/usr/bin:/usr/bin"
        ]

Also note that the AWS API requests must be allowed by the iamRolesAnywhereProfileRole defined in iam-any-stack.ts. The IoT publish example works because the managed policy AWSIoTDataAccess is included.
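
To exercise the same permission path outside the job, you could publish from the device with the AWS CLI (a rough equivalent of what the container does through boto3; the topic matches the test above):

    # AWS CLI v2 treats --payload as base64 unless told otherwise.
    aws iot-data publish \
      --topic hello/world \
      --cli-binary-format raw-in-base64-out \
      --payload '{"message": "hello from the edge"}'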

Testing disconnection from control plane

We can simulate a disconnected client with no internet access by using a firewall on the edge device to block outbound connections from the edge to the control plane.

  1. Find the public IP addresses of the instances behind the load balancer
    nslookup ....elb.eu-west-1.amazonaws.com
  2. Configure UFW
    sudo apt install ufw
    sudo ufw allow ssh
    sudo ufw deny out to x.x.x.x
    sudo ufw deny out to y.y.y.y
    sudo ufw deny out to z.z.z.z
    sudo ufw enable
  3. Reboot the device and observe that the job restarts even without access to the control plane.

Architecture

(Architecture diagram)

Orchestration infrastructure (Cloud)

Nomad can be used in three different configurations:

  • server mode: the control plane of the Nomad cluster.
  • client mode: the data plane of the Nomad cluster, i.e. the worker nodes. Clients manage containers, for example.
  • CLI mode: interacting with the cluster.

Whereas the Nomad servers are deployed in the cloud, the client and CLI modes are used at the edge or on customer premises. A Nomad control plane can be seen as a database application with three requirements:

  • Nomad servers should be deployed as a cluster with an odd number of instances between 1 and 9 (1 is not recommended).
  • Nomad servers should communicate over a LAN.
  • Nomad servers should be deployed with a persistent file system.

HashiCorp Nomad relies on two layers of security: mTLS and ACLs.
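
In practice this means a CLI client needs both a certificate pair (mTLS layer) and a token (ACL layer). The Nomad CLI reads these from environment variables; a sketch with placeholder paths and values (the set-nomad-env.sh script mentioned earlier may set these for you):

    export NOMAD_ADDR=https://<cluster-endpoint>:4646
    export NOMAD_CACERT=/path/to/ca.pem           # CA certificate (mTLS)
    export NOMAD_CLIENT_CERT=/path/to/cli.pem     # client certificate (mTLS)
    export NOMAD_CLIENT_KEY=/path/to/cli-key.pem  # client private key (mTLS)
    export NOMAD_TOKEN=<acl-token>                # ACL layer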

ACM PCA Stack (Cloud)

AcmPcaStack creates the ACM PCA certificate authority, as well as an AWS Step Functions state machine that is run on deployment to issue a server certificate for the Nomad cluster and a CLI certificate.

Please review the main stack at lib/acm-pca-stack.ts, which contains parameters for the root CA that you may wish to change.

AWS Private CA enables creation of private certificate authority (CA) hierarchies, including root and subordinate CAs. We recommend using ACM PCA in production because it ensures the security of the root CA private key. You can issue certificates using an API call rather than retrieving the root CA private key to sign certificates yourself.

Please see pricing here: https://aws.amazon.com/private-ca/pricing/

Full documentation: https://docs.aws.amazon.com/privateca/latest/userguide/PcaWelcome.html

Tools

The scripts directory contains a Python script, generate_cert.py. It is an example of how to use boto3 to issue a certificate from ACM PCA, and it should be integrated into the device provisioning process.
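
For reference, the equivalent operations expressed with the AWS CLI look roughly like this (a sketch, not the script itself; the CA ARN comes from /infrastructure/pca/arn, and the CSR and file names are placeholders):

    # Submit a locally generated CSR to the private CA.
    CERT_ARN=$(aws acm-pca issue-certificate \
      --certificate-authority-arn "$CA_ARN" \
      --csr fileb://device.csr \
      --signing-algorithm SHA256WITHRSA \
      --validity Value=365,Type=DAYS \
      --query CertificateArn --output text)

    # Retrieve the signed certificate once issuance completes.
    aws acm-pca get-certificate \
      --certificate-authority-arn "$CA_ARN" \
      --certificate-arn "$CERT_ARN" \
      --query Certificate --output text > device.pem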

IAM Roles Anywhere Stack

IamAnyStack creates a trust anchor in IAM Roles Anywhere linked to the ACM PCA created in the first stack.

IAM Roles Anywhere allows you to obtain temporary security credentials in IAM for workloads such as servers, containers, and applications that run outside of AWS. It can replace the JITR (Just In Time Registration) workflow if you don't wish to use AWS IoT features.

This feature is well integrated into ACM PCA: https://docs.aws.amazon.com/rolesanywhere/latest/userguide/getting-started.html

If you do not wish to use only AWS IoT, it is recommended to consider IAM Roles Anywhere as a more generic option. There is the added benefit that the credential provider binary is managed and released by AWS rather than by you: https://docs.aws.amazon.com/rolesanywhere/latest/userguide/credential-helper.html

The following shows the AWS configuration file that sets the helper tool as the credential process:

[default]
region=eu-west-1
credential_process = aws_signing_helper credential-process --certificate /root/certificates/device.pem --private-key /root/certificates/device-key.pem --trust-anchor-arn <trust-anchor-arn> --profile-arn <profile-arn> --role-arn <role-arn>

The full credential process string to put in the credentials file on the device can be found in the CloudFormation outputs of the IamAnyStack, under the CredentialsProcessString key. The template file nomad-client/templates/credentials.j2 takes its parameters from the credentials-process clause in the nomad-client/playbook.yml file.
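
Once the credentials file is in place on the device, any signed AWS CLI call will confirm that the chain works end to end, for example:

    # Should return the ARN of the role assumed through IAM Roles Anywhere.
    aws sts get-caller-identity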

AWS Signing Helper

The aws_signing_helper binary can be downloaded from a link on the documentation page below; however, only x86 binaries are included at the time of writing (https://github.com/aws/rolesanywhere-credential-helper/releases/tag/v1.0.4). When deploying on ARM64 devices like the Raspberry Pi, it is necessary to download the source from GitHub (https://github.com/aws/rolesanywhere-credential-helper) and compile it for the target architecture.

A pre-compiled ARM64 binary is available in this repo: aws_signing_helper
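
If you prefer to build it yourself, a cross-compilation sketch using the Go toolchain (this assumes the module's main package is at the repository root; check the project's Makefile for the canonical build steps):

    git clone https://github.com/aws/rolesanywhere-credential-helper
    cd rolesanywhere-credential-helper
    # Cross-compile for 64-bit ARM, e.g. a Raspberry Pi 4 running a 64-bit OS.
    GOOS=linux GOARCH=arm64 go build -o aws_signing_helper .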

Authentication Flow

(Authentication flow diagram)

Deep dive at the edge

Description

The structure of the repository aims to follow Ansible best practices.

  • The folder inventory and its nested files describe the infrastructure. The hosts have to be accessible over SSH with a password and/or an SSH key pair; here we use only the key pair. An Ansible host description supports many more parameters than demonstrated here. Please refer to the documentation if you need to adapt it to your infrastructure.
  • The folder roles and its nested files define the roles and tasks that can be executed by Ansible.
  • The file playbook.yml defines which roles to deploy and on which inventory.

Input variables

The playbook takes the following input variables:

  • aws_region:
    • the AWS region where the secrets are stored.
    • to be set in file playbook.yml or to be overridden through CLI.
  • profile:
    • the AWS profile used in order to access the right IAM role or user.
    • to be set in file playbook.yml or to be overridden through CLI.
  • nomad_version:
    • the version of the Nomad client to deploy on each of the boards.
    • to be set in file playbook.yml or to be overridden through CLI.
  • list of machines:
    • the list of machines, defined in YAML format (see the Ansible documentation).
    • the file name has to be passed as a parameter of the deployment; an example inventory is provided in ./inventory/inventory.yml. To override variables from the CLI, see the example below.
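
For example, overriding aws_region and profile at run time (values are illustrative):

    ansible-playbook -i ./inventory/inventory.yml playbook.yml \
      -e aws_region=eu-west-1 -e profile=default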

Installed dependencies

The aim of this Ansible playbook is to install a set of dependencies for running Nomad on the robot. These Ansible roles have been designed to work only on Debian- or Ubuntu-based OSes.

Software

Artifacts

  • Device certificates and keys in /root/aws/certificates
  • Nomad configuration in /opt/nomad/config
    • definition of the mTLS authentication
    • definition of docker plugin
      • bind allocated ports to localhost: ports are attached to 127.0.0.1 instead of the private IP associated with the local network.

Troubleshooting

Fork exec on macOS

If you get this error:

objc[55962]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.

Then run:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

This is caused by the following issue: ansible/ansible#76322

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.