Skip to content

aws-samples/remote-debugging-with-emr

EMR Remote Debugging

A sample walkthrough with accompanying CDK stack that shows how to use a bastion host with PyCharm to debug Spark jobs on EMR.

Overview

This repository includes examples for EMR on EKS and EMR Serverless.

It includes a CDK stack that deploys each environment as well as a centralized bastion machine that is configured to receive remote debugging requests from any of these services.

The bastion machine resides in its own VPC, while all the EMR resources are in a second VPC.

Architecture diagram of solution

Pre-requisites

For ease of getting started, a devcontainer is provided with all the necessary dependencies.

Getting started

Deploy the CDK stack.

Warning

This stack deploys resources you will be billed for, including an EKS cluster and EC2 instances.

cdk deploy --all --require-approval never --context eks_admin_role_name=Admin

eks_admin_role_name is an IAM role that will be granted access to manage your EKS environment.

Once the stack fully deploys, you'll see a variety of outputs that will be useful in future steps.

Update Bastion with SSH key

This could be enabled with CDK, but it's easier to do manually with my setup.

The stack is deployed with SSM so you can remotely manage, but we want to easily login as the ec2-user and port forward.

So using SSM, we'll connect to the instance ID from the DevBox.DevBoxID output and add our key.

INSTANCE_ID=<REPLACE WITH CDK DevBox.DevBoxID VALUE>
cat ~/.ssh/id_ecdsa.pub | pbcopy
aws ssm start-session --region us-west-2 --target ${INSTANCE_ID}

su to the ec2-user then update .ssh/authorized_keys (I just use vi)

sudo su - ec2-user

Now we can remote-forward to our local machine with SSH (as we've already setup AWS-StartSSHSession)

ssh -R '3535:localhost:3535' ec2-user@${INSTANCE_ID}

Remote Debugging

Next, open up the demo_code folder in PyCharm. We'll continue with the README in there.

Conclusion

When you're done, make sure you destroy your stack.

cdk destroy --all

About

A Python-based CDK stack that deploys an EC2 bastion host and EMR Serverless and EMR on EKS environments configured for remote debugging.

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks