AWS HPC Blog

Migrating to AWS ParallelCluster v3 – Updated CLI interactions

In our previous post on migrating to AWS ParallelCluster 3, we talked about the high-level functional differences and the configuration file changes we’ve made. A key change that we didn’t discuss is that ParallelCluster 3 has an API. That means you can manage and deploy clusters through HTTP endpoints with Amazon API Gateway.

As part of this work, we redesigned the ParallelCluster CLI for compatibility with this new API-centric approach. One of those changes is that the CLI now outputs JSON. You might need to adjust your workflows (and possibly some scripts and documentation) to work with these new commands and options.

This post provides some guidance on mapping between version 2 and version 3 of the ParallelCluster CLI command set, to help you with migrating to ParallelCluster version 3. The post also summarizes new CLI features in ParallelCluster 3 to expose the things you just couldn’t do previously.

First though: you’ll notice the new CLI is different at the outset, starting with the top-level commands. The newer command set is more descriptive and makes it clearer what each command will accomplish. For example, to delete a cluster in ParallelCluster 2 the command was just pcluster delete, whereas the new syntax is pcluster delete-cluster. We’ve added many more commands to accommodate new features in ParallelCluster 3 and to give you finer control of your clusters. The help option (pcluster -h) will reveal that there are far more commands in the new command set compared to ParallelCluster 2.
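
For example, assuming a cluster named hpc-cluster (a placeholder name we’ll use throughout this post), deleting it looks like this in each version:

# ParallelCluster 2
pcluster delete hpc-cluster

# ParallelCluster 3
pcluster delete-cluster --cluster-name hpc-cluster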

For simplicity, we’ll organize our walkthrough into four areas: (1) CLI commands for cluster creation and overall cluster management; (2) compute fleet behavior; (3) image creation; and (4) logs.

Cluster creation and management

The CLI commands that help you configure, create, connect to, and modify the cluster have not changed significantly. We’ve updated them to fine-tune the syntax to reflect the API-centric approach in ParallelCluster 3, and to expand the feature set with more sub-commands. The commands are self-explanatory and, as a ParallelCluster 2 user, you may be familiar with most of them. Table 1 lists the version 2 CLI commands and compares them to their version 3 equivalents to give you an idea of the syntactical updates. We’ve also provided links to the documentation for each command, for rapid reference.

Table 1: Commands to configure, create, connect to, and modify the cluster

Version 2     Version 3
dcv           dcv-connect
ssh           ssh
version       version
configure     configure
delete        delete-cluster
update        update-cluster
create        create-cluster
list          list-clusters
status        describe-cluster

Even though the purpose of these commands hasn’t changed, the expanded feature set for each of them supports a wider range of configuration and control, for a smoother administrative experience in managing the cluster. For example, with ParallelCluster 2 you may have experienced delays in deploying or updating a cluster due to unmet CloudFormation dependencies, resulting in a rollback of the stack. In ParallelCluster 3, the create-cluster and update-cluster commands now come with a --dryrun option which you can use to validate the configuration, understand any problems, and fix them in the configuration file before deploying the cluster for real.
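
As a minimal sketch, validating a cluster configuration without creating any resources might look like the following (the cluster name and configuration file name are placeholders):

# Validate the configuration and report any issues, without deploying anything
pcluster create-cluster --cluster-name hpc-cluster \
    --cluster-configuration cluster-config.yaml --dryrun true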

You can also use the describe-cluster command to get a link for downloading the cluster’s YAML configuration file, eliminating the effort of recreating a cluster configuration file from scratch. Figure 1 shows a sample output of the describe-cluster command, including the link for downloading the configuration file. There are also useful validation options that give you more clarity around validation failure levels, as well as the ability to suppress specific validations.

Figure 1: Link to the cluster’s YAML configuration file, output by the describe-cluster command.

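As a sketch, retrieving that link might look like the following (the cluster name is a placeholder); the JSON output includes a link to the configuration file that you can download with any HTTP client:

# Describe the cluster; the output includes a URL for the cluster's configuration file
pcluster describe-cluster --cluster-name hpc-cluster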

Compute Fleet behavior

Along with the API-centric approach for the CLI, we’ve also redesigned the commands to provide a more fine-grained approach to actions that were previously combined in a single command.

As a ParallelCluster 2 user, you may have used the stop and start commands to control the compute fleet, especially while updating the cluster. These commands would enable or disable the compute fleet, with the stop command also terminating the compute instances. In ParallelCluster 3, we’ve introduced a higher resolution of control by replacing these top-level commands with multiple new commands, and by introducing the delete-cluster-instances command. See Table 2, where we’ve augmented the update-compute-fleet command with a --status option that can start and stop the compute fleet.

We felt that providing more deliberate control over critical functions like terminating compute-fleet instances helps you manage the cluster better, rather than having termination happen as a side effect of another action. Using ParallelCluster 3, you can now terminate all compute-fleet instances while still keeping the compute fleet enabled, something that wasn’t possible in ParallelCluster 2 with a single command.

Table 2 lists the CLI commands for compute-fleet management in ParallelCluster version 2 and compares them to version 3 to give you an idea of the changes. Again, there are links to the online documentation for quick reference.

Table 2: Commands for compute-fleet management

Version 2     Version 3
stop          update-compute-fleet --status STOP_REQUESTED (Slurm-specific)
              delete-cluster-instances
start         update-compute-fleet --status START_REQUESTED (Slurm-specific)
              describe-cluster-instances
              describe-compute-fleet
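
As an illustration of the commands in Table 2 (the cluster name is a placeholder):

# Terminate all compute instances while leaving the compute fleet enabled
pcluster delete-cluster-instances --cluster-name hpc-cluster

# Stop the Slurm compute fleet ahead of a cluster update
pcluster update-compute-fleet --cluster-name hpc-cluster --status STOP_REQUESTED

# Check the current status of the compute fleet
pcluster describe-compute-fleet --cluster-name hpc-cluster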

Custom AMI creation

ParallelCluster 3 has a new, streamlined AMI creation and management process that’s built on top of EC2 Image Builder. We’ve updated the CLI to support this redesign and provide a better experience for ParallelCluster custom AMI creation and management workflows. The redesigned workflow and its additional functionality allow for a self-contained experience where you only need ParallelCluster 3 to accomplish custom image creation tasks.

In keeping with the philosophy of providing a higher resolution of control, there are several new commands added to the repertoire. These commands help with image creation workflows and event logging for better management of custom AMIs. Table 3 lists the commands for image creation in version 2 and version 3, along with links to the official documentation. These commands are self-explanatory based on their syntax. The build-image command in version 3 corresponds to the createami command in version 2, but you’ll notice there are many more commands to give you a wider range of control and management of custom images with ParallelCluster 3. The addition of these extra primitives lets you build image pipelines to fine-tune image management for the clusters you create.

Table 3: Commands for image management in ParallelCluster version 2 vs version 3

Version 2     Version 3
createami     build-image
              list-images
              delete-image
              describe-image
              list-image-log-streams
              get-image-log-events
              get-image-stack-events
              list-official-images
              export-image-logs
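
For example, building a custom image and then checking on it might look like the following (the image ID and configuration file name are placeholders):

# Kick off an EC2 Image Builder pipeline to build a custom ParallelCluster AMI
pcluster build-image --image-id my-custom-image --image-configuration image-config.yaml

# Check the status of the build
pcluster describe-image --image-id my-custom-image

# List custom images that have finished building
pcluster list-images --image-status AVAILABLE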

Cluster logs

ParallelCluster 2 supported logging based on Amazon CloudWatch Logs, where a cluster’s system, scheduler, and node daemon logs are all stored in a CloudWatch log group. These logs help in debugging issues such as unexpected scaling behavior and node initialization failures, and they can be queried and analyzed from the CloudWatch console.

In ParallelCluster 3 we’ve expanded the command set to include functionality for working with logs from the local machine where you’ve installed ParallelCluster 3. These commands help you retrieve and export the logs for a specific log stream of interest, or retrieve the AWS CloudFormation stack events for a specific cluster. Table 4 lists the commands for working with logs for your ParallelCluster 3-deployed HPC cluster.

The commands have a progressive usage pattern which allows for more granular queries of event logs. For example, you can first get a list of log streams using the list-cluster-log-streams command, then query the events of a specific stream using the get-cluster-log-events command (we sketch this pattern after Table 4). To understand more about how to retrieve logs using these commands, have a look at the “Retrieving and preserving logs” section of the official documentation for ParallelCluster troubleshooting.

Table 4: Commands for event logs in ParallelCluster 3.
Version 3 only
list-cluster-log-streams
get-cluster-log-events
get-cluster-stack-events
export-cluster-logs
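
A minimal sketch of that progressive pattern follows; the cluster name, log stream name, and S3 bucket are placeholders:

# List the log streams available for the cluster
pcluster list-cluster-log-streams --cluster-name hpc-cluster

# Query the events of one of the streams returned above
pcluster get-cluster-log-events --cluster-name hpc-cluster \
    --log-stream-name <log-stream-name-from-previous-output>

# Export the cluster's logs to an Amazon S3 bucket for preservation
pcluster export-cluster-logs --cluster-name hpc-cluster --bucket my-log-bucket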

Conclusion

ParallelCluster 3 has a redesigned CLI for compatibility with its API-centric approach. The new CLI also comes with several updates to provide a richer feature set to create, manage and administer your HPC clusters. In this post we contrasted the CLI command sets between ParallelCluster 2 and ParallelCluster 3, and we hope this helps guide you when migrating from version 2 to version 3. In doing so we also summarized the various CLI commands you can expect to use, some of which are new in ParallelCluster 3.

While this overview serves as an introduction to the CLI in ParallelCluster 3, we recommend having a look at the detailed reference for the ParallelCluster 3 CLI and the other new features introduced in the ParallelCluster 3 user guide.

Austin Cherian

Austin is a Senior Product Manager-Technical for High Performance Computing at AWS. Previously, he was a Senior Developer Advocate for HPC & Batch, based in Singapore. He’s responsible for ensuring AWS ParallelCluster grows to provide a smooth journey for customers deploying their HPC workloads on AWS. Prior to AWS, Austin was the Head of Intel’s HPC & AI business for India, where he led the team that helped customers with a path to High Performance Computing on Intel architectures.

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.