AWS HPC Blog

Easing your migration from SGE to Slurm in AWS ParallelCluster 3

Nick Ihli, Director of Cloud at SchedMD, and Austin Cherian, Senior Product Manager-Technical for HPC at AWS.


Selecting a job scheduler is an important decision because of the investment you make in time and effort to use it effectively to run your HPC jobs. Key to any software selection, though, is knowing there's a robust support framework and a track record of innovation.

In June 2020, we announced that we would stop supporting the Son of Grid Engine (SGE) and Torque job schedulers. We took that decision because the open-source software (OSS) repositories for these two projects had seen no community updates for many years. That makes them higher risk as vectors for attack, because "no updates" also means "no patches" for vulnerabilities that are discovered. With every ParallelCluster 2.x release, we worked harder (and harder) to tighten the protective net around these packages to ensure we met your expectations of AWS in the shared responsibility model. But with ParallelCluster 3, we shifted to directly supporting only schedulers with viable support models. We've worked closely with SchedMD, the maintainers and developers of Slurm, to enhance Slurm so it works even better with ParallelCluster.

December 31, 2021 was the last date of support for SGE and Torque in ParallelCluster. Clusters created with these schedulers won't stop working, of course, but to keep your operations running with the assurance that AWS's service and support teams are there to help you, it's time to start your migration to Slurm or AWS Batch. This blog post will help you do that for Slurm.

Two perspectives

To help you understand the details of moving from SGE to Slurm, we’ll present two perspectives: the user and the administrator. Both require gaining familiarity and comfort using the scheduler’s client and job submission commands so you can effectively manage users and cluster resources and run interactive or scripted job submissions. Figure 1 breaks down how the user and administrator roles generally make use of the client and job submission commands.


Figure 1. Common questions asked by users and administrators of High-Performance Computing clusters.

Migrating from a legacy scheduler like SGE to Slurm is like driving a new car. You may know in theory how it operates, but you'll need some time to find all the right controls, figure out the radio, and build some muscle memory. To facilitate a hands-on approach to this migration, it's useful to have side-by-side commands for the important functions of the scheduler in each of the areas we outlined in Figure 1.

In this blog, we’ll detail these aspects so you can run your jobs using Slurm. We’ll also discuss methods and show you some tools that make it easy for SGE users to get comfortable with Slurm quickly, including some specialized wrapper commands that will close this gap rapidly.

Client Commands

Since Slurm's command-line syntax is different, we'll use figures that show a side-by-side comparison of the SGE commands and their Slurm equivalents. Besides learning new commands, you'll learn how to find the specific information you need in each command's output. For more information on Slurm command syntax and additional examples, refer to the official Slurm documentation.

System Makeup and Info

The first command, sinfo, is one of Slurm's major commands; it gives insight into node and partition information. The sinfo output in Figure 2 lists partitions, the nodes in each partition, and the state those nodes are in. Partitions are equivalent to queues in SGE, and nodes can exist in multiple partitions. When a node is allocated to a job, its state changes and another line is displayed, showing the node(s) in the specific state associated with the partition. For a more detailed, per-node view, run sinfo -N.

Here are some examples of commands and their outputs:


Figure 2. Commands to get system information
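For instance, a minimal sketch of the Slurm side (the partition and node names, such as queue1 and its compute nodes, follow ParallelCluster's naming and are placeholders; your output will differ):

    # List partitions, node counts, and node states
    # (rough SGE equivalents: qconf -sql for queues, qhost for hosts)
    sinfo
        PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
        queue1*      up   infinite      2   idle queue1-dy-c5xlarge-[1-2]

    # Per-node, long-format view
    sinfo -N -l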

Viewing Jobs

You'll frequently need to get information about jobs. You can do this using Slurm's squeue command (like SGE's qstat). In a later section, we'll show you some wrapper scripts for squeue that can be used to obtain job information in SGE's own output format, to provide compatibility with any other utilities you might have created locally. Figure 3 shows a side-by-side comparison of the two commands and their expected output. For more information, refer to the squeue documentation.


Figure 3. Viewing jobs SGE vs Slurm
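A few common equivalents, as a sketch (the job ID 42 is a placeholder):

    SGE                         Slurm
    qstat                       squeue
    qstat -u $USER              squeue -u $USER
    qstat -j 42                 scontrol show job 42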

Job Submission and Control

In both SGE and Slurm, you can submit jobs interactively or via a job script. In this section, we'll compare both methods, including how to translate a job script from SGE to Slurm and how to submit it. We'll also compare using job arrays in Slurm and SGE. For more information on Slurm command syntax and additional examples, refer to the official Slurm documentation.

Job Submission using a job script

We can submit job scripts in Slurm using the sbatch command. In a job script, we provide arguments to the scheduler before specifying the commands to execute on the cluster. In SGE, these arguments are specified with #$; Slurm uses #SBATCH for job requirements instead. Figure 4 illustrates two job scripts, showing the Slurm-equivalent #SBATCH options.


Figure 4. Job scripts compared between SGE and Slurm formats.
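As a minimal sketch of the two formats (the job name, the 'mpi' parallel environment, the resource requests, and the program being run are placeholders for your own):

An SGE job script:

    #!/bin/bash
    #$ -N myjob                  # job name
    #$ -pe mpi 4                 # parallel environment with 4 slots
    #$ -l h_rt=01:00:00          # wall-clock time limit
    #$ -o myjob.out              # stdout file
    #$ -cwd                      # run from the submission directory
    mpirun -np 4 ./my_program

The Slurm equivalent:

    #!/bin/bash
    #SBATCH --job-name=myjob     # job name
    #SBATCH --ntasks=4           # 4 tasks (slots)
    #SBATCH --time=01:00:00      # wall-clock time limit
    #SBATCH --output=myjob.out   # stdout file
    # (no -cwd needed: Slurm runs from the submission directory by default)
    srun ./my_program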

After creating a job script, submitting a job is easy: use qsub <script name> in SGE and sbatch <script name> in Slurm. You can then use squeue, which we described in the previous section, to verify your job's status and monitor its progress.
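For example (the job ID shown is illustrative):

    $ sbatch myjob.sh
    Submitted batch job 1234
    $ squeue -j 1234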

Environment Variables

Environment variables control different aspects of submitted jobs and can be used in job scripts. Most of Slurm’s environment variables are self-explanatory. Figure 5 shows a side-by-side comparison of SGE and Slurm environment variables.


Figure 5. Commonly used environment variables in SGE and Slurm
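A partial mapping of commonly used variables (not exhaustive; note that PE_HOSTFILE is a file, while SLURM_JOB_NODELIST is a compact host list):

    SGE                  Slurm
    JOB_ID               SLURM_JOB_ID
    JOB_NAME             SLURM_JOB_NAME
    SGE_TASK_ID          SLURM_ARRAY_TASK_ID
    NSLOTS               SLURM_NTASKS
    NHOSTS               SLURM_JOB_NUM_NODES
    SGE_O_WORKDIR        SLURM_SUBMIT_DIR
    PE_HOSTFILE          SLURM_JOB_NODELIST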

Job Arrays

Another common type of job that you will migrate to Slurm is the job array. Job arrays help increase throughput massively by using the parallelism of your cluster efficiently; job arrays with millions of tasks can be submitted in milliseconds. Slurm initially handles an array as a single job record, and additional job records are created only as needed, typically when a task of the job array starts. This drastically increases the scalability of a Slurm environment when managing large job counts.

Slurm also optimizes backfill scheduling when you use arrays. Backfill scheduling is typically a heavier operation, since Slurm is looking for jobs that fit within the backfill window. For job arrays, once an element of a job array is found not to be runnable, or would affect the scheduling of higher-priority pending jobs, the remaining elements of that job array are quickly skipped. You can find more details on Slurm's backfill scheduling in the "scheduling configuration guide" for Slurm.

The #SBATCH option for job arrays is '-a' or '--array='. This is like SGE, with range options and an option to limit how many tasks in an array can run at once. Figure 6 shows examples comparing job array scripts between SGE and Slurm.


Figure 6. Comparison of job array scripts
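For example, a 100-task array limited to 10 concurrently running tasks looks like this (the range and limit are placeholders):

    # SGE directives
    #$ -t 1-100
    #$ -tc 10

    # Slurm equivalent
    #SBATCH --array=1-100%10

    # Inside the script, the task index is $SGE_TASK_ID in SGE
    # and $SLURM_ARRAY_TASK_ID in Slurm.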

Interactive jobs

As an SGE user, you might sometimes run interactive jobs. Slurm also supports interactive job submission. Where qlogin or qrsh are the SGE commands for interactive jobs, Slurm has srun and salloc. The recommended method is to set LaunchParameters=use_interactive_step in your slurm.conf file and use salloc to submit the interactive job. salloc will grant an allocation and place the user on the allocated node, ready for interactive commands.


Figure 7. Comparison of interactive job commands that print the host name of an allocated compute node
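A minimal sketch, assuming LaunchParameters=use_interactive_step is set in slurm.conf:

    # SGE: run a command on an allocated compute node
    qrsh hostname

    # Slurm: request an allocation; your shell lands on the allocated node
    salloc -N 1
    hostname

    # Slurm alternative: run a single command under a one-node allocation
    srun -N 1 hostname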

Command Wrappers for Migrating to Slurm

From the administrator's perspective, helping users learn new commands can be a challenge, leading to longer migration timeframes. To help ease the migration process, SchedMD developed some wrapper translation scripts. The wrappers are not meant as a replacement for the Slurm commands, but as temporary helper scripts.

There are many command wrappers to try, including qalter, qdel, qhold, qrerun, qrls, qstat, and qsub.

Figure 8 walks through an example comparison between Slurm's squeue command and the qstat wrapper:


Figure 8. Using the qstat wrapper vs the Slurm squeue command.
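The gist of the comparison looks something like this (the exact columns depend on your Slurm and wrapper versions; only the default headers are sketched here):

    # Native Slurm output
    squeue -u $USER
        JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

    # SGE-style output via the qstat wrapper
    qstat -u $USER
        job-ID  prior  name  user  state  submit/start at  queue  slots  ja-task-ID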

As you can see, the output format more closely resembles SGE’s qstat.

There are some caveats to note, however. Slurm won't read #$ directives from the job script; it looks for #SBATCH options instead. That's because the wrapper pipes the job script's content through to sbatch, translating the qsub command-line options where necessary, rather than parsing and re-interpreting SGE's job scripts. The job script will need to be in Slurm format to use this qsub wrapper.

A convenient option that can be used with the qsub wrapper is --sbatchline. This outputs the sbatch command translation but doesn't actually submit the job, which is helpful for users wanting to understand the Slurm equivalents of an SGE submission string. Figure 9 shows an example of how this works:


Figure 9. Illustration of using --sbatchline with the qsub wrapper
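The translation below is illustrative only; the point is that the wrapper echoes the sbatch command it would have run instead of submitting the job:

    # Ask the wrapper for the translation without submitting
    qsub --sbatchline -N myjob -o myjob.out myscript.sh
    # Prints something along the lines of:
    #   sbatch -J myjob -o myjob.out myscript.sh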

Using --sbatchline helps you find the corresponding Slurm syntax for your SGE qsub incantation without digging through documentation.

Slurm provides other methods to make it easy for users to submit jobs successfully. For example, a job submit plugin or cli_filter allows administrators to intercept users' submissions, make validation checks, and enforce job requirements based on those checks. These are powerful tools, and it's best to check out Slurm's documentation directly for job submit plugins and CLI filter plugins.

Conclusion

If you're an SGE user, migrating to ParallelCluster 3 means change. This is brought about by the end of support for SGE, which follows from the lack of community interest in maintaining SGE's open-source codebase. We recommend migrating to ParallelCluster 3 now and making the switch to either Slurm or AWS Batch.

In this blog we've covered the things you need to know to make the move to Slurm, which is the obvious HPC successor to SGE and the most friction-free path. As we illustrated in this post, the learning curve for switching to Slurm isn't steep, but it does require getting used to new syntax and some nuance.

We described a detailed migration from SGE to Slurm and provided side-by-side commands to make the migration easy. We recommend a hands-on approach, trying each Slurm feature as you migrate. For more information on Slurm command syntax and additional examples, refer to the official Slurm documentation, or watch our 5-part series of HPC Tech Shorts to see the teams from AWS and SchedMD showing them in action.

If you need additional support, SchedMD (the official maintainers of Slurm) offers commercial packages to help you with migrations to Slurm. SchedMD also offers professional services in AWS Marketplace to support your development.


Nick Ihli

Nick Ihli is the Director of Sales Engineering and Cloud at SchedMD. He is a 15-year veteran of the HPC scheduling space. In that time, he’s helped hundreds of customers from all industries and workloads find ways to optimize their HPC scheduling environment. At SchedMD, Nick manages the sales engineering organization, consulting with customers as they migrate to Slurm from other schedulers or to achieve more from their current Slurm deployment. Nick also spent some time in public cloud computing and currently spearheads Slurm’s cloud projects.


Austin Cherian

Austin is a Senior Product Manager-Technical for High Performance Computing at AWS. Previously, he was a Snr Developer Advocate for HPC & Batch, based in Singapore. He's responsible for ensuring AWS ParallelCluster grows to provide a smooth journey for customers deploying their HPC workloads on AWS. Prior to AWS, Austin was the Head of Intel's HPC & AI business for India, where he led the team that helped customers with a path to High Performance Computing on Intel architectures.