Slurm Commands
Start Jobs
srun
This is the simplest way to run a job on a cluster. It initiates parallel job steps within a job, or starts an interactive job (with --pty).
salloc
Request interactive jobs/allocations. When the job starts, a shell (or another program specified on the command line) is started on the submission host (Frontend). From this shell, use srun to interactively start parallel applications. The allocation is released when the user exits the shell.
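The variant with a program given on the command line can be sketched as follows. This is a minimal illustration, not a site-specific recipe: the node and task counts are placeholder values, and the `command -v` guard is only there so the snippet does nothing harmful on a machine without Slurm.

```shell
#!/bin/bash
# Allocation request; the node and task counts are illustrative placeholders.
alloc_cmd=(salloc -N 2 --ntasks-per-node=4)
# Program to run inside the allocation instead of an interactive shell.
step_cmd=(srun hostname)

if command -v salloc >/dev/null 2>&1; then
    # The allocation is released as soon as the given program exits.
    "${alloc_cmd[@]}" "${step_cmd[@]}"
else
    echo "salloc not found; would run: ${alloc_cmd[*]} ${step_cmd[*]}"
fi
```

Leaving out the trailing program (`srun hostname` here) gives the interactive shell described above, from which srun can be called repeatedly until you exit.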
sbatch
Submit a batch script. The script is executed on the first node of the allocation. The working directory is the directory from which sbatch was invoked. Within the script, one or more srun commands can be used to create job steps and execute parallel applications.
Examples
# General:
sbatch --job-name=<name of job shown in squeue> -N <num nodes> --ntasks-per-node=<spawned processes per node> /path/to/sbatch.script.sh

# A start date/time can be set via the --begin parameter:
--begin=16:00
--begin=now+1hour
--begin=now+60 (seconds by default)
--begin=2010-01-20T12:34:00
An sbatch script equivalent to the command above would look like this:
#!/bin/bash
#SBATCH --job-name=<name of job shown in squeue>
#SBATCH -N <num nodes>
#SBATCH --ntasks-per-node=<spawned processes per node>
#SBATCH --begin=2010-01-20T12:34:00
/path/to/sbatch.script.sh
For more information see man sbatch.
All parameters described there can also be specified in the job script itself using #SBATCH lines.
Also, more examples can be found here.
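As a further illustration of a batch script with several job steps, the sketch below writes a small job script to disk, checks it for shell syntax errors, and submits it only if sbatch is available. The file name `two_steps.sbatch.sh`, the job name, and the node and task counts are made-up placeholders; adapt them to your cluster.

```shell
#!/bin/bash
# Sketch: generate a batch script with two job steps and submit it.
# File name, job name, and resource numbers are illustrative placeholders.
job_script="two_steps.sbatch.sh"

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=two-steps
#SBATCH -N 2
#SBATCH --ntasks-per-node=4

# Job step 1: uses the whole allocation (2 nodes x 4 tasks = 8 tasks).
srun hostname

# Job step 2: a smaller step that uses only part of the allocation.
srun -N 1 --ntasks=1 date
EOF

# Check the generated script for shell syntax errors before submitting.
bash -n "$job_script"

if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"
else
    echo "sbatch not found; wrote $job_script without submitting"
fi
```

Each srun line inside the script becomes its own job step, so the two commands run one after the other within the same allocation.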
Check the status of your job submissions
squeue --me
Check the status of nodes
sinfo
Canceling jobs
On allocation, you will be notified of the job ID.
Also, within your scripts and shells (if allocated via salloc), you can get the ID via the $SLURM_JOBID environment variable.
You can use this ID to cancel your submission:
scancel <jobid>
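When submitting from a script, the job ID can be captured at submission time instead of being copied by hand. The following is a minimal sketch that relies on the `--parsable` option of sbatch; the helper name `parse_job_id` and the sample cluster name are hypothetical, and the script path is the same placeholder used in the examples above.

```shell
#!/bin/bash
# Extract the job ID from the output of `sbatch --parsable`, which prints
# "<jobid>" or, on multi-cluster setups, "<jobid>;<cluster>".
# parse_job_id is a hypothetical helper, not part of Slurm.
parse_job_id() {
    echo "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    # Submit, remember the ID, and cancel the job again.
    job_id=$(parse_job_id "$(sbatch --parsable /path/to/sbatch.script.sh)")
    if [ -n "$job_id" ]; then
        echo "Submitted job $job_id"
        scancel "$job_id"
    fi
else
    # Without Slurm installed, only demonstrate the parsing step.
    parse_job_id "12345;examplecluster"
fi
```

Inside a running job or salloc shell no parsing is needed, since the ID is already available in the environment variable mentioned above.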
Slurm Cheat Sheet
A summary of the most common commands can be found here.