| + | ==== Slurm Commands ==== | ||
| + | === Start Jobs === | ||
| + | |||
| + | **srun** | ||
| + | |||
| + | This is the simplest way to run a job on a cluster. | ||
| + | Initiate parallel job steps within a job or start an interactive job (with --pty). | ||
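
For example (a minimal sketch; ''./my_app'' and the resource counts are placeholders, and site-specific options such as partitions are omitted):
<code>
# start an interactive shell on one node
srun -N 1 -n 1 --pty bash

# launch 4 tasks of a parallel application as a job step
srun -n 4 ./my_app
</code>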
| + | |||
| + | **salloc** | ||
| + | |||
| + | Request interactive jobs/ | ||
| + | When the job is started a shell (or other program specified on the command line) it is started on the submission host (Frontend). | ||
| + | From this shell you should use srun to interactively start a parallel applications. | ||
| + | The allocation is released when the user exits the shell. | ||
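
A typical session could look like this (a sketch; the node count is a placeholder):
<code>
# request an allocation of 2 nodes; a shell opens on the frontend
salloc -N 2

# inside that shell, start parallel applications on the allocated nodes
srun hostname

# exiting the shell releases the allocation
exit
</code>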
| + | |||
| + | **sbatch** | ||
| + | |||
| + | Submit a batch script. | ||
| + | The script will be executed on the first node of the allocation. | ||
| + | The working directory coincides with the working directory of the sbatch directory. | ||
| + | Within the script one or multiple srun commands can be used to create job steps and execute parallel applications. | ||
| + | |||
| + | **Examples** | ||
| + | < | ||
| + | # General: | ||
| + | sbatch --job-name=< | ||
| + | |||
| + | # A start date/time can be set via the --begin parameter: | ||
| + | --begin=16: | ||
| + | --begin=now+1hour | ||
| + | --begin=now+60 (seconds by default) | ||
| + | --begin=2010-01-20T12: | ||
| + | </ | ||
| + | |||
| + | A sbatch script of the command above would look like | ||
| + | < | ||
| + | #!/bin/bash | ||
| + | #SBATCH --job-name=< | ||
| + | #SBATCH -N <num nodes> | ||
| + | #SBATCH --ntasks-per-node=< | ||
| + | #SBATCH --begin=2010-01-20T12: | ||
| + | |||
| + | / | ||
| + | </ | ||
| + | |||
| + | For more information see '' | ||
| + | All parameters used there can also be specified in the job script itself using #SBATCH. | ||
| + | |||
| + | Also, more examples can be found [[hpc: | ||
| + | |||
| + | === Check the status of your job submissions === | ||
| + | < | ||
| + | squeue --me | ||
| + | </ | ||
| + | |||
| + | === Check the status of nodes === | ||
| + | < | ||
| + | sinfo | ||
| + | </ | ||
| + | |||
| + | === Canceling jobs === | ||
| + | On allocation, you will be notified of the job ID. | ||
| + | Also, within your scripts and shells (if allocated via salloc) you can get the ID via the '' | ||
| + | You can use this ID to cancel your submission: | ||
| + | |||
| + | < | ||
| + | scancel < | ||
| + | </ | ||
| + | |||
| + | === Slurm Cheat Sheet === | ||
| + | |||
| + | A summary of the most common commands can be found [[https:// | ||