==== Slurm Commands ====

=== Start Jobs ===

**srun**

This is the simplest way to run a job on a cluster.
It initiates parallel job steps within a job or starts an interactive job (with --pty).

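A minimal sketch of both uses (the resource counts and the ''./my_program'' path are placeholders, not part of this page):

<code>
# Start an interactive shell on one compute node
srun -N 1 --ntasks-per-node=1 --pty bash -i

# Launch a parallel program on 2 nodes with 4 tasks each
srun -N 2 --ntasks-per-node=4 ./my_program
</code>
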
**salloc**

Request interactive jobs/allocations.
When the job starts, a shell (or another program specified on the command line) is started on the submission host (frontend).
From this shell you should use srun to interactively start parallel applications.
The allocation is released when the user exits the shell.

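A minimal sketch of an interactive session, again with placeholder resource counts and program path:

<code>
# Request an allocation of 2 nodes with 4 tasks each; a shell opens on the frontend
salloc -N 2 --ntasks-per-node=4

# Inside that shell, start the parallel application on the allocated nodes
srun ./my_program

# Leaving the shell releases the allocation
exit
</code>
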
**sbatch**

Submit a batch script.
The script will be executed on the first node of the allocation.
The working directory coincides with the working directory from which sbatch was called.
Within the script, one or multiple srun commands can be used to create job steps and execute parallel applications.

**Examples**
<code>
# General:
sbatch --job-name=<name of job shown in squeue> -N <num nodes> --ntasks-per-node=<num tasks per node> <jobscript>

# A start date/time can be set via the --begin option, for example:
--begin=16:00
--begin=now+1hour
--begin=now+60 (seconds by default)
--begin=2010-01-20T12:34:00
</code>

An sbatch script for the command above would look like:
<code>
#!/bin/bash
#SBATCH --job-name=<name of job shown in squeue>
#SBATCH -N <num nodes>
#SBATCH --ntasks-per-node=<num tasks per node>
#SBATCH --begin=2010-01-20T12:34:00

/path/to/program
</code>

For more information see ''man sbatch''.
All parameters used there can also be specified in the job script itself using #SBATCH.

Also, more examples can be found [[hpc:

=== Check the status of your job submissions ===
<code>
squeue --me
</code>
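
If you only want to inspect a single job, squeue can also be restricted to one job ID (a sketch; ''<job ID>'' is a placeholder):
<code>
squeue -j <job ID>
</code>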

=== Check the status of nodes ===
<code>
sinfo
</code>
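
For a more detailed, per-node view, sinfo also offers a node-oriented long listing (a sketch using standard sinfo options):
<code>
sinfo -N -l
</code>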

=== Canceling jobs ===
On allocation, you will be notified of the job ID.
Also, within your scripts and shells (if allocated via salloc), you can get the ID via the ''SLURM_JOB_ID'' environment variable.
You can use this ID to cancel your submission:
<code>
scancel <job ID>
</code>
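
For example, inside a shell obtained via salloc the ID is exported as ''SLURM_JOB_ID'' and can be passed straight to scancel (a sketch):
<code>
# Inside a shell started by salloc, Slurm exports the job ID
echo $SLURM_JOB_ID

# Cancel that job using the exported ID
scancel $SLURM_JOB_ID
</code>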

=== Slurm Cheat Sheet ===

A summary of the most common commands can be found [[https://