==== Slurm Commands ====

=== Start Jobs ===

**srun**

This is the simplest way to run a job on the cluster.
It initiates parallel job steps within a job or starts an interactive job (with ''--pty'').
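
A minimal interactive example, assuming the site's default partition is used; ''bash'' is just one possible program to run:
<code>
# Start an interactive shell on one node with a single task:
srun -N 1 --ntasks-per-node=1 --pty bash
</code>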

**salloc**

Request interactive jobs/allocations.
When the job starts, a shell (or another program specified on the command line) is started on the submission host (frontend).
From this shell you should use srun to interactively start parallel applications.
The allocation is released when the user exits the shell.
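
A sketch of that workflow; the node count and ''./my_app'' are placeholders:
<code>
# Request an interactive allocation of two nodes; a shell opens on the frontend:
salloc -N 2

# From that shell, start the parallel application on the allocated nodes:
srun ./my_app

# Exiting the shell releases the allocation:
exit
</code>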

**sbatch**

Submit a batch script.
The script will be executed on the first node of the allocation.
The working directory of the script is the directory from which sbatch was invoked.
Within the script, one or multiple srun commands can be used to create job steps and execute parallel applications.

**Examples**
<code>
# General:
sbatch --job-name=<name of job shown in squeue> -N <num nodes> --ntasks-per-node=<spawned processes per node> /path/to/sbatch.script.sh

# A start date/time can be set via the --begin parameter:
--begin=16:00
--begin=now+1hour
--begin=now+60 (seconds by default)
--begin=2010-01-20T12:34:00
</code>

An sbatch script equivalent to the command above would look like this (the last line stands for the actual commands the job should run):
<code>
#!/bin/bash
#SBATCH --job-name=<name of job shown in squeue>
#SBATCH -N <num nodes>
#SBATCH --ntasks-per-node=<spawned processes per node>
#SBATCH --begin=2010-01-20T12:34:00

# The workload of the job, e.g. parallel job steps started via srun:
srun <your application>
</code>

For more information see ''man sbatch''.
All parameters described there can also be specified in the job script itself using ''#SBATCH'' directives.

More examples can be found [[hpc:tutorials:sbatch_examples|here]].

=== Check the status of your job submissions ===
<code>
squeue --me
</code>
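
If the installed Slurm version does not support ''--me'' yet, filtering by user name gives the same result:
<code>
# Show only your own jobs (equivalent to squeue --me):
squeue -u $USER
</code>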

=== Check the status of nodes ===
<code>
sinfo
</code>
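
For a more detailed, per-node view, sinfo's node-oriented long format can be used:
<code>
# One line per node, including state, CPU, and memory information:
sinfo -N -l
</code>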

=== Canceling jobs ===
On allocation, you will be notified of the job ID.
Also, within your scripts and shells (if allocated via salloc) you can get the ID via the ''$SLURM_JOBID'' environment variable.
You can use this ID to cancel your submission:

<code>
scancel <jobid>
</code>
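
scancel can also match jobs by attributes instead of a single ID; both options below are part of the standard scancel interface:
<code>
# Cancel all of your own jobs:
scancel -u $USER

# Cancel jobs by the name that was set via --job-name:
scancel --name=<name of job shown in squeue>
</code>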

=== Slurm Cheat Sheet ===

A summary of the most common commands can be found [[https://slurm.schedmd.com/pdfs/summary.pdf|here]].