Use of Slurm on the i-Trop cluster
| Description | Knowing how to use Slurm |
|---|---|
| Author | Ndomassi TANDO (ndomassi.tando@ird.fr) |
| Creation date | 08/11/2019 |
| Modification date | 08/03/2021 |
Summary
Objectives
Knowing how to launch different types of jobs with Slurm.
Knowing how to monitor jobs.
Launching jobs with Slurm:
Launching commands from the master
The following command allocates computing resources (nodes, memory, cores) and immediately launches the command on each allocated resource:
$ srun <command>
Example:
$ srun hostname
This returns the name of the compute node used.
Connect to a node in interactive mode and launch commands:
To connect to a node in interactive mode for X minutes, use the following command:
$ srun -p short --time=X:00 --pty bash -i
Then you can launch commands on this node without the srun prefix.
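For example, for a 30-minute interactive session (prompt and node name are illustrative):
$ srun -p short --time=30:00 --pty bash -i
login@node1:~$ hostname
node1
login@node1:~$ exit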
Connect to a node in interactive mode with x11 support:
The x11 support allows you to launch graphical software within a node.
You first have to connect to the bioinfo-master.ird.fr with the -X option:
$ ssh -X login@bioinfo-master.ird.fr
Then you can launch this command with the --x11 option:
$ srun -p short --x11 --pty bash -i
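Once connected, graphical programs started on the node are displayed on your local machine, for example (assuming the tool is installed on the node):
login@node1:~$ xclock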
Partitions available :
Depending on the type of job you want to launch, you can choose between several partitions.
Partitions can be considered as job queues, each of which has constraints such as a job size limit, a job time limit, the users permitted to use it, etc.
Priority-ordered jobs are allocated nodes within a partition until the resources (nodes, processors, memory, etc.) within that partition are exhausted.
| partition | role | nodes list | number of cores | RAM |
|---|---|---|---|---|
| short | jobs < 1 day (high priority, interactive jobs) | node0,node1,node2,node13,node14 | 12 cores | 48 to 64 GB |
| normal | jobs of 7 days max | node0,node1,node2,node13,node14,node15,node16,node17,node18,node19,node20,node22,node23,node24 | 12 to 24 cores | 64 to 96 GB |
| long | jobs between 7 and 45 days | node3,node8,node9,node10,node11,node12 | 12 to 24 cores | 48 GB |
| highmem | jobs with high memory needs | node4,node7,node17,node21 | 12 to 72 cores | 144 or 256 GB |
| highmemplus | jobs with high memory needs | node5 | 88 cores | 512 GB |
| supermem | jobs with very high memory needs | node25 | 40 cores | 1 TB |
| gpu | analyses on GPU cores | node26 | 24 CPUs and 8 GPUs | 192 GB |
Access to the gpu partition is restricted; a request can be made here: request access to gpu
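Once access is granted, you can request the partition like any other. A hedged example of an interactive session on it (the --gres=gpu:1 syntax assumes GPUs are declared as generic resources, which may be configured differently on this cluster):
$ srun -p gpu --gres=gpu:1 --pty bash -i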
Main options in Slurm:
srun or sbatch can be used with the following options:
| actions | Slurm options | SGE options |
|---|---|---|
| Choose a partition | -p [queue] | -q [queue] |
| Number of nodes to use | -N [min[-max]] | N/A |
| Number of tasks to launch | -n [count] | -pe [PE] [count] |
| Time limit | -t [min] or -t [days-hh:mm:ss] | -l h_rt=[seconds] |
| Specify an output file | -o [file_name] | -o [file_name] |
| Specify an error file | -e [file_name] | -e [file_name] |
| Combine STDOUT and STDERR files | use -o without -e | -j yes |
| Copy the environment | --export=[ALL,NONE,variables] | -V |
| Send an email | --mail-user=[address] | -M [address] |
| Notifications to send | --mail-type=[events] | -m [events] |
| Job name | --job-name=[name] | -N [name] |
| Relaunch the job | --requeue | -r [yes,no] |
| Specify the working directory | --workdir=[dir_name] | -wd [directory] |
| Set the memory size to reserve | --mem=[mem][M,G,T] or --mem-per-cpu=[mem][M,G,T] | -l mem_free=[memory][K,M,G] |
| Launch with a particular account | --account=[account] | -A [account] |
| Number of tasks per node | --ntasks-per-node=[count] | (fixed allocation_rule in PE) |
| Number of CPUs per task | --cpus-per-task=[count] | N/A |
| Dependency on another job | --dependency=[state:job_id] | -hold_jid [job_id,job_name] |
| Choose a node | --nodelist=[nodes] AND/OR --exclude=[nodes] | -q [queue]@[node] |
| Job arrays | --array=[array_spec] | -t [array_spec] |
| Launching date | --begin=YYYY-MM-DD[THH:MM[:SS]] | -a [YYMMDDhhmm] |
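As an illustration, several of these options can be combined in a single submission (file names and address are hypothetical):
$ sbatch -p normal -n 4 -t 2-00:00:00 --mem=16G -o myjob.out --job-name=myjob --mail-user=login@ird.fr --mail-type=END script.sh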
Launching jobs via a script
Batch mode launches an analysis by following the steps described in a script.
Slurm accepts different types of scripts, such as bash, perl or python.
Slurm allocates the desired computing resources and launches the analyses on these resources in the background.
To be interpreted by Slurm, the script should contain a specific header whose lines start with the #SBATCH keyword to specify the Slurm options.
Slurm script example:
#!/bin/bash
## Define a jobname
#SBATCH --job-name=test
## Define an output file
#SBATCH --output=res.txt
## Define the number of tasks
#SBATCH --ntasks=1
## Define the timelimit
#SBATCH --time=10:00
## Define 100 MB of memory per CPU
#SBATCH --mem-per-cpu=100
sleep 180 #launch a 180s sleep
To launch an analysis use the following command:
$ sbatch script.sh
with script.sh the name of the script to use.
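sbatch returns immediately with the ID assigned to the job (ID illustrative):
$ sbatch script.sh
Submitted batch job 20303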
Submit an array job
#!/bin/bash
#SBATCH --partition=short ### Partition
#SBATCH --job-name=ArrayJob ### job name
#SBATCH --time=00:10:00 ### timelimit
#SBATCH --nodes=1 ### Number of nodes
#SBATCH --ntasks=1 ### Number of tasks per job array
#SBATCH --array=0-19%4 ### Array indexes from 0 to 19, with at most 4 jobs running at a time
echo "I am Slurm job ${SLURM_JOB_ID}, array job ${SLURM_ARRAY_JOB_ID}, and array task ${SLURM_ARRAY_TASK_ID}."
You have to use the #SBATCH --array option to define the index range.
The ${SLURM_JOB_ID} variable gives the job ID.
${SLURM_ARRAY_JOB_ID} gives the ID of the job array.
${SLURM_ARRAY_TASK_ID} gives the index of the current task within the job array.
The script should give an output like:
$ sbatch array.srun
Submitted batch job 20303
$ cat slurm-20303_1.out
I am Slurm job 20305, array job 20303, and array task 1.
$ cat slurm-20303_19.out
I am Slurm job 20323, array job 20303, and array task 19.
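A common use of job arrays is to process one input file per task. A minimal sketch, assuming hypothetical input files named sample_0.txt to sample_19.txt:
#!/bin/bash
#SBATCH --partition=short ### Partition
#SBATCH --job-name=ArrayFiles ### job name
#SBATCH --ntasks=1 ### Number of tasks per array task
#SBATCH --array=0-19%4 ### Array indexes from 0 to 19, 4 jobs at a time
## pick the input file matching this task's index (file names are hypothetical)
INPUT="sample_${SLURM_ARRAY_TASK_ID}.txt"
echo "Processing ${INPUT}"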
Submit a R job
You can use the same syntax as before for Slurm; you just have to launch your R script with the Rscript command:
Rscript script.R
#!/bin/bash
## Define the job name
#SBATCH --job-name=test
## Define the output file
#SBATCH --output=res.txt
## Define the number of tasks
#SBATCH --ntasks=1
## Define the execution time limit
#SBATCH --time=10:00
## Define 100 MB of memory per CPU
#SBATCH --mem-per-cpu=100
Rscript script.R #launch the R script script.R
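If your R script takes arguments, pass them after the script name; input.csv and the value 42 are hypothetical, and would be read inside R with commandArgs(trailingOnly=TRUE):
Rscript script.R input.csv 42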
Submit a job with several commands running in parallel at the same time
Use the --ntasks and --cpus-per-task options.
Example:
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2
srun --ntasks=1 sleep 10 &
srun --ntasks=1 sleep 12 &
wait
In this example, we use 2 tasks with 2 CPUs allocated per task, that is to say 4 CPUs allocated for this job.
The two sleep commands are launched at the same time, one per task.
Notice the use of srun to launch a parallelised command and of & to launch it in the background.
wait is needed here to make the job wait for the end of each command before finishing.
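After the job has finished, you can verify that the two commands ran as separate job steps with sacct (see the monitoring section below):
$ sacct -j <job_id> --format=JobID,JobName,Elapsed,State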
Submit an OpenMP job:
An OpenMP job is a job using several CPUs on one single node; therefore the number of nodes will always be one.
This will work with a program compiled with OpenMP.
#!/bin/bash
#SBATCH --partition=short ### Partition
#SBATCH --job-name=HelloOMP ### Job Name
#SBATCH --time=00:10:00 ### WallTime
#SBATCH --nodes=1 ### Number of Nodes
#SBATCH --ntasks-per-node=1 ### Number of tasks (MPI processes)
#SBATCH --cpus-per-task=28 ### Number of threads per task (OMP threads)
#SBATCH --account=hpcrcf ### Account used for job submission
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hello_omp
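hello_omp is assumed here to be a program already compiled with OpenMP support, for example:
$ gcc -fopenmp hello_omp.c -o hello_omp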
Environment variables:
SLURM_JOB_ID: ID of the allocated job.
SLURM_JOB_NAME: name of the job.
SLURM_JOB_NODELIST: list of the nodes allocated to the job.
SLURM_JOB_NUM_NODES: number of nodes allocated to the job.
SLURM_NTASKS: number of tasks in the job.
SLURM_SUBMIT_DIR: directory from which sbatch was invoked.
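A minimal sketch of a batch script printing these variables:
#!/bin/bash
#SBATCH --job-name=envDemo
#SBATCH --ntasks=1
echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) runs on ${SLURM_JOB_NODELIST}"
echo "${SLURM_NTASKS} task(s) on ${SLURM_JOB_NUM_NODES} node(s), submitted from ${SLURM_SUBMIT_DIR}"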
Cancelling a job
$ scancel <job_id>
with <job_id>: the ID of the job to cancel
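scancel also accepts filters, for example:
$ scancel -u <user>
to cancel all the jobs of a user, or
$ scancel --name=<job_name>
to cancel jobs by name.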
Monitoring resources:
Get info on jobs:
$ squeue
To refresh the info every 5 seconds:
$ squeue -i 5
Info on a particular job:
$ scontrol show job <job_id>
with <job_id>: the ID of the job
Info on the jobs of a particular user:
$ squeue -u <user>
with <user>: the user's login
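Typical squeue output (values illustrative):
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
20303     short     test    login  R       5:02      1 node1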
More info on jobs:
$ sacct --format=JobID,elapsed,ncpus,ntasks,state,nodelist
Info on resources used by a finished job:
$ seff <job_id>
with <job_id>: the job ID
You can add the following command at the end of your script to get the job's resource usage in your output file:
seff $SLURM_JOB_ID
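For a finished job, the output of seff looks like this (values illustrative):
$ seff 20303
Job ID: 20303
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:02:58
CPU Efficiency: 98.89% of 00:03:00 core-walltime
Memory Utilized: 1.20 MB
Memory Efficiency: 1.20% of 100.00 MB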
Get info on partitions
$ sinfo
It gives info on partitions and nodes.
More information:
$ scontrol show partitions
scontrol show can be used with nodes, user, account, etc.
Knowing the time limit for each partition:
$ sinfo -o "%10P %.11L %.11l"
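The three columns are the partition, its default time and its time limit; output looks like (values illustrative):
PARTITION  DEFAULTTIME   TIMELIMIT
short              n/a  1-00:00:00
normal             n/a  7-00:00:00
long               n/a 45-00:00:00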
Get info on nodes
$ sinfo -N -l
Several states are possible:
- alloc: the node is fully used
- mix: the node is partially used
- idle: no job is running on the node
- drain: the node is finishing its running jobs but does not accept new ones (e.g. when the node is about to be stopped for maintenance)
More information:
$ scontrol show nodes
Links
- Related courses: HPC Trainings