
Tutorials – GPU

Use of the GPU node on the i-Trop cluster

Description: How to use the GPU node on the i-Trop cluster
Authors: Julie ORJUELA (julie.orjuela@ird.fr) and Aurore COMTE (aurore.comte@ird.fr)
Creation date: 27/01/2020
Modification date: 29/01/2024

Objectives

Learn how to launch a Slurm job on the GPU node of the i-Trop cluster and how to monitor GPU jobs

What is a GPU node?

A GPU node contains graphics cards used for different types of analyses; they can provide huge acceleration for certain parallel computing tasks (e.g. phylogeny construction, image analysis, deep learning...).

The GPU node of the i-Trop cluster has 8 RTX 2080 graphics cards, each with 124 GB of RAM. In total the node has 24 threads. There is no connection between the graphics cards, so you cannot parallelize a single computation across several cards at the same time.

To request a GPU, you need to ask to be added to gpu_group.
Then the options below should be used in your job submission to request a GPU card.

#SBATCH -p gpu
#SBATCH -A gpu_group
#SBATCH --gres=gpu:1
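
Once added to gpu_group, you can check that the gpu partition is visible to you with the standard Slurm sinfo command (a quick sanity check, not a required step):

$ sinfo -p gpu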

Below is an example of GPU usage: basecalling a Nanopore dataset with Guppy.

Basecalling with guppy-gpu using the i-Trop GPU node

Guppy is a data processing toolkit that contains Oxford Nanopore Technologies' basecalling algorithms and several bioinformatic post-processing features.

Basecalling with Guppy can be launched using the guppy-gpu tool. In the Guppy command you have to specify the directory containing the FAST5 raw read files (-i), the output directory where the FASTQ files will be written (-s), the basecalling model or config file to use (-c), and the number of parallel basecallers to create (--num_callers). We also recommend compressing the FASTQ output (--compress_fastq).
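
As an illustration of these options, a generic Guppy command looks like the following (the paths and the config name are placeholders; pick a config from guppy_basecaller --print_workflows, see the note further below):

$ guppy_basecaller -i /path/to/fast5 -s /path/to/fastq -c dna_r9.4.1_450bps_hac.cfg --num_callers 8 --compress_fastq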

We recommend basecalling a dataset on a single graphics card so that the results end up in a single folder. If you split the data you can take advantage of all the graphics cards, but your results will be spread over several folders. Reads from different results folders can share names, so you may lose information if you decide to merge them.

Creating a Slurm script for basecalling on the GPU node

Copy your data to the /scratch partition of node26 before launching the basecalling.
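
For example with scp (the source path is a placeholder, and we assume node26 is reachable over SSH from the machine holding your data):

$ scp -r /path/to/fast5 node26:/scratch/$USER/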

Create an sbatch script to allocate resources through Slurm. Here, the sbatch script lauchGuppyGPU.sbash requests 8 threads to run guppy-gpu on the gpu partition (-p gpu). If you are using the i-Trop GPU node you belong to gpu_group, so pass this account to Slurm with the -A option, and use --gres=gpu:1 to specify the number of GPUs per node.

#!/bin/bash
# job name, partition, account and GPU/CPU resources on the GPU node
#SBATCH -J bc
#SBATCH -p gpu
#SBATCH -A gpu_group
#SBATCH --gres=gpu:1
#SBATCH -c 8

# positional arguments: FAST5 input folder, FASTQ output folder, basecalling model
INPUT=$1
OUTPUT=$2
MODEL=$3

# loading modules
module load bioinfo/guppy-gpu/6.3.7

# running basecalling
guppy_basecaller -c ${MODEL} -i ${INPUT} --recursive -s ${OUTPUT} --num_callers 8 --gpu_runners_per_device 8 --device auto --min_qscore 7 --compress_fastq

Now you can launch the lauchGuppyGPU.sbash script, giving the input, the output and the model:

$ sbatch lauchGuppyGPU.sbash /path/to/fast5 /path/to/fastq model
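
You can then check that the job is queued or running with the usual Slurm command:

$ squeue -u $USER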

Note:
Besides the path to our FAST5 files folder (-i), the basecaller requires an output path (-s) and a config file or the flowcell/kit combination. To get a list of possible flowcell/kit combinations and config files, we use:

$ guppy_basecaller --print_workflows
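
As a hypothetical example, to launch the script with the R9.4.1 high-accuracy config (one of the configs listed by --print_workflows; adapt the paths and the model to your own run):

$ sbatch lauchGuppyGPU.sbash /scratch/$USER/fast5 /scratch/$USER/fastq dna_r9.4.1_450bps_hac.cfg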

Resource monitoring with nvidia-smi

GPU usage and memory consumption on the node can be checked with:

$ nvidia-smi
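
To follow GPU usage over time, you can refresh the display periodically with watch (assuming you are connected to node26 while your job runs there):

$ watch -n 5 nvidia-smi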

Links


License

The resource material is licensed under the Creative Commons Attribution 4.0 International License (here).