
Tutorials – HowTos i-Trop cluster


Description: HowTos for i-Trop Cluster
Author: Ndomassi TANDO (ndomassi.tando@ird.fr)
Creation date: 08/11/19
Modification date: 04/03/21

Summary


Preamble

Architecture of the i-Trop cluster:

The i-Trop computing cluster is made up of a set of computing servers accessible via a front-end (master) machine. Connections to the compute servers go through this master machine, which distributes the different analyses between the machines available at any given moment.

The computing cluster is composed of:

  • 1 master machine
  • 3 NAS servers for temporary storage of project data, up to 150TB
  • 26 CPU computing nodes with a total capacity of 508 cores and 2744GB of RAM, plus a GPU server with 8 RTX 2080 graphics cards.

Here is the architecture:


Connecting to a server in SSH from a Windows machine

  • mobaXterm: an advanced terminal for Windows with an X11 server and an SSH client – Download
  • PuTTY: allows you to connect to a Linux server from a Windows machine – Download

Transfer files from your computer to Linux servers with SFTP

  • FileZilla: FTP and SFTP client – Download

View and edit files locally or on a remote server

  • Remote, console mode: nano – Tutorial
  • Remote, console mode: vi – Tutorial
  • Remote, graphical mode: Komodo Edit – Download
  • Linux & Windows editor: Notepad++ – Download

How to : Transfer files with FileZilla (SFTP)

Download and install FileZilla
Open FileZilla and save the i-Trop cluster into the site manager

In the FileZilla menu, go to File > Site Manager. Then go through these 5 steps:

  1. Click on New Site.
  2. Add an explicit name.
  3. Choose the host among these 3 possibilities:

    • bioinfo-nas2.ird.fr (nas2) to transfer to /data/project
    • bioinfo-nas.ird.fr (nas) to transfer to /home/user, /data2/projects or /teams
    • bioinfo-nas3.ird.fr (nas3) to transfer to /data3/project

  4. Set the Logon Type to "Normal" and type your cluster credentials.
  5. Choose port 22 and press the "Connect" button.
Transferring files

  1. From your computer to the cluster: click and drag a file from the left (local) column to the right (remote) column.
  2. From the cluster to your computer: click and drag a file from the right (remote) column to the left (local) column.

How to : Connect to the i-Trop cluster via SSH

From a Windows computer:

With mobaXterm:

  1. Click the session button and choose SSH.
    • In the remote host box, type: bioinfo-master.ird.fr
    • Check the "specify username" box and enter your login.
  2. In the console, enter your password when asked.

From a Mac or Linux computer:

Open the terminal application and type the following command:

ssh login@bioinfo-master.ird.fr

with login: your cluster account

1st connection:

Your password has to be changed at the first connection.

At the "Mot de passe UNIX (actuel)" (current UNIX password) prompt, type the password provided in the account creation email.

Then type your new password twice.

The session will be automatically closed.

You will need to open a new session with your new password.



How to : Reserve one or several cores of a node

The cluster uses Slurm (https://slurm.schedmd.com/documentation.html) to manage user analyses.

It monitors the available resources (CPU and RAM) and allocates them to users for job launching.

When you are connected to bioinfo-master.ird.fr, you can reserve one or several cores among the 28 nodes available.

Reserving one core

Type the following command:

srun -p short --pty bash -i

You will be randomly connected to one of the nodes of the short partition with one core reserved.

Reserving several cores at the same time

Type the following command:

srun -p short -c X --pty bash -i

With X the number of cores, between 2 and 12.

You will be randomly connected to one of the nodes of the short partition with X cores reserved.
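
For example, to reserve 4 cores on one of the nodes of the short partition:

srun -p short -c 4 --pty bash -i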

Reserving one core of a specific node:

Type the following command:

srun -p short --nodelist=nodeX --pty bash -i

with nodeX a node belonging to the short partition.
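
For example, to reserve one core on node1, one of the nodes of the short partition (see the partitions list in the section "How to: Choose a particular partition" below):

srun -p short --nodelist=node1 --pty bash -i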


How to : Transfer my data from the nas server to nodes

On the cluster, every node has its own local partition called /scratch.

/scratch is used to receive the data to analyse, to run the analyses on them, and to hold the result data temporarily.
Data on /scratch is kept for 30 days maximum, except on the nodes of the long partition where it is kept up to 45 days.
You must transfer your data to the /scratch of the reserved node before launching your analyses.

The /scratch volumes range from 1TB to 14TB depending on the chosen node.

When the analyses are finished, remember to retrieve your data.

The following section tells you how to choose which nas server to transfer data to.

scp command:

To transfer data between 2 remote servers, we use the scp command:

scp -r source destination

There are 2 possible syntaxes:

Retrieve data from a remote server:

scp -r remote_server_name:path_to_files/file local_destination

Transfer data to a remote server:

scp -r /local_path_to_files/file remote_server_name:remote_destination

Transfer from or to /home, /data2 or /teams:

The /home, /data2 and /teams partitions are located on bioinfo-nas.ird.fr (nas)

Recovering files from nas:

Syntaxes to use:

scp -r nas:/home/login/file local_destination

scp -r nas:/data2/project/project_name/file local_destination

scp -r nas:/teams/team_name/file local_destination

Copy files to nas:

Syntax to use:

scp -r /local_path_to_files/file nas:/home/login

scp -r /local_path_to_files/file nas:/data2/project/project_name

scp -r /local_path_to_files/file nas:/teams/team_name

Transfer to or from /data:

The /data partition is located on bioinfo-nas2.ird.fr (nas2)

Retrieve files from nas2:

Syntax to use:

scp -r nas2:/data/project/project_name/file local_destination

Copying files to nas2:

Syntax to use:

scp -r /local_path_to_files/file nas2:/data/project/project_name

Transfer from or to /data3:

The /data3 partition is located on bioinfo-nas3.ird.fr (nas3)

Retrieve files from nas3:

Syntax to use:

scp -r nas3:/data3/project/project_name/file local_destination

Copying files to nas3:

Syntax to use:

scp -r /local_path_to_files/file nas3:/data3/project/project_name
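
Putting it together, here is a minimal sketch of a typical transfer workflow run from a reserved node; project_name, my_input and my_results are placeholders to adapt to your own project:

# retrieve the input data from the nas server hosting the project (here nas2) to the node's /scratch
scp -r nas2:/data/project/project_name/my_input /scratch/
# ... run the analyses on /scratch ...
# copy the results back to the nas server once the analyses are finished
scp -r /scratch/my_results nas2:/data/project/project_name/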


How to : Use module Environnement

Module Environment allows you to dynamically change your environment variables (PATH, LD_LIBRARY_PATH) and thus choose your software version.
The nomenclature used for modules is package_name/package_version
Software is divided into 2 groups:

  • bioinfo: lists bioinformatics software
  • system: lists system software

Displaying the available software

module avail

Displaying the description of a software package

module whatis module_type/module_name/version

with module_type: bioinfo or system
with module_name: the name of the module.

For example : samtools version 1.7:

module whatis bioinfo/samtools/1.7

Loading a software package:

module load module_type/module_name/version

with module_type: bioinfo or system
with module_name: module name.

For example : samtools version 1.7:

module load bioinfo/samtools/1.7

Unloading a software package:

module unload module_type/module_name/version

with module_type: bioinfo or system
with module_name: module name.

For example : samtools version 1.7:

module unload bioinfo/samtools/1.7

Displaying the loaded modules

module list

Unloading all the modules

module purge
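
As an illustration, a typical module session could look like this, reusing the samtools 1.7 example from above:

module load bioinfo/samtools/1.7
samtools --version   # the loaded version is now the one found in the PATH
module list          # bioinfo/samtools/1.7 appears among the loaded modules
module purge         # unload all the modules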


How to : Launch a job with Slurm

The cluster uses Slurm to manage and prioritize user jobs.

It checks the available resources (CPU and RAM) and allocates them to users to perform their analyses.

Once connected to bioinfo-master.ird.fr, you can launch a command with srun or a script with sbatch.

Use srun with a command:

If you simply want to launch a command that will be executed on a node:

        $ srun + command

Example:

        $ srun hostname

will launch the command hostname on the node chosen by Slurm.

Use sbatch to launch a script:

The batch mode allows you to launch an analysis in several steps defined in a script.

Slurm accepts several scripting languages such as bash, perl or python.

Slurm allocates the desired resources and launches the analyses in the background.

To be interpreted by Slurm, a script must contain a header with the Slurm options, each beginning with the keyword #SBATCH.

Slurm example script:

#!/bin/bash
## Define the job name
#SBATCH --job-name=test
## Define the output file
#SBATCH --output=res.txt
## Define the number of tasks
#SBATCH --ntasks=1
## Define the execution limit
#SBATCH --time=10:00
## Define 100MB of memory per CPU
#SBATCH --mem-per-cpu=100
sleep 180 # pause for 180 seconds

To launch an analysis via a script:

$ sbatch script.sh

with script.sh the script to use.

More Slurm options here: Slurm options
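
Once submitted, a job runs in the background; the standard Slurm commands below (not specific to i-Trop) can be used to follow or cancel it:

squeue -u $USER    # list your pending and running jobs
scancel job_id     # cancel a job, with job_id the identifier returned by sbatch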

Examples of scripts :

template for a blast script

template for a bwa script


How to: Choose a particular partition

Depending on the type of jobs (analyses) you want to run, you can choose between different partitions.

Partitions are analysis queues with specific priorities and constraints such as the size or time limit of a job, the users authorized to use them, etc.

Jobs are prioritized and processed using the resources (CPU and RAM) of the nodes making up these partitions.

For each partition: role, nodes list, number of cores and RAM:

  • short: short jobs < 1 day (high priority, interactive jobs); nodes node0, node1, node2, node13, node14; 12 cores; 48 to 64 GB
  • normal: jobs < 7 days; nodes node0, node1, node2, node13, node14, node15, node16, node17, node18, node19, node20, node22, node23, node24; 12 to 24 cores; 64 to 96 GB
  • long: long jobs between 7 and 45 days; nodes node3, node8, node9, node10, node11, node12; 12 to 24 cores; 48 GB
  • highmem: jobs with memory needs; nodes node4, node7, node17, node21; 12 to 24 cores; 144 GB
  • highmemplus: jobs with memory needs; node5; 88 cores; 512 GB
  • supermem: jobs with large memory needs; node25; 40 cores; 1 TB
  • gpu: analyses on GPU cores; node26; 24 CPU cores and 8 GPUs; 192 GB

Access to the gpu partition is restricted. A request can be made here: request access to gpu

The partition can be chosen following this scheme:

By default, the chosen partition is the normal partition.

Warning: highmem and highmemplus should only be used for jobs requiring at least 35-40GB of memory.

The supermem partition should be used for large assemblies and jobs requiring more than 100GB of memory.

You can use htop on a node to visualize the memory consumed by a process.
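
For example, from an interactive session on the node (assuming htop is available in the node's default PATH):

htop -u $USER    # display only your own processes and the memory they consume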

To choose a partition, use the -p option.

sbatch -p partition
srun -p partition

With partition the chosen partition.
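
For example, to submit the script from the previous section to the highmem partition, or to open an interactive session on one of its nodes:

sbatch -p highmem script.sh
srun -p highmem --pty bash -i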


How to : View and delete your data contained in the /scratch partition of the nodes

The 2 scripts are located here: /opt/scripts/scratch-scripts/

  • To see your data contained in the /scratch of the nodes:

    sh /opt/scripts/scratch-scripts/scratch_use.sh
    and follow the instructions
  • To delete your data contained in the /scratch partition of the nodes: launch the following command:

    sh /opt/scripts/scratch-scripts/clean_scratch.sh  
    and follow the instructions

How to : Use a singularity container

Singularity is installed on the i-Trop cluster in 3 versions: 2.4, 3.3.0 and 3.6.0.

Containers are located in /data3/projects/containers

The 2.4 folder hosts the containers built with the 2.4 version of singularity

The 3.3.0 folder hosts the containers built with the 3.3.0 version of singularity

You first need to load the environment with the command:

module load system/singularity/2.4 or module load system/singularity/3.3.0

Get help:

Use the command:

singularity help /data3/projects/containers/singularity_version/container.simg

with container.simg the container name.

with singularity_version: 2.4 or 3.3.0

Shell connection to a container:

singularity shell /data3/projects/containers/singularity_version/container.simg

Launch a container with only one application:

singularity run /data3/projects/containers/singularity_version/container.simg + arguments

Launch a container with several applications:

singularity exec /data3/projects/containers/singularity_version/container.simg + tools + arguments

Bind a host folder to a Singularity container:

Use the option --bind /host_partition:/container_partition

Example:

singularity exec --bind /toto2:/tmp /data3/projects/containers/singularity_version/container.simg + tools + arguments

The container will have access to the files of the host partition /toto2 through its /tmp partition.

By default, the partitions /home, /opt, /scratch, /data, /data2 and /data3 are already bound.
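
As a minimal sketch (mytool.simg and the samtools command inside it are hypothetical; replace them with an existing container from /data3/projects/containers and one of its tools):

# load Singularity 3.3.0 and run a tool from a container built with that version
module load system/singularity/3.3.0
singularity exec /data3/projects/containers/3.3.0/mytool.simg samtools --version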


How to : Cite the i-Trop platform in your publications

Please just copy the following sentence:

“The authors acknowledge the ISO 9001 certified IRD i-Trop HPC (member of the South Green Platform)  at IRD montpellier for providing  HPC resources that have contributed to the research results reported within this paper. URL: https://bioinfo.ird.fr/- http://www.southgreen.fr”


Links


License

The resource material is licensed under the Creative Commons Attribution 4.0 International License (here).