Metabarcoding using Qiime2 and DADA2

This page describes how to analyse metabarcoding data using Qiime2 and DADA2. From fastq files to SVs table obtention and Phyloseq analysis.

Author(s)

Authors	Marie Simonin and Julie Orjuela
Research Unit	UMR IPME, UMR IPME-DIADE-BOREA
Institut	IRD

Keywords

dada2, silva, vsearch, metabarcoding, 16S, 18S, ITS, Qiime2, denoising, SVs

Files format

fastq, SVs tables, OTU tables

Date

22/03/2019

Practice 2 : Obtaining an OTU table with QIIME : Microbiome denoising and pre-processing

Connect you in ssh mode to bioinfo-master.ird.fr cluster using formation counts.
ssh formationX@bioinfo-master.ird.fr

1. Import raw sequence data (demultiplexed fastQ files) into Qiime2.

https://docs.qiime2.org/2019.1/tutorials/importing/

Option with a manifest file: you need to create and use a manifest file that links the sample names to the fastq files The manifest file is a csv file where the first column is the "sample-id", the second column is the "absolute-filepath" to the fastq.gz file, the third column is the "direction" of the reads (forward or reverse). These are mandatory column names.Here is an example for paired end sequences with Phred scores of 33. !! The csv file must be in the american format: replace ";" by "," as a separator if needed.

Create the manifest file to import the fastq files in qiime2

Go into the folder where are the fastq.gz

echo "sample-id,absolute-filepath,direction" > manifest.csv 
for i in *R1* ; do echo "${i/_R1.fastq.gz},$PWD/$i,forward"; done >> manifest.csv 
for i in *R2* ; do echo "${i/_R2.fastq.gz},$PWD/$i,reverse"; done >> manifest.csv

Load Qiime2 on the server

module load bioinfo/qiime2/2018.11
source activate qiime2-2018.11

Import the fastq files in Qiime2 (stored in Qiime2 as a qza file). qza file is the data format (fastq, txt, fasta) in Qiime2

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.csv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33

2. Verification of sequence quality and number of sequences per sample.

Visualize the qzv file on qiime tools view: https://view.qiime2.org/. qzv file is the visualization format in Qiime2

qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization paired-end-demux.qzv

If you are working locally (not on the server), use this function to visualize the qzv file online
```
qiime tools view paired-end-demux.qzv
```

3. Denoising with DADA2

Based on the quality information and presence of primers the different p-trim and p-trunc parameters need to be changed. they are specific to each study and primers. Here we have forward primers of 21 bp and reverse of 20 bp. The total amplicon length is 291 bp, based on the qzv visualization we decide on the truncation length (p-trunc-len) of the forward and reverse reads. You can change the number of threads on the server with p-n-threads. This command will generate 3 files: the OTU table (16S-table.qza), the representative sequence fasta file (16S-rep-seqs.qza) and denoising statistic file (16S-denoising-stats.qza).

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 21 \
  --p-trim-left-r 20 \
  --p-trunc-len-f 220 \
  --p-trunc-len-r 160 \
  --p-n-threads 8 \
  --o-table 16S-table.qza \
  --o-representative-sequences 16S-rep-seqs.qza \
  --o-denoising-stats 16S-denoising-stats.qza \
  --verbose

4. Make summary files and visualize the outputs of DADA2.

It necessitates a metadata file with the treatment information (provided). The first column needs to be "sample-id"and the other columns are treatment, site, etc information. Go to Qiime2 View website to visualize the qzv files

qiime metadata tabulate \
  --m-input-file 16S-denoising-stats.qza \
  --o-visualization 16S-denoising-stats.qzv

qiime feature-table summarize \
  --i-table 16S-table.qza \
  --o-visualization 16S-table.qzv \
  --m-sample-metadata-file metadata.txt

qiime feature-table tabulate-seqs \
  --i-data 16S-rep-seqs.qza \
  --o-visualization 16S-rep-seqs.qzv

5. Assign taxonomy to the SVs.

Download pretrained classifier for the V4 region (Silva 132 99% OTUs from 515F/806R region of sequences) based on the SILVA database: https://docs.qiime2.org/2019.1/data-resources/

To create the classifier based on your own parameters (fragment size, region) follow this tutorial, for now we will use the pre-trained classifier for the V4 region (515F-806R) at 99% similarity: https://docs.qiime2.org/2019.1/tutorials/feature-classifier/

qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-515-806-nb-classifier.qza \
  --i-reads 16S-rep-seqs.qza \
  --o-classification 16S-rep-seqs-taxonomy.qza

Visualization of the taxonomy output

qiime metadata tabulate \
  --m-input-file 16S-rep-seqs-taxonomy.qza \
  --o-visualization 16S-rep-seqs-taxonomy.qzv

6. Remove SVs in the table that are Chloroplast or Mitochondria (not bacterial or archaeal taxa)

qiime taxa filter-table \
  --i-table 16S-table.qza \
  --i-taxonomy 16S-rep-seqs-taxonomy.qza \
  --p-exclude mitochondria,chloroplast \
  --o-filtered-table 16S-table-noplant.qza

7. Possible filtering/cleaning steps

Rarefy in Qiime2

qiime feature-table rarefy \
  --i-table 16S-table-noplant.qza \
  --p-sampling-depth 10000 \
  --o-rarefied-table 16S-table-noplant-rarefied-10000.qza

Remove SVs that are present only in 1 sample

qiime feature-table filter-features \
  --i-table 16S-table-noplant-rarefied-10000.qza \
  --p-min-samples 2 \
  --o-filtered-table 16S-table-noplant-rarefied-10000_filtered.qza

Filter the the rep-seq.qza to keep only SVs that are present in the final SV table (remove SVs that were Chloroplast, Mitochondria or found in only one sample...)

 qiime feature-table filter-seqs \
  --i-data 16S-rep-seqs.qza \
  --i-table 16S-table-noplant-rarefied-10000_filtered.qza \
  --o-filtered-data 16S-rep-seqs-filtered.qza

Summary after cleaning steps

  qiime feature-table summarize \
  --i-table 16S-table-noplant-rarefied-10000_filtered.qza \
  --o-visualization 16S-table-noplant-rarefied-10000_filtered.qzv \
  --m-sample-metadata-file metadata.txt

8. Export SV table (biom file) and representative sequences (fasta file) for analyses in R studio (structure and diversity analyses) - Qiime2

qiime tools export \
  --input-path 16S-rep-seqs.qza \
  --output-path Dada2-output

qiime tools export \
  --input-path 16S-table-noplant-rarefied-10000_filtered.qza \
  --output-path Dada2-output

9. Make Phylogenetic tree

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences 16S-rep-seqs-filtered.qza \
  --o-alignment aligned-16S-rep-seqs-filtered.qza \
  --o-masked-alignment masked-aligned-rep-seqs-filtered.qza \
  --o-tree unrooted-16S-tree-filteredSVs.qza \
  --o-rooted-tree rooted-16S-tree-filteredSVs.qza

Export trees

qiime tools export \
  --input-path unrooted-16S-tree-filteredSVs.qza \
  --output-path exported-tree-16S

qiime tools export \
  --input-path  rooted-16S-tree-filteredSVs.qza \
  --output-path exported-tree-16S

Tutorials – Qiime2 in command line