nf-core assembly and binning
We will use the nf-core/mag workflow (https://nf-co.re/mag) for metagenome assembly and binning. It starts with read quality control using fastp and FastQC. It then:
- assigns taxonomy to reads using Centrifuge and/or Kraken2
- performs assembly using MEGAHIT and SPAdes, and checks their quality using Quast
- (optionally) performs ancient DNA assembly validation using PyDamage and contig consensus sequence recalling with Freebayes and BCFtools
- predicts protein-coding genes for the assemblies using Prodigal, and bins with Prokka and optionally MetaEuk
- performs metagenome binning using MetaBAT2, MaxBin2, and/or with CONCOCT, and checks the quality of the genome bins using Busco, or CheckM, and optionally GUNC
- performs ancient DNA validation and repair with pyDamage and Freebayes
- optionally refines bins with DasTool
- assigns taxonomy to bins using GTDB-Tk and/or CAT and optionally identifies viruses in assemblies using geNomad, or Eukaryotes with Tiara
Before running the workflow, we need to compress the FASTQ files that will serve as input (pigz -k keeps the uncompressed originals):
cd /mnt/WGS-data
pigz -k read1.fq
pigz -k read2.fq
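Before moving on, it can be worth confirming that both archives are intact and hold the same number of reads. The following is a minimal sketch, not part of the original workflow: count_reads is a hypothetical helper, and it assumes standard four-line FASTQ records and that gzip and awk are available.

```shell
# Hypothetical helper: count FASTQ records in a gzip-compressed file.
# Assumes standard FASTQ with exactly 4 lines per record.
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Test archive integrity and report read counts for both mates.
for f in read1.fq.gz read2.fq.gz; do
    if [ -f "$f" ]; then
        gzip -t "$f" && echo "$f: $(count_reads "$f") reads"
    else
        echo "$f: not found (run this from the directory holding the reads)"
    fi
done
```

For paired-end data the two counts should be identical; a mismatch usually points to a truncated or mismatched file.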
Then we can start the mag workflow as follows:
cd ..
mkdir -p output_mag
nextflow run nf-core/mag \
-profile singularity \
--input 'WGS-data/read{1,2}.fq.gz' \
--outdir output_mag \
--skip_concoct \
--skip_metaeuk \
--skip_prokka \
--skip_prodigal \
--skip_spades \
--skip_metabat2 \
--skip_gtdbtk \
--skip_binqc
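The command above relies on two programs being installed: nextflow itself and singularity (required by -profile singularity). A small preflight check along the following lines can be run before launching. This is only a sketch: missing_tools is a hypothetical helper, and it assumes the container runtime is exposed under the name singularity, which may not hold on systems that only provide an apptainer executable.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# nextflow launches the pipeline; singularity is needed by -profile singularity.
missing=$(missing_tools nextflow singularity)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "nextflow and singularity are available"
fi
```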
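In the nextflow run command above, we skipped many of the tools because of limited time and computing resources. Of the assemblers and binners listed earlier, this leaves MEGAHIT for assembly and MaxBin2 for binning. Once the pipeline has finished, the results can be found in the output_mag directory.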
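To get a quick overview of what was produced, the results directory can be listed two levels deep. This is a minimal sketch: list_results is a hypothetical helper, and the exact subdirectory names depend on the pipeline version and on which steps were run, so none are assumed here.

```shell
# Hypothetical helper: list directories under a results folder, two levels deep.
list_results() {
    find "$1" -mindepth 1 -maxdepth 2 -type d | sort
}

outdir=output_mag   # the --outdir value used in the nextflow command
if [ -d "$outdir" ]; then
    list_results "$outdir"
else
    echo "$outdir not found; has the pipeline finished?"
fi
```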