nf-core assembly and binning

We will use the nf-core/mag workflow (https://nf-co.re/mag) for metagenome assembly and binning. The workflow starts with read quality control using fastp and FastQC, and then:

assigns taxonomy to reads using Centrifuge and/or Kraken2
performs assembly using MEGAHIT and SPAdes, and checks assembly quality using QUAST
(optionally) performs ancient DNA validation and repair: assembly validation with PyDamage, and contig consensus-sequence recalling with Freebayes and BCFtools
predicts protein-coding genes in the assemblies using Prodigal, and annotates bins with Prokka and optionally MetaEuk
performs metagenome binning using MetaBAT2, MaxBin2, and/or CONCOCT, and checks the quality of the genome bins using BUSCO or CheckM, and optionally GUNC
optionally refines bins with DAS Tool
assigns taxonomy to bins using GTDB-Tk and/or CAT, and optionally identifies viruses in the assemblies using geNomad, or eukaryotic contigs using Tiara

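Before running the workflow, we compress the FASTQ files, because the pipeline expects gzip-compressed input. pigz is a parallel implementation of gzip, and the -k option keeps the original uncompressed files: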

cd /mnt/WGS-data
pigz -k read1.fq
pigz -k read2.fq
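Before launching the workflow, it can be worth confirming that both compressed files are intact and still contain the same number of reads, since paired-end tools assume read1 and read2 are in sync. The following is a minimal bash sketch, assuming four lines per FASTQ record (no wrapped sequence lines) and that gzip is installed; count_reads and check_pair are hypothetical helper names, not part of nf-core/mag.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a pair of gzipped FASTQ files.

# count_reads FILE: print the number of records in a gzipped FASTQ file,
# assuming each record spans exactly 4 lines.
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# check_pair R1 R2: test the gzip integrity of both files, then compare
# their read counts. Returns 0 if both are valid with equal counts, 1 otherwise.
check_pair() {
  local r1=$1 r2=$2 n1 n2
  gzip -t "$r1" && gzip -t "$r2" || { echo "corrupt gzip file" >&2; return 1; }
  n1=$(count_reads "$r1")
  n2=$(count_reads "$r2")
  if [ "$n1" -ne "$n2" ]; then
    echo "read count mismatch: $r1=$n1 $r2=$n2" >&2
    return 1
  fi
  echo "OK: $n1 read pairs"
}
```

From inside /mnt/WGS-data, the check would be run as check_pair read1.fq.gz read2.fq.gz; a mismatch message usually means one of the files was truncated during copying or compression.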

Then we return to the parent directory (/mnt) and start the mag workflow as follows:

cd ..
mkdir -p output_mag
nextflow run nf-core/mag \
  -profile singularity \
  --input 'WGS-data/read{1,2}.fq.gz' \
  --outdir output_mag \
  --skip_concoct \
  --skip_metaeuk \
  --skip_prokka \
  --skip_prodigal \
  --skip_spades \
  --skip_metabat2 \
  --skip_gtdbtk \
  --skip_binqc

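The same options can also be kept in a parameters file and passed to Nextflow with its -params-file option, which makes the run easier to repeat and edit. This is a minimal sketch, assuming a YAML file named mag_params.yaml (an arbitrary, hypothetical name) whose keys mirror the command-line flags above without the leading --.

```shell
#!/usr/bin/env bash
# Write the pipeline parameters used above into a YAML params file.
# Single quotes keep the {1,2} glob as a literal string for Nextflow.
cat > mag_params.yaml <<'EOF'
input: 'WGS-data/read{1,2}.fq.gz'
outdir: 'output_mag'
skip_concoct: true
skip_metaeuk: true
skip_prokka: true
skip_prodigal: true
skip_spades: true
skip_metabat2: true
skip_gtdbtk: true
skip_binqc: true
EOF
```

The run could then be launched with nextflow run nf-core/mag -profile singularity -params-file mag_params.yaml. The -profile option belongs to Nextflow itself, not to the pipeline, so it stays on the command line; adding -resume lets Nextflow reuse cached results if a previous attempt was interrupted.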
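In the command above, we skip several tools (CONCOCT, MetaEuk, Prokka, Prodigal, SPAdes, MetaBAT2, GTDB-Tk and bin quality control) to save time and computing resources. As a result, assembly is done with MEGAHIT only and binning with MaxBin2 only. After the pipeline finishes, the results can be found in the output_mag directory.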
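For a quick overview of what the run produced, a small helper can list each top-level folder in output_mag together with the number of files it contains. This is a generic sketch that makes no assumption about folder names, which depend on the pipeline version and on which steps were skipped; summarize_outputs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# summarize_outputs DIR: for each top-level subdirectory of DIR (default
# output_mag), print its name and the number of regular files beneath it.
summarize_outputs() {
  local dir=${1:-output_mag} sub n
  for sub in "$dir"/*/; do
    [ -d "$sub" ] || continue   # skip the literal pattern if nothing matched
    n=$(find "$sub" -type f | wc -l)
    printf '%s\t%d\n' "$(basename "$sub")" "$((n))"
  done
}
```

Running summarize_outputs output_mag prints one tab-separated line per folder. nf-core pipelines typically also write a MultiQC report under multiqc/ and run reports under pipeline_info/, which are good starting points for checking the run.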