Run trimmomatic on HPC system using nextflow and Singularity


      To make our HPC data processing easier and more robust, we are exploring using the nextflow workflow management system in conjunction with Singularity containers.

      For this task, develop a nextflow script that runs "trimmomatic" on all fastq files in a directory.


            Comment [ Retrieving trimmomatic singularity image:

            singularity pull trimmomatic_v0.39.sif oras://registry.forgemia.inra.fr/gafl/singularity/trimmomatic/trimmomatic:latest

            This worked fine on a head node, but failed with a network error of some kind when run from an interactive session on an Andromeda partition.
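Since the pull only succeeded from a head node, one option is to pull the image once there and have later sessions reuse the local file. The following is a minimal sketch of that idea, not part of the original workflow. The function name `ensure_image` and the `PULL_CMD` override are hypothetical names introduced for illustration; the image name and URI are the ones used above.

```shell
#!/usr/bin/env bash
# Sketch: reuse a previously pulled Singularity image instead of pulling again.
# PULL_CMD is a hypothetical override for illustration; it defaults to `singularity`.

ensure_image() {
    local image="$1" uri="$2"
    if [ -f "$image" ]; then
        # A local copy already exists, so no network access is needed.
        echo "Using existing image: $image"
    else
        # No local copy yet: pull it (run this on a head node).
        "${PULL_CMD:-singularity}" pull "$image" "$uri"
    fi
}

# Example call, using the image name and URI from the comment above:
# ensure_image trimmomatic_v0.39.sif oras://registry.forgemia.inra.fr/gafl/singularity/trimmomatic/trimmomatic:latest
```

Run on a head node first, the function leaves the .sif file on disk. Assuming the working directory is on a file system that compute nodes can also see, later calls from an interactive session would take the "existing image" branch and never touch the network.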
            Comment [ Final version of the nextflow script:

            #!/usr/bin/env nextflow

            // test on one sample: nextflow run trim2.nf --dev -with-singularity trimmomatic_v0.39.sif
            // run all: nextflow run trim2.nf -with-singularity trimmomatic_v0.39.sif
            params.dev = false
            params.number_of_inputs = 1
            params.saveMode = 'copy'
            //params.filePattern = "/projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021/*_{R1,R2}_001.fastq.gz"
            params.filePattern = "fastq/*_{R1,R2}_001.fastq.gz"
            params.outdir = 'results'

            // Pair up R1/R2 files; in dev mode only the first params.number_of_inputs pairs are kept
            Channel
                .fromFilePairs( params.filePattern )
                .ifEmpty { error "Cannot find any reads matching: ${params.filePattern}" }
                .take( params.dev ? params.number_of_inputs : -1 )
                .set { read_pairs_ch }

            process trim {
                time '2h'

                // Publish the gzipped outputs to the results directory
                publishDir "$params.outdir", pattern: '*.fq.gz', mode: params.saveMode

                input:
                tuple val(prefix), file(reads) from read_pairs_ch

                output:
                file '*.fq.gz'

                script:
                // Output names: paired (p) and unpaired (u) reads for each mate
                fq_1_paired = prefix + '_R1.p.fq'
                fq_1_unpaired = prefix + '_R1.u.fq'
                fq_2_paired = prefix + '_R2.p.fq'
                fq_2_unpaired = prefix + '_R2.u.fq'
                """
                trimmomatic \
                PE -phred33 \
                ${reads[0]} \
                ${reads[1]} \
                $fq_1_paired \
                $fq_1_unpaired \
                $fq_2_paired \
                $fq_2_unpaired \
                ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50
                gzip $fq_1_paired
                gzip $fq_1_unpaired
                gzip $fq_2_paired
                gzip $fq_2_unpaired
                """
            }

            The trimmed outputs are copied to the results directory (params.outdir). ]
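Before launching the workflow, it can help to preview which samples the pattern in params.filePattern will pick up. The sketch below approximates that pairing in plain shell by looking for an R2 mate for every `*_R1_001.fastq.gz` file. The function name `list_pairs` is hypothetical, and Nextflow's own grouping logic may differ in edge cases, so treat the output as a sanity check only.

```shell
#!/usr/bin/env bash
# Sketch: list sample prefixes that have both R1 and R2 files in a directory.
# Mirrors the naming scheme fastq/*_{R1,R2}_001.fastq.gz used by the script above.

list_pairs() {
    local dir="$1" r1 r2 prefix
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$r1" ] || continue
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        prefix="$(basename "${r1%_R1_001.fastq.gz}")"
        if [ -e "$r2" ]; then
            echo "$prefix"
        else
            echo "WARNING: no R2 mate for $prefix" >&2
        fi
    done
}

# Example call for the default pattern's directory:
# list_pairs fastq
```

Each printed prefix corresponds to one sample, for which the trim process would be expected to write four files (`_R1.p.fq.gz`, `_R1.u.fq.gz`, `_R2.p.fq.gz`, `_R2.u.fq.gz`) into the results directory.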
