  1. IGB
  2. IGBF-2945

Run trimmomatic on HPC system using nextflow and Singularity

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      To make our HPC data processing easier and more robust, we are exploring using the nextflow workflow management system in conjunction with Singularity containers.

      For this task, develop a nextflow script that runs "trimmomatic" on all fastq files in a directory.

            Activity

            ann.loraine Ann Loraine added a comment -

            From RR:

            The command can be found in this script:
            /projects/tomato_genome/scripts/rob/trim-phase2.slurm
            
            
            The command from that script is below:
            
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p"   /projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021/halffile.txt)
            
            module load trimmomatic
            
            java -jar /apps/pkg/trimmomatic/0.39/Trimmomatic-0.39/trimmomatic-0.39.jar \
            PE -summary summary-${file}.txt -validatePairs  /projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021/${file}_R1_001.fastq.gz \
            /projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021//${file}_R2_001.fastq.gz \
             ${file}-R1_paired.fq  ${file}-R1_unpaired.fq ${file}-R2_paired.fq  ${file}-R2_unpaired.fq \
             ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50
            
            
            
            For nextflow, we'd have 2 file inputs and 4 file outputs.
            I am also hard-coding the names file (/projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021/halffile.txt), so that should become an input in nextflow too.
            
            Rob
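
The mapping from one sample name to its two inputs and four outputs can be sketched in shell as below. This is only an illustration, not the contents of trim-phase2.slurm: the function names trim_args and trim_args_for_all and the READS_DIR variable are hypothetical, and the functions print the Trimmomatic arguments rather than running anything.

```shell
# Sketch only: READS_DIR, trim_args and trim_args_for_all are hypothetical names.
# The default directory is the one used in the command quoted above.
READS_DIR="${READS_DIR:-/projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021}"

# Print (do not run) the Trimmomatic arguments for one sample,
# one argument group per line: options, 2 inputs, 4 outputs, trimmers.
trim_args() {
    local sample="$1"
    printf '%s\n' \
        "PE -summary summary-${sample}.txt -validatePairs" \
        "${READS_DIR}/${sample}_R1_001.fastq.gz" \
        "${READS_DIR}/${sample}_R2_001.fastq.gz" \
        "${sample}-R1_paired.fq ${sample}-R1_unpaired.fq" \
        "${sample}-R2_paired.fq ${sample}-R2_unpaired.fq" \
        "ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50"
}

# Print the arguments for every sample in a names file (one name per line).
# Blank lines are skipped; a last line without a trailing newline is still read.
trim_args_for_all() {
    local names_file="$1"
    local sample
    while IFS= read -r sample || [ -n "$sample" ]; do
        if [ -n "$sample" ]; then
            trim_args "$sample"
        fi
    done < "$names_file"
}
```

Calling trim_args_for_all with the path to halffile.txt would list the arguments for each sample, which makes it easy to see which values a workflow has to treat as inputs and which as outputs.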
            
            ann.loraine Ann Loraine added a comment - Trimmomatic documentation: http://www.usadellab.org/cms/?page=trimmomatic
            ann.loraine Ann Loraine added a comment - - edited
            [aloraine@str-i1 nextflow]$ module load singularity
            [aloraine@str-i1 nextflow]$ singularity pull trimmomatic_v0.39.sif oras://registry.forgemia.inra.fr/gafl/singularity/trimmomatic/trimmomatic:latest
            INFO:    Downloading oras image
            [aloraine@str-i1 nextflow]$ singularity run-help ./trimmomatic_v0.39.sif
            Container for trimmomatic
            A flexible read trimming tool for Illumina NGS data
            http://www.usadellab.org/cms/?page=trimmomatic
            
            Version: 0.39
            Package installation using Miniconda3-4.7.12
            All packages are in /opt/miniconda/bin & are in PATH
            Default runscript: trimmomatic
            
            Usage:
                trimmomatic_v0.39.sif --help
                or:
                singularity exec trimmomatic_v0.39.sif trimmomatic --help
            [aloraine@str-i1 nextflow]$ singularity exec trimmomatic_v0.39.sif trimmomatic --help
            INFO:    Converting SIF file to temporary sandbox...
            Usage: 
                   PE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] [-validatePairs] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
               or: 
                   SE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] <inputFile> <outputFile> <trimmer1>...
               or: 
                   -version
            INFO:    Cleaning up image...
            

            The above worked fine on a head node, but failed due to a network error of some kind when run from an Andromeda partition interactive session.

            ann.loraine Ann Loraine added a comment -

            Asked about error pulling Singularity container onto a cluster node. Reply:

            "Compute nodes do not have internet access, so Singularity pulls can only occur on the head nodes. So in this case, you'll want to pull the image on the head node into your account so that it's available on the compute nodes."
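
Given that reply, one option is a small guard that reuses an image already on disk and only attempts a pull when it is missing. This is a sketch under assumptions: the function name ensure_image is hypothetical, and the image file name and registry address are simply the ones from the session above.

```shell
# Sketch only: ensure_image is a hypothetical helper name.
IMAGE="trimmomatic_v0.39.sif"
IMAGE_URI="oras://registry.forgemia.inra.fr/gafl/singularity/trimmomatic/trimmomatic:latest"

# Reuse the image if the file exists; otherwise pull it.
# The pull needs internet access, so it only succeeds on a head node.
ensure_image() {
    local image="$1"
    local uri="$2"
    if [ -f "$image" ]; then
        echo "using existing image: $image"
    else
        echo "pulling $uri (run this on a head node)"
        singularity pull "$image" "$uri"
    fi
}

# Usage, left commented out so that sourcing this file has no side effects:
# module load singularity
# ensure_image "$IMAGE" "$IMAGE_URI"
```

Run once on a head node, this leaves the .sif file in place, so later calls from a compute node take the first branch and never touch the network.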

            ann.loraine Ann Loraine added a comment -

            Running with the Singularity container on a cluster node:

            nextflow run trim.nf -with-singularity trimmomatic_v0.39.sif 
            

            The first time I did this, I got an error about the work directory not being found, possibly because the container was not mounting the local file system. Adding the following line to "nextflow.config" fixed the problem:

            singularity.autoMounts = true
            

            Can I add this configuration to the nextflow script itself?
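
Separately from that question, the setting can be applied from a setup script so that a fresh checkout gets it without manual editing. The sketch below assumes the config file sits in the directory where nextflow is launched; the function name ensure_automounts is hypothetical.

```shell
# Sketch only: ensure_automounts is a hypothetical helper name.
# Append the autoMounts setting to a Nextflow config file unless it is already there.
ensure_automounts() {
    local config="${1:-nextflow.config}"
    if [ -f "$config" ] && grep -qF 'singularity.autoMounts' "$config"; then
        echo "already set in $config"
    else
        echo 'singularity.autoMounts = true' >> "$config"
        echo "added to $config"
    fi
}

# Usage, left commented out so that sourcing this file has no side effects:
# ensure_automounts nextflow.config
# nextflow run trim.nf -with-singularity trimmomatic_v0.39.sif
```

The check is a plain text match, so it would not notice the same setting written in block form (singularity { autoMounts = true }).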

            ann.loraine Ann Loraine added a comment - - edited

            Running:

            #!/usr/bin/env nextflow
            
            params.saveMode = 'copy'
            params.filePattern = "/projects/tomato_genome/rnaseq/phase2-rnaseq-Sep2021/*_{R1,R2}_001.fastq.gz"
            params.resultsDir = 'results/trimmomatic'
            
            Channel
                .fromFilePairs( params.filePattern )
                .ifEmpty { error "Cannot find any reads matching: ${params.filePattern}" }
                .set { read_pairs_ch }
                
            process trim {
                time '2h'
                publishDir params.resultsDir, mode: params.saveMode
                
                input:
                tuple val(prefix), file(reads) from read_pairs_ch
            
                output:
                stdout result
            
                script:
            
                fq_1_paired = prefix + '_R1.paired.fastq'
                fq_1_unpaired = prefix + '_R1.unpaired.fastq'
                fq_2_paired = prefix + '_R2.paired.fastq'
                fq_2_unpaired = prefix + '_R2.unpaired.fastq'
            	  
                """
                trimmomatic \
                PE -phred33 \
                ${reads[0]} \
                ${reads[1]} \
                $fq_1_paired \
                $fq_1_unpaired \
                $fq_2_paired \
                $fq_2_unpaired \
                ILLUMINACLIP:TruSeq2-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50
                printf 'processed sample $prefix with trimmomatic version '
                trimmomatic -version
                """
            }
            
            result.view()
            

            So far it appears to be working OK. But even though a few jobs have finished, there's no "results" directory visible. Not sure why not.

            ann.loraine Ann Loraine added a comment -

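            Answer to the above: the files a process creates need to be declared in its output block. publishDir only publishes declared outputs, and the script above declares only stdout.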
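
A quick way to confirm the fix after a run is to check that the four trimmed files for a sample actually reached the results directory. This is a sketch, assuming the output names built in the script above (prefix plus _R1.paired.fastq and so on) and its default results directory; the function name missing_outputs is hypothetical.

```shell
# Sketch only: missing_outputs is a hypothetical helper name.
# Print each expected trimmed file for one sample that is absent from the results directory.
missing_outputs() {
    local prefix="$1"
    local results_dir="${2:-results/trimmomatic}"
    local suffix
    local f
    for suffix in R1.paired R1.unpaired R2.paired R2.unpaired; do
        f="${results_dir}/${prefix}_${suffix}.fastq"
        [ -e "$f" ] || echo "$f"
    done
}

# Usage, left commented out so that sourcing this file has no side effects:
# missing_outputs SAMPLE_PREFIX results/trimmomatic
```

No output means all four files were published; any path printed points at an output that the process produced in its work directory but did not declare.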

            ann.loraine Ann Loraine added a comment - This was useful: https://gencore.bio.nyu.edu/three-useful-nextflow-patterns-every-computational-biologist-should-know/
            ann.loraine Ann Loraine added a comment -

            Final version of the script, ready to be merged into the team repository:

            https://bitbucket.org/hotpollen/rna-seq/src/master/src/trimmomatic.nf


              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine