  1. IGB
  2. IGBF-2947

Investigate using nf-core/rnaseq pipeline

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      The nf-core project provides a standard Nextflow pipeline for RNA-Seq data, "nf-core/rnaseq", that we could use to process data for the Pollen PGRP and related projects.

      For example, the following command runs this pipeline on a single-end data set, using the reference genome given by the "--fasta" option:

      nextflow run nf-core/rnaseq -profile conda --singleEnd --reverseStranded --skipTrimming --reads '*.fastq.gz' --fasta 'ftp://ftp.ensembl.org/pub/release-99/fasta/rattus_norvegicus/dna_index/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz' --gtf 'ftp://ftp.ensembl.org/pub/release-99/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.99.gtf.gz' --fc_count_type transcript
      

      One major benefit of this pipeline is that it has broad support in the larger bioinformatics community. A company, Seqera, also supports this and other Nextflow-related projects.

            Activity

            Ann Loraine (ann.loraine) added a comment (edited)

            Following directions on off-line use at https://nf-co.re/usage/offline, ran:

            nextflow run nf-core-rnaseq-3.3/workflow -profile test,singularity
            

            Unfortunately, the test profile only works if you have internet access on the executing node. Was able to run it on the head node, however.

            Ann Loraine (ann.loraine) added a comment (edited)

            Warning obtained on running rnaseq pipeline:

            WARN: =============================================================================
              Biotype attribute 'gene_biotype' not found in the last column of the GTF file!
            
              Biotype QC will be skipped to circumvent the issue below:
              https://github.com/nf-core/rnaseq/issues/460
            
              Amend '--featurecounts_group_type' to change this behaviour.
            ===================================================================================
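The warning above is triggered when the attributes (ninth) column of the GTF has no 'gene_biotype' key. A minimal sketch for checking this before launching the pipeline follows. The function name `gtf_has_attribute` is hypothetical, and the sketch assumes an uncompressed, tab-separated GTF file.

```shell
# Hypothetical helper: succeed (exit 0) if any non-comment record in the GTF
# carries the given attribute key in its ninth (attributes) column.
gtf_has_attribute() {
    local gtf="$1" attr="$2"
    awk -F'\t' -v key="$attr" '
        /^#/ { next }                       # skip header/comment lines
        {
            n = split($9, fields, ";")      # attributes are ";"-separated
            for (i = 1; i <= n; i++) {
                split(fields[i], kv, " ")   # kv[1] is the attribute key
                if (kv[1] == key) { found = 1; exit }
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$gtf"
}

# Example use (file name is illustrative):
# gtf_has_attribute annotations.gtf gene_biotype || echo "no gene_biotype attribute" 1>&2
```

If the check fails, the biotype QC step will be skipped as the warning describes, unless '--featurecounts_group_type' is set to an attribute that the file does contain.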
            

            used script:

            #!/usr/bin/env bash
            
            SAMPLESHEET=$1
            GENOMEFASTA=$2
            GTF=$3
            GENEBED=$4
            
            MS="Must provide samples, reference genome fasta, annotations GTF, and annotations BED files"
            EX="Example: ./doIt.sh samples.csv S_lycopersicum_Sep_2019.fa S_lycopersicum_Sep_2019.gtf S_lycopersicum_Sep_2019.bed"
            
            # Fail early with a usage message if any required argument is missing or empty.
            for ARG in "$SAMPLESHEET" "$GENOMEFASTA" "$GTF" "$GENEBED"; do
                if [[ -z "$ARG" ]]; then
                    echo "$MS" 1>&2
                    echo "$EX" 1>&2
                    exit 1
                fi
            done
            
            # for running on cluster nodes lacking internet connection, download the pipeline
            # and Singularity containers as directed in https://nf-co.re/usage/offline
            
            # Quote the variables so that paths containing spaces are passed intact.
            nextflow run nf-core-rnaseq-3.3/workflow -resume -profile singularity --fasta "$GENOMEFASTA" --input "$SAMPLESHEET" --gtf "$GTF" --gene_bed "$GENEBED"
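The script above only checks that each argument is non-empty. As a possible extension, the sketch below also checks that each path points to a readable file, which catches typos before Nextflow starts. The function name `require_readable_files` is hypothetical, and the check only makes sense for local paths, not for remote URLs such as the ftp:// references in the issue description.

```shell
# Hypothetical helper: report every path that is missing or unreadable,
# then return non-zero if any check failed.
require_readable_files() {
    local status=0 path
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "Cannot read input file: $path" 1>&2
            status=1
        fi
    done
    return "$status"
}

# Example use inside the script, before the nextflow command:
# require_readable_files "$SAMPLESHEET" "$GENOMEFASTA" "$GTF" "$GENEBED" || exit 1
```

Reporting every bad path before exiting, rather than stopping at the first one, saves a round trip when more than one argument is wrong.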
            
            Ann Loraine (ann.loraine) added a comment

            Pipeline ran with:

            doIt.sh 1> out 2> err

            Stdout is attached.

            Ann Loraine (ann.loraine) added a comment

            Samples file used:

            sample,fastq_1,fastq_2,strandedness
            are-120min-28-34C-R1,are-120min-28-34C-R1_R1_001.fastq.gz,are-120min-28-34C-R1_R2_001.fastq.gz,reverse
            are-120min-28-34C-R2,are-120min-28-34C-R2_R1_001.fastq.gz,are-120min-28-34C-R2_R2_001.fastq.gz,reverse
            are-120min-28-34C-R3,are-120min-28-34C-R3_R1_001.fastq.gz,are-120min-28-34C-R3_R2_001.fastq.gz,reverse
            are-120min-28C-R1,are-120min-28C-R1_R1_001.fastq.gz,are-120min-28C-R1_R2_001.fastq.gz,reverse
            are-120min-28C-R2,are-120min-28C-R2_R1_001.fastq.gz,are-120min-28C-R2_R2_001.fastq.gz,reverse
            are-120min-28C-R3,are-120min-28C-R3_R1_001.fastq.gz,are-120min-28C-R3_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28-34C-R1,F3H-OX3-120min-28-34C-R1_R1_001.fastq.gz,F3H-OX3-120min-28-34C-R1_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28-34C-R2,F3H-OX3-120min-28-34C-R2_R1_001.fastq.gz,F3H-OX3-120min-28-34C-R2_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28-34C-R3,F3H-OX3-120min-28-34C-R3_R1_001.fastq.gz,F3H-OX3-120min-28-34C-R3_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28C-R1,F3H-OX3-120min-28C-R1_R1_001.fastq.gz,F3H-OX3-120min-28C-R1_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28C-R2,F3H-OX3-120min-28C-R2_R1_001.fastq.gz,F3H-OX3-120min-28C-R2_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28C-R3,F3H-OX3-120min-28C-R3_R1_001.fastq.gz,F3H-OX3-120min-28C-R3_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28-34C-R1,F3H-OX4-120min-28-34C-R1_R1_001.fastq.gz,F3H-OX4-120min-28-34C-R1_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28-34C-R2,F3H-OX4-120min-28-34C-R2_R1_001.fastq.gz,F3H-OX4-120min-28-34C-R2_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28-34C-R3,F3H-OX4-120min-28-34C-R3_R1_001.fastq.gz,F3H-OX4-120min-28-34C-R3_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28C-R1,F3H-OX4-120min-28C-R1_R1_001.fastq.gz,F3H-OX4-120min-28C-R1_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28C-R2,F3H-OX4-120min-28C-R2_R1_001.fastq.gz,F3H-OX4-120min-28C-R2_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28C-R3,F3H-OX4-120min-28C-R3_R1_001.fastq.gz,F3H-OX4-120min-28C-R3_R2_001.fastq.gz,reverse
            VF36-120min-28-34C-R1,VF36-120min-28-34C-R1_R1_001.fastq.gz,VF36-120min-28-34C-R1_R2_001.fastq.gz,reverse
            VF36-120min-28-34C-R2,VF36-120min-28-34C-R2_R1_001.fastq.gz,VF36-120min-28-34C-R2_R2_001.fastq.gz,reverse
            VF36-120min-28-34C-R3,VF36-120min-28-34C-R3_R1_001.fastq.gz,VF36-120min-28-34C-R3_R2_001.fastq.gz,reverse
            VF36-120min-28C-R1,VF36-120min-28C-R1_R1_001.fastq.gz,VF36-120min-28C-R1_R2_001.fastq.gz,reverse
            VF36-120min-28C-R2,VF36-120min-28C-R2_R1_001.fastq.gz,VF36-120min-28C-R2_R2_001.fastq.gz,reverse
            VF36-120min-28C-R3,VF36-120min-28C-R3_R1_001.fastq.gz,VF36-120min-28C-R3_R2_001.fastq.gz,reverse
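A samples file like the one above could be generated from the FASTQ file names rather than typed by hand. The sketch below is one way to do that. It assumes the naming pattern seen above (`<sample>_R1_001.fastq.gz` paired with `<sample>_R2_001.fastq.gz`) and defaults the strandedness column to "reverse"; the function name `make_samplesheet` is hypothetical.

```shell
# Hypothetical helper: print a samplesheet row for every *_R1_001.fastq.gz
# file in a directory that has a matching *_R2_001.fastq.gz file.
make_samplesheet() {
    local dir="$1" strandedness="${2:-reverse}"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue            # no matches: the glob stays literal
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Skipping $r1: no matching R2 file" 1>&2
            continue
        fi
        sample=$(basename "$r1" _R1_001.fastq.gz)
        echo "$sample,$(basename "$r1"),$(basename "$r2"),$strandedness"
    done
}

# Example use, run from the directory that holds the FASTQ files:
# make_samplesheet . > samples.csv
```

The rows contain bare file names, as in the samples file above, so the pipeline must be launched from the directory that holds the FASTQ files. Row order follows the shell's glob ordering, which depends on the locale.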
            
            Ann Loraine (ann.loraine) added a comment

            Added token and drive information to rclone configuration file on cluster.

            Started copying data:

            [aloraine@str-i1 nfcore_rnaseq]$ targ="brown:Experiments/ARE-WE-THERE-YET In vitro RNAseq #2 - time course 28-34/2021-09-28-nfcore-rnaseq-pipeline-run"
            [aloraine@str-i1 nfcore_rnaseq]$ echo $targ
            brown:Experiments/ARE-WE-THERE-YET In vitro RNAseq #2 - time course 28-34/2021-09-28-nfcore-rnaseq-pipeline-run
            [aloraine@str-i1 nfcore_rnaseq]$ rclone lsd "$targ"
            [ ... listed nothing because folder is empty ... ]
            [aloraine@str-i1 nfcore_rnaseq]$ rclone copy results "$targ"
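Because the results directory is large, it may be worth confirming that the upload completed. The sketch below wraps the copy in a function that follows it with `rclone check`, which compares the files in the source and destination. The function name `copy_and_verify` is hypothetical; `results` and `$targ` are the names used in the session above.

```shell
# Hypothetical helper: copy a local directory to an rclone remote, then
# compare source and destination so that a partial upload is reported
# as a failure.
copy_and_verify() {
    local src="$1" dest="$2"
    rclone copy "$src" "$dest" || return 1
    rclone check "$src" "$dest"
}

# Example use with the variables from the session above:
# copy_and_verify results "$targ"
```

Quoting "$dest" matters here because the target path contains spaces and a "#" character.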
            

              People

              • Assignee: Ann Loraine (ann.loraine)
              • Reporter: Ann Loraine (ann.loraine)