  1. IGB
  2. IGBF-2947

Investigate using nf-core/rnaseq pipeline

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Nextflow provides a standard type of RNA-Seq data pipeline called "nf-core/rnaseq" that we can potentially use for processing data in the Pollen PGRP and other related projects.

      For example, the following command runs this pipeline on a single-end data set, using the reference genome given by the "--fasta" option:

      nextflow run nf-core/rnaseq -profile conda --singleEnd --reverseStranded --skipTrimming --reads '*.fastq.gz' --fasta 'ftp://ftp.ensembl.org/pub/release-99/fasta/rattus_norvegicus/dna_index/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz' --gtf 'ftp://ftp.ensembl.org/pub/release-99/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.99.gtf.gz' --fc_count_type transcript
      

      One major benefit of using this pipeline is that it is widely supported in the larger bioinformatics community. A company, Seqera, supports this and other Nextflow-related projects.

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine added a comment - See: https://app.slack.com/client/TE6CZUZPH/CE8SSJV3N
            ann.loraine Ann Loraine added a comment -

            Dealing with lack of internet access on nodes:
            https://nf-co.re/tools/#downloading-pipelines-for-offline-use

            ann.loraine Ann Loraine added a comment - - edited

            Ran the test code on the head node, but it failed on a cluster node because the cluster node can't access the internet. This means I have to install nf-core tools and download the pipeline code, configs, and container images in advance.

            Test command:

            nextflow run nf-core/rnaseq -profile test,singularity
            
            ann.loraine Ann Loraine added a comment -

            Requested that nf-core tools be installed on the HPC system. Once they are installed, it will be easier to manage importing the digital artifacts needed for running jobs.

            ann.loraine Ann Loraine added a comment -

            nf-core was installed. I used it to download all Singularity images, configs, and code needed to run the nf-core rnaseq pipeline into /nobackup/tomato_genome/nfcore with:

            nf-core download rnaseq
            

            after running:

            module load nf-core
            

            which launched a virtual environment with nf-core commands and python configured.
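
Before submitting jobs to nodes without internet access, it may help to confirm that the download looks complete. The sketch below assumes the layout used in this ticket (a "workflow" directory inside nf-core-rnaseq-3.3); the function name check_offline_pipeline and the test for main.nf are illustrative assumptions, not part of nf-core tools. NXF_OFFLINE is the Nextflow environment variable that disables automatic downloads from remote repositories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a downloaded pipeline directory looks
# usable before launching on a node without internet access.
check_offline_pipeline() {
    local pipeline_dir="$1"
    # Assumption: the pipeline entry script is workflow/main.nf.
    if [ ! -f "$pipeline_dir/workflow/main.nf" ]; then
        echo "Missing $pipeline_dir/workflow/main.nf" 1>&2
        return 1
    fi
    return 0
}

# Ask Nextflow not to contact remote repositories during the run.
export NXF_OFFLINE=true
```

For example, "check_offline_pipeline nf-core-rnaseq-3.3" could run at the top of a job script so that a missing download fails early instead of partway through a run.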

            ann.loraine Ann Loraine added a comment - - edited

            Following directions on off-line use at https://nf-co.re/usage/offline, ran:

            nextflow run nf-core-rnaseq-3.3/workflow -profile test,singularity
            

            Unfortunately, the test profile only works if you have internet access on the executing node. Was able to run it on the head node, however.

            ann.loraine Ann Loraine added a comment - - edited

            Warning obtained on running rnaseq pipeline:

            WARN: =============================================================================
              Biotype attribute 'gene_biotype' not found in the last column of the GTF file!
            
              Biotype QC will be skipped to circumvent the issue below:
              https://github.com/nf-core/rnaseq/issues/460
            
              Amend '--featurecounts_group_type' to change this behaviour.
            ===================================================================================
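
The warning above means the GTF file has no 'gene_biotype' attribute. One way to check this before a run is sketched below. The helper name gtf_has_attribute is hypothetical, and the sketch assumes an uncompressed, tab-separated GTF whose ninth column holds attributes written as key "value"; pairs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any record in a GTF file carries the given
# attribute key in its ninth (attributes) column.
gtf_has_attribute() {
    local gtf="$1"
    local attr="$2"
    # Skip comment lines, then look for the key followed by a quoted value.
    awk -F '\t' -v key="$attr" '
        /^#/ { next }
        $9 ~ ("(^|; *)" key " \"") { found = 1; exit }
        END { exit !found }
    ' "$gtf"
}
```

Running "gtf_has_attribute annotations.gtf gene_biotype || echo 'no gene_biotype'" (with a placeholder file name) would then show whether the biotype QC step can be expected to run.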
            

            used script:

            #!/usr/bin/env bash
            # Run the downloaded (offline) copy of the nf-core rnaseq pipeline.
            
            SAMPLESHEET=$1
            GENOMEFASTA=$2
            GTF=$3
            GENEBED=$4
            
            MS="Must provide samples, reference genome fasta, annotations GTF, and annotations BED files"
            EX="Example: ./doIt.sh samples.csv S_lycopersicum_Sep_2019.fa S_lycopersium_Sep_2019.gtf S_lycopersium_Sep_2019.bed"
            
            # all four positional arguments are required
            for arg in "$SAMPLESHEET" "$GENOMEFASTA" "$GTF" "$GENEBED"; do
                if [[ -z "$arg" ]]; then
                    echo "$MS" 1>&2
                    echo "$EX" 1>&2
                    exit 1
                fi
            done
            
            # for running on cluster nodes lacking internet connection, download the pipeline
            # and Singularity containers as directed in https://nf-co.re/usage/offline
            
            # quote the arguments so paths containing spaces are passed intact
            nextflow run nf-core-rnaseq-3.3/workflow -resume -profile singularity --fasta "$GENOMEFASTA" --input "$SAMPLESHEET" --gtf "$GTF" --gene_bed "$GENEBED"
            
            ann.loraine Ann Loraine added a comment -

            Pipeline ran with:

            doIt.sh 1> out 2> err

            Stdout is attached.

            ann.loraine Ann Loraine added a comment -

            Samples file used:

            sample,fastq_1,fastq_2,strandedness
            are-120min-28-34C-R1,are-120min-28-34C-R1_R1_001.fastq.gz,are-120min-28-34C-R1_R2_001.fastq.gz,reverse
            are-120min-28-34C-R2,are-120min-28-34C-R2_R1_001.fastq.gz,are-120min-28-34C-R2_R2_001.fastq.gz,reverse
            are-120min-28-34C-R3,are-120min-28-34C-R3_R1_001.fastq.gz,are-120min-28-34C-R3_R2_001.fastq.gz,reverse
            are-120min-28C-R1,are-120min-28C-R1_R1_001.fastq.gz,are-120min-28C-R1_R2_001.fastq.gz,reverse
            are-120min-28C-R2,are-120min-28C-R2_R1_001.fastq.gz,are-120min-28C-R2_R2_001.fastq.gz,reverse
            are-120min-28C-R3,are-120min-28C-R3_R1_001.fastq.gz,are-120min-28C-R3_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28-34C-R1,F3H-OX3-120min-28-34C-R1_R1_001.fastq.gz,F3H-OX3-120min-28-34C-R1_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28-34C-R2,F3H-OX3-120min-28-34C-R2_R1_001.fastq.gz,F3H-OX3-120min-28-34C-R2_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28-34C-R3,F3H-OX3-120min-28-34C-R3_R1_001.fastq.gz,F3H-OX3-120min-28-34C-R3_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28C-R1,F3H-OX3-120min-28C-R1_R1_001.fastq.gz,F3H-OX3-120min-28C-R1_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28C-R2,F3H-OX3-120min-28C-R2_R1_001.fastq.gz,F3H-OX3-120min-28C-R2_R2_001.fastq.gz,reverse
            F3H-OX3-120min-28C-R3,F3H-OX3-120min-28C-R3_R1_001.fastq.gz,F3H-OX3-120min-28C-R3_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28-34C-R1,F3H-OX4-120min-28-34C-R1_R1_001.fastq.gz,F3H-OX4-120min-28-34C-R1_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28-34C-R2,F3H-OX4-120min-28-34C-R2_R1_001.fastq.gz,F3H-OX4-120min-28-34C-R2_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28-34C-R3,F3H-OX4-120min-28-34C-R3_R1_001.fastq.gz,F3H-OX4-120min-28-34C-R3_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28C-R1,F3H-OX4-120min-28C-R1_R1_001.fastq.gz,F3H-OX4-120min-28C-R1_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28C-R2,F3H-OX4-120min-28C-R2_R1_001.fastq.gz,F3H-OX4-120min-28C-R2_R2_001.fastq.gz,reverse
            F3H-OX4-120min-28C-R3,F3H-OX4-120min-28C-R3_R1_001.fastq.gz,F3H-OX4-120min-28C-R3_R2_001.fastq.gz,reverse
            VF36-120min-28-34C-R1,VF36-120min-28-34C-R1_R1_001.fastq.gz,VF36-120min-28-34C-R1_R2_001.fastq.gz,reverse
            VF36-120min-28-34C-R2,VF36-120min-28-34C-R2_R1_001.fastq.gz,VF36-120min-28-34C-R2_R2_001.fastq.gz,reverse
            VF36-120min-28-34C-R3,VF36-120min-28-34C-R3_R1_001.fastq.gz,VF36-120min-28-34C-R3_R2_001.fastq.gz,reverse
            VF36-120min-28C-R1,VF36-120min-28C-R1_R1_001.fastq.gz,VF36-120min-28C-R1_R2_001.fastq.gz,reverse
            VF36-120min-28C-R2,VF36-120min-28C-R2_R1_001.fastq.gz,VF36-120min-28C-R2_R2_001.fastq.gz,reverse
            VF36-120min-28C-R3,VF36-120min-28C-R3_R1_001.fastq.gz,VF36-120min-28C-R3_R2_001.fastq.gz,reverse
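
A sample sheet in this format could be generated from the file names rather than typed by hand. The sketch below assumes every sample has a pair of files named <sample>_R1_001.fastq.gz and <sample>_R2_001.fastq.gz in one directory, as in the listing above. The function name make_samplesheet and the default strandedness of "reverse" are illustrative choices, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a sample sheet for paired-end FASTQ files named
# <sample>_R1_001.fastq.gz and <sample>_R2_001.fastq.gz in one directory.
make_samplesheet() {
    local fastq_dir="$1"
    local strandedness="${2:-reverse}"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$fastq_dir"/*_R1_001.fastq.gz; do
        # If nothing matched, the glob is left as a literal string; skip it.
        [ -e "$r1" ] || continue
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "No mate file for $r1" 1>&2
            return 1
        fi
        sample=$(basename "$r1" _R1_001.fastq.gz)
        echo "$sample,$(basename "$r1"),$(basename "$r2"),$strandedness"
    done
}
```

For example, "make_samplesheet . > samples.csv" run in the directory holding the FASTQ files writes bare file names, matching the sheet above. The function stops with an error if any R1 file lacks its R2 mate.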
            
            ann.loraine Ann Loraine added a comment -

            Added token and drive information to rclone configuration file on cluster.

            Started copying data:

            [aloraine@str-i1 nfcore_rnaseq]$ targ="brown:Experiments/ARE-WE-THERE-YET In vitro RNAseq #2 - time course 28-34/2021-09-28-nfcore-rnaseq-pipeline-run"
            [aloraine@str-i1 nfcore_rnaseq]$ echo $targ
            brown:Experiments/ARE-WE-THERE-YET In vitro RNAseq #2 - time course 28-34/2021-09-28-nfcore-rnaseq-pipeline-run
            [aloraine@str-i1 nfcore_rnaseq]$ rclone lsd "$targ"
            [ ... listed nothing because folder is empty ... ]
            [aloraine@str-i1 nfcore_rnaseq]$ rclone copy results "$targ"
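
The copy step could be wrapped so that a comparison runs straight after it. This is a sketch only: the function name upload_results and the RCLONE override variable are hypothetical, added so the wrapper can be exercised without a real remote. "rclone copy" and "rclone check" are the standard rclone subcommands for copying and for comparing source and destination.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: copy a results directory to an rclone remote, then
# compare source and destination. Set RCLONE to substitute another command.
upload_results() {
    local src="$1"
    local dest="$2"
    local rclone_cmd="${RCLONE:-rclone}"
    # Quote the destination: the remote path here contains spaces and '#'.
    "$rclone_cmd" copy "$src" "$dest" || return 1
    # 'rclone check' reports files that differ between source and destination.
    "$rclone_cmd" check "$src" "$dest"
}
```

With the variable from the session above, the call would be: upload_results results "$targ"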
            

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine