IGBF-3242

Run nextflow for seedlingPollen dataset

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Re-run nextflow rnaseq pipeline in:

      /nobackup/tomato_genome/seedlingPollen/nfcore

      Using new samples.csv file with "reverse" in the strandedness field.

      Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.
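The strandedness edit described above can be scripted. The following is a minimal sketch, assuming samples.csv follows the nf-core/rnaseq samplesheet layout (a header row that includes a `strandedness` column). The function name `set_strandedness` and the example row, including the second read file name, are illustrative and not taken from the actual sample sheet.

```python
import csv
import io


def set_strandedness(csv_text: str, value: str = "reverse") -> str:
    """Return samplesheet text with every row's strandedness set to `value`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "strandedness" not in reader.fieldnames:
        raise ValueError("samplesheet has no 'strandedness' column")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["strandedness"] = value  # overwrite whatever was there before
        writer.writerow(row)
    return out.getvalue()


# Illustrative input only: the real samples.csv rows are not shown in this issue.
example = (
    "sample,fastq_1,fastq_2,strandedness\n"
    "MP-MC-1,MP-MC-1_1.fastq.gz,MP-MC-1_2.fastq.gz,unstranded\n"
)
print(set_strandedness(example))
```

To apply it to the real file, read samples.csv into a string, pass it through `set_strandedness`, and write the result back, keeping a copy of the original first.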

      Note about sample names:

      MP - mature pollen
      Se - seedling
      N - nagcarlang
      H - heinz
      T - tamaulipas
      M - malintka
      S - stress
      C - control
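The key above can be applied mechanically. This sketch assumes sample names follow the pattern tissue, then genotype letter plus condition letter, then replicate number, separated by hyphens, as the file name MP-MC-1_1.fastq.gz elsewhere in this issue suggests. That pattern and the function name `decode_sample` are assumptions, not a documented convention.

```python
# Abbreviations copied from the key in the issue description.
TISSUES = {"MP": "mature pollen", "Se": "seedling"}
GENOTYPES = {"N": "nagcarlang", "H": "heinz", "T": "tamaulipas", "M": "malintka"}
CONDITIONS = {"S": "stress", "C": "control"}


def decode_sample(name: str) -> dict:
    """Decode a name such as 'MP-MC-1' (assumed layout: tissue-GenotypeCondition-replicate)."""
    parts = name.split("-")
    if len(parts) != 3 or len(parts[1]) != 2:
        raise ValueError(f"unexpected sample name layout: {name!r}")
    tissue, genotype_condition, replicate = parts
    return {
        "tissue": TISSUES[tissue],
        "genotype": GENOTYPES[genotype_condition[0]],
        "condition": CONDITIONS[genotype_condition[1]],
        "replicate": replicate,
    }


print(decode_sample("MP-MC-1"))
```

An unknown abbreviation raises `KeyError`, which is useful for spotting names that do not fit the assumed layout.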

      See folder in the google drive with info about the experiment:

      • Experiments > Pollen Grain and Seedling RNA-seq

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            ann.loraine Ann Loraine made changes -
            Description * Get data from google drive onto the cluster
            * Migrate data from cluster to an igbquickload host
            * Get data from google drive onto the cluster
            * Migrate data from cluster to an igbquickload host

            See: https://drive.google.com/drive/folders/1eHn-uK6N5PEr3fr8kfWHrySrarx5XOYU?usp=share_link
            ann.loraine Ann Loraine made changes -
            Description * Get data from google drive onto the cluster
            * Migrate data from cluster to an igbquickload host

            See: https://drive.google.com/drive/folders/1eHn-uK6N5PEr3fr8kfWHrySrarx5XOYU?usp=share_link
            * Get data from google drive onto the cluster
            * Migrate data from cluster to an igbquickload host

            See: https://drive.google.com/drive/folders/1GJnZefP-7TE-ch-c0lblGZMOqpwSgZRK?usp=share_link
            ann.loraine Ann Loraine made changes -
            Summary Deploy new Dec 2022 data onto an igbquickload site Run nextflow for mark-2022-timeseries
            ann.loraine Ann Loraine made changes -
            Description * Get data from google drive onto the cluster
            * Migrate data from cluster to an igbquickload host

            See: https://drive.google.com/drive/folders/1GJnZefP-7TE-ch-c0lblGZMOqpwSgZRK?usp=share_link
            Run nextflow nfcore rnaseq pipeline on fastq files in:

            /projects/tomato_genome/rnaseq/mark-2022-timeseries/30-771363348/00_fastq

            ann.loraine Ann Loraine made changes -
            Description Run nextflow nfcore rnaseq pipeline on fastq files in:

            /projects/tomato_genome/rnaseq/mark-2022-timeseries/30-771363348/00_fastq

            Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness

            ann.loraine Ann Loraine made changes -
            Summary Run nextflow for mark-2022-timeseries Run nextflow for seedlingPollen dataset
            ann.loraine Ann Loraine made changes -
            Description Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness

            Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamalipas

            S - heat-treated, 34 degrees
            C - non-heated treated, raised at 28 degrees

            ann.loraine Ann Loraine made changes -
            Description Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamalipas

            S - heat-treated, 34 degrees
            C - non-heated treated, raised at 28 degrees

            Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamalipas
            M - malintka
            S - heat-treated, 37 degrees
            C - non-heated treated, raised at 28 degrees

            See "experiment" folder in the google drive

            ann.loraine Ann Loraine made changes -
            Description Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamalipas
            M - malintka
            S - heat-treated, 37 degrees
            C - non-heated treated, raised at 28 degrees

            See "experiment" folder in the google drive

            Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamalipas
            M - malintka
            S - heat-treated, 37 degrees
            C - non-heated treated, raised at 28 degrees

            See "experiment" folder in the google drive with info about the experiment:

            Experiments > Pollen Grain and Seedling RNA-seq > SRA Description - pollen grain and seedling experiments


            ann.loraine Ann Loraine made changes -
            Description Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamalipas
            M - malintka
            S - heat-treated, 37 degrees
            C - non-heated treated, raised at 28 degrees

            See "experiment" folder in the google drive with info about the experiment:

            Experiments > Pollen Grain and Seedling RNA-seq > SRA Description - pollen grain and seedling experiments


            Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamaulipas
            M - malintka
            S - heat-treated, 37 degrees
            C - non-heated treated, raised at 28 degrees

            See "experiment" folder in the google drive with info about the experiment:

            Experiments > Pollen Grain and Seedling RNA-seq > SRA Description - pollen grain and seedling experiments


            ann.loraine Ann Loraine made changes -
            Story Points 1 0.5
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis added a comment - - edited

            Update: I fixed the sample sheet so that strandedness is "reverse". When running nextflow, I got this error from trimgalore:

            Command error:
              INFO:    Converting SIF file to temporary sandbox...
              Path to Cutadapt set as: 'cutadapt' (default)
              Cutadapt seems to be working fine (tested command 'cutadapt --version')
              Cutadapt version: 3.4
              Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
              Letting the (modified) Cutadapt deal with the Python version instead
              Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 4 cores
              
              No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
              
              Input file 'MP-MC-1_1.fastq.gz' seems to be completely empty. Consider respecifying!
              
              INFO:    Cleaning up image...
            
            Work dir:
              /nobackup/tomato_genome/seedlingPollen/nfcore/work/15/4c200c1d0331cd19fc66b69eaad96b
            

            Question: are all fastq files double stranded or single stranded? Or is there a naming issue in the sample csv file?

            Next step: Instead of using symbolic links I will copy the fastq files to the following directory to see if it works with nextflow. I have had issues with symbolic links before with fastq files and copying the files instead seemed to fix the issue.
            Copy From: /projects/tomato_genome/rnaseq/mark-2022-pollengrainSeedling/00_fastq/
            Copy To: /nobackup/tomato_genome/seedlingPollen/nfcore

            Issue: I do not have permission to copy the fastq files.

            Fixed Issue: Dr. Robert Reid gave permission so I could copy files over.
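The trimgalore error reports an empty input file, and symbolic links were the suspected cause. A quick check of the inputs before launching the pipeline can confirm or rule that out. The sketch below is one possible check, not part of the pipeline: the function name `find_bad_fastq` and the three problem labels are hypothetical, and it only inspects files whose names end in `.fastq.gz`.

```python
import gzip
from pathlib import Path


def find_bad_fastq(directory) -> dict:
    """Map file name -> problem for each .fastq.gz entry that looks unusable."""
    problems = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.name.endswith(".fastq.gz"):
            continue
        if path.is_symlink() and not path.exists():
            # The link is present but its target is missing or unreadable.
            problems[path.name] = "broken symlink"
        elif path.stat().st_size == 0:
            problems[path.name] = "zero bytes"
        else:
            try:
                with gzip.open(path, "rb") as handle:
                    if not handle.read(1):
                        problems[path.name] = "no data after decompression"
            except (OSError, EOFError):
                problems[path.name] = "not readable as gzip"
    return problems
```

Calling `find_bad_fastq` on the directory that holds the linked or copied fastq files returns an empty dict when every file passes these checks.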

            ann.loraine Ann Loraine made changes -
            Description Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamaulipas
            M - malintka
            S - heat-treated, 37 degrees
            C - non-heated treated, raised at 28 degrees

            See "experiment" folder in the google drive with info about the experiment:

            Experiments > Pollen Grain and Seedling RNA-seq > SRA Description - pollen grain and seedling experiments


            Re-run nextflow rnaseq pipeline in:

            /nobackup/tomato_genome/seedlingPollen/nfcore

            Using new samples.csv file with "reverse" in the strandedness field.

            Please just edit "samples.csv" in /nobackup/tomato_genome/seedlingPollen/nfcore.

            Note about sample names:

            MP - mature pollen
            Se - seedling
            N - nagcarlang
            H - heinz
            T - tamaulipas
            M - malintka
            S - stress
            C - control

            See folder in the google drive with info about the experiment:

            * Experiments > Pollen Grain and Seedling RNA-seq


            Mdavis4290 Molly Davis made changes -
            Mdavis4290 Molly Davis added a comment - - edited

            Nextflow Pipeline Ran Successfully!
            Directory: /nobackup/tomato_genome/seedlingPollen/nfcore
            Next steps: Rename sorted bam files and make scaled coverage graphs.

            Mdavis4290 Molly Davis added a comment - - edited

            Scaled coverage graphs have been made and are located in:

            /nobackup/tomato_genome/seedlingPollen/nfcore/results/star_salmon
            

            Notes: I can move the coverage graphs to their own directory if you would like. Let me know!

            Multiqc Report:

            scp mdavi258@hpc.uncc.edu:/nobackup/tomato_genome/seedlingPollen/nfcore/results/multiqc/star_salmon/multiqc_report.html nfcore_multiqc_report.html
            

            nfcore_multiqc_report.html

            Notes: There appear to be no issues with strandedness.

            Next step: Pipeline, coverage graphs, and Multiqc report need to be reviewed.
            [~aloraine]

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Attachment nfcore_multiqc_report.html [ 17657 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine added a comment -

            Ran find junctions pipeline. Copied data to host with:

            scp -J aloraine@hop.renci.org -r junctions aloraine@lorainelab-quickload.scidas.org:/projects/igbquickload/lorainelab/www/main/htdocs/hotpollen/S_lycopersicum_Jun_2022/mark-2022-timeseries/.
            
            ann.loraine Ann Loraine added a comment - Added samples.csv file to repository. See: https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/seedlingPollen-samples.csv
            ann.loraine Ann Loraine added a comment -

            Got description of data set. See attached.

            ann.loraine Ann Loraine made changes -
            ann.loraine Ann Loraine added a comment -

            Deployed data to quickload. Did not examine pattern of coverage graphs relative to other data sets.
            Added sample sheet to repository: https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/seedlingPollen_sample_sheet.xlsx

            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3246 [ IGBF-3246 ]

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2
