  1. IGB
  2. IGBF-3688

Rerun nextflow with ARE 120 minute Muday data

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      SRP499796

      Directory: /projects/tomato_genome/fnb/dataprocessing/SRP499796

      SL4 and SL5 need to be run with this data set.

      For this task, we need to confirm and sanity-check the ARE 120 minute flavonoid data that Rob recently uploaded and submitted to the Sequence Read Archive.
      If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
      For this task:

      • Check SRP on NCBI and review submission
      • Download the data onto the cluster by using the SRP name
      • Run nf-core/rnaseq pipeline
      • Run our coverage graph and junctions scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.

            Activity

            Mdavis4290 Molly Davis created issue -
            Mdavis4290 Molly Davis made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3686 [ IGBF-3686 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Re-run Directory: /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4

            Prefetch SRR Script:

            
            #! /bin/bash
            
            #SBATCH --job-name=prefetch_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=4gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            
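            cd /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4
            module load sra-tools/2.11.0
            # vdb-config --interactive opens a terminal interface, so it should be run
            # once from a login shell before submitting, not inside the batch job.
            # vdb-config --interactive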
            
            files=(
            SRR28558218
            SRR28558219
            SRR28558220
            SRR28558221
            SRR28558222
            SRR28558223
            SRR28558224
            SRR28558225
            SRR28558226
            SRR28558227
            SRR28558228
            SRR28558229
            SRR28558230
            SRR28558231
            SRR28558232
            SRR28558233
            SRR28558234
            SRR28558235
            SRR28558236
            SRR28558237
            SRR28558238
            SRR28558239
            SRR28558240
            SRR28558241
            )
            
            for f in "${files[@]}"; do echo "$f"; prefetch "$f"; done
            
            
            

            Execute:

            chmod u+x prefetch.slurm
            
            sbatch prefetch.slurm
            

            fasterq-dump Script:

            #! /bin/bash
            
            #SBATCH --job-name=fastqdump_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=40gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            #SBATCH --array=1-24
            
            # Look up this array task's accession: task N reads line N of Sra_ids.txt
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/Sra_ids.txt)
            
            module load sra-tools/2.11.0
            
            echo "Starting fasterq-dump on $file"
            
            # Work in the per-accession directory that holds ${file}.sra
            cd "/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/$file" || exit 1
            
            fasterq-dump "${file}.sra"
            
            # Validate the read pairs with the in-house script
            perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"
            
            # Copy both FASTQ files up to the run directory
            cp "${file}_1.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/${file}_1.fastq"
            cp "${file}_2.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/${file}_2.fastq"
            
            echo "finished"
            

            Execute:

            chmod u+x fasterdump.slurm
            
            sbatch fasterdump.slurm
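
The array job above reads one accession per line from Sra_ids.txt, but this ticket does not show how that file was made. A minimal sketch of one way to create it, assuming it holds the same 24 accessions as the prefetch script (SRR28558218 through SRR28558241). The helper name `write_sra_ids` is hypothetical.

```shell
# Hypothetical helper: write the 24 run accessions, one per line, so that
# line N of the file corresponds to SLURM array task N (--array=1-24).
write_sra_ids() {
    out_file="$1"
    # seq prints 28558218 ... 28558241; sed adds the SRR prefix to each number
    seq 28558218 28558241 | sed 's/^/SRR/' > "$out_file"
}

# Example use (not run here):
#   write_sra_ids Sra_ids.txt
#   sed -n '3p' Sra_ids.txt    # accession that array task 3 would process
```

Keeping the list in a file lets each array task pick its own accession by line number, which is how the script above uses `SLURM_ARRAY_TASK_ID`.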
            
            ann.loraine Ann Loraine made changes -
            Sprint Spring 7 [ 191 ] Spring 7, Spring 8 [ 191, 192 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis added a comment - - edited

            The Nextflow pipeline ran successfully with both the SL4 and SL5 genomes.
            Directories:

            • /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4
            • /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL5

            MultiQC report notes: No errors or warnings were present in the report. The output files are named 'SRP499796_SL4_multiqc_report.html' & 'SRP499796_SL5_multiqc_report.html'.
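
The sample sheet contents and run command are not recorded in this comment. As a rough sketch only, a sample sheet in the nf-core/rnaseq column layout (sample, fastq_1, fastq_2, strandedness) could be generated from the accession list as below. Several things here are assumptions: that the SRR accession is used as the sample name, that the FASTQ files were gzipped after fasterq-dump (the pipeline's sample sheet expects .fastq.gz or .fq.gz), and that every run is "unstranded". The helper name `make_samplesheet` is hypothetical.

```shell
# Hypothetical helper: print an nf-core/rnaseq style sample sheet for paired-end runs.
# $1 = file with one accession per line, $2 = directory holding the gzipped FASTQ files.
make_samplesheet() {
    ids_file="$1"
    fastq_dir="$2"
    echo "sample,fastq_1,fastq_2,strandedness"
    while IFS= read -r srr; do
        [ -n "$srr" ] || continue    # skip blank lines
        # Assumption: files are named <SRR>_1.fastq.gz and <SRR>_2.fastq.gz
        echo "${srr},${fastq_dir}/${srr}_1.fastq.gz,${fastq_dir}/${srr}_2.fastq.gz,unstranded"
    done < "$ids_file"
}

# Example use (not run here):
#   make_samplesheet Sra_ids.txt "$PWD" > SRP499796.csv
```

The resulting CSV would be handed to the pipeline through its `--input` parameter. The file actually committed for this ticket may differ from this sketch.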

            Mdavis4290 Molly Davis added a comment - - edited

            Next steps:

            • Commit the MultiQC reports and the CSV sample file to the Flavonoid repo on Bitbucket
            • Rename the sorted BAM files
            • Create junction files
            • Create coverage graphs

            Mdavis4290 Molly Davis added a comment -

            Branch: https://bitbucket.org/mdavis4290/molly5-flavonoid-rnaseq/branch/IGBF-3688

            • ARE-120min-analysis/SRP499796.csv
            • ARE-120min-analysis/SRP499796_SL4_multiqc_report.html
            • ARE-120min-analysis/SRP499796_SL5_multiqc_report.html
            Mdavis4290 Molly Davis added a comment -

            Launch renameBams.sh script:
            ./renameBams.sh
            Launch Scaled Coverage graphs script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch Junction files script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err

            Mdavis4290 Molly Davis added a comment - - edited

            Directories:

            • /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/results/star_salmon
            • /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL5/results/star_salmon

            Reviewer:

            • Check that files have reasonable sizes (no zero-size files, for example)
            • Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            • Check that every BAM file has a corresponding "FJ.bed.gz" file
            • Check that every BAM file has a corresponding "scaled.bedgraph.gz" file
            • Check that every "scaled.bedgraph.gz" file has a corresponding "scaled.bedgraph.gz.tbi" index file
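
The reviewer checks above could be partly automated. The following is a minimal sketch, not the script used for this ticket. It assumes that each output file shares its BAM file's base name, so that `X.bam` pairs with `X.FJ.bed.gz` and `X.scaled.bedgraph.gz`; the exact naming is not stated here. The helper name `check_outputs` is hypothetical.

```shell
# Hypothetical helper: report problems found in one results directory.
# Prints one line per problem and prints nothing when every check passes.
check_outputs() {
    dir="$1"
    # 1. Flag zero-size BAM, gzipped, and index files
    find "$dir" -maxdepth 1 -type f \( -name '*.bam' -o -name '*.gz' -o -name '*.tbi' \) -size 0 \
        | sed 's/^/EMPTY: /'
    # 2. Every BAM file needs a junction file and a coverage graph (naming assumed)
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue
        base="${bam%.bam}"
        for suffix in FJ.bed.gz scaled.bedgraph.gz; do
            [ -e "$base.$suffix" ] || echo "MISSING: $base.$suffix"
        done
    done
    # 3. Every junction file and coverage graph needs a .tbi index
    for f in "$dir"/*.FJ.bed.gz "$dir"/*.scaled.bedgraph.gz; do
        [ -e "$f" ] || continue
        [ -e "$f.tbi" ] || echo "MISSING: $f.tbi"
    done
}

# Example use (not run here):
#   check_outputs /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/results/star_salmon
```

An empty result means all five checks passed for that directory. File sizes that are nonzero but implausibly small would still need a manual look.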

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Robert Reid [ robertreid ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            robofjoy Robert Reid added a comment -

            The SL4 Folder
            The TSV files look complete.
            All the bedgraphs are about 45 MB in size and there are 24 files.
            All the bed files are about 4.5 MB in size and there are 24 files.
            All tbi files are about 70 KB and there are 24 for the bedgraphs and 24 for the bed files.
            The BAMs are 2.8 GB in size and there are 24.

            The SL5 Folder
            The TSV files look complete. The line count of about 36K looks correct for SL5.
            All the bedgraphs are about 45 MB in size and there are 24 files.
            All the bed files are about 4.5 MB in size and there are 24 files.
            All tbi files are about 70 KB and there are 24 for the bedgraphs and 24 for the bed files.
            The BAMs are 2.8 GB in size and there are 24.

            This looks correct! Passing it back to Molly.
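
For reference, file counts like the ones above can be gathered with a short helper. This is only a sketch, assuming the suffixes `.bam`, `.FJ.bed.gz`, and `.scaled.bedgraph.gz` plus their `.tbi` indexes; the helper names `count_suffix` and `summarize_counts` are hypothetical.

```shell
# Hypothetical helper: count files directly inside directory $1 whose names end in $2.
count_suffix() {
    find "$1" -maxdepth 1 -type f -name "*$2" | wc -l | tr -d ' '
}

# Hypothetical helper: print "<suffix> <count>" for each expected output type.
summarize_counts() {
    for suffix in .bam .FJ.bed.gz .FJ.bed.gz.tbi .scaled.bedgraph.gz .scaled.bedgraph.gz.tbi; do
        echo "$suffix $(count_suffix "$1" "$suffix")"
    done
}

# Example use (not run here):
#   summarize_counts /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL5/results/star_salmon
```

With 24 samples, each line of the summary would be expected to end in 24.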

            robofjoy Robert Reid made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]
            robofjoy Robert Reid made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            Mdavis4290 Molly Davis made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            Mdavis4290 Molly Davis made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Branch: https://bitbucket.org/mdavis4290/molly5-flavonoid-rnaseq/branch/IGBF-3688
            PR: https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/44

            • ARE-120min-analysis/SRP499796.csv
            • ARE-120min-analysis/SRP499796_SL4_multiqc_report.html
            • ARE-120min-analysis/SRP499796_SL5_multiqc_report.html
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            PR is merged. Moving to "ready for testing."

            To test, review the files. If no problems are observed, move forward to "Done."

            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3710 [ IGBF-3710 ]
            ann.loraine Ann Loraine added a comment -

            I reviewed the files as follows:

            • Checked that I could open the MultiQC files in my Web browser. I was able to open and review both.
            • Checked that the data are reported as "unstranded" in the run configuration file SRP499796.csv. They were.
            • Checked that RSeQC reported roughly equal numbers of sense and antisense reads (with respect to the gene models provided to the pipeline). It did.
            • However, I noticed there is no sample sheet available in the repository for this data set. We will need to provide one to set up the data in IGB quickload. I made a ticket for it with more details.

            Testing passes. Moving to DONE.

            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis