IGB / IGBF-3545

Re-run nextflow on the Mark pollen tube data for SL4 and SL5 with data downloaded from SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      SRP252265

      For SL4 and SL5.

      For this task, we need to confirm and sanity-check the Mark 2020 pollen tube data that Rob uploaded and submitted to the Sequence Read Archive.
      If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
      For this task:

      • Check SRP on NCBI and review submission
      • Download the data onto the cluster by using the SRP name
      • Run nf-core/rnaseq pipeline
      • Run our coverage graph and junctions scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.
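      The download step can be sketched as a dry run: build the list of the 24 SRR run accessions (the same ids listed in the prefetch script below) and loop `prefetch` over it. This is only a sketch; the `echo` stands in for the real `prefetch` call, and the temp directory stands in for the cluster path.

```shell
#!/bin/bash
# Dry-run sketch of the SRP252265 download step.
# Assumption: the 24 runs form the contiguous range SRR11284116-SRR11284139.
workdir=$(mktemp -d)

# Build the accession list file (same role as Sra_ids.txt on the cluster).
for n in $(seq 11284116 11284139); do
  echo "SRR$n"
done > "$workdir/Sra_ids.txt"

echo "runs: $(wc -l < "$workdir/Sra_ids.txt")"

# On the cluster this loop would call prefetch "$acc" after
# `module load sra-tools`; here we only echo the command.
while read -r acc; do
  echo "prefetch $acc"
done < "$workdir/Sra_ids.txt"
```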

        Activity

            Mdavis4290 Molly Davis added a comment (edited)

            Re-run Directory SL4: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4
            Re-run Directory SL5: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL5
            Prefetch SRR Script:

            #!/bin/bash
            
            #SBATCH --job-name=prefetch_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=4gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4
            module load sra-tools/2.11.0
            vdb-config --interactive
            
            files=(
            SRR11284116
            SRR11284117
            SRR11284118
            SRR11284119
            SRR11284120
            SRR11284121
            SRR11284122
            SRR11284123
            SRR11284124
            SRR11284125
            SRR11284126
            SRR11284127
            SRR11284128
            SRR11284129
            SRR11284130
            SRR11284131
            SRR11284132
            SRR11284133
            SRR11284134
            SRR11284135
            SRR11284136
            SRR11284137
            SRR11284138
            SRR11284139
            )
            
            for f in "${files[@]}"; do echo "$f"; prefetch "$f"; done
            
            

            Execute:

            chmod u+x prefetch.slurm
            
            sbatch prefetch.slurm
            
            Mdavis4290 Molly Davis added a comment

            Faster Dump Script:

            #!/bin/bash
            
            #SBATCH --job-name=fastqdump_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=40gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            #SBATCH --array=1-24
            
            #setting up where to grab files from
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p"  /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/Sra_ids.txt)
            
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4
            module load sra-tools/2.11.0
            
            echo "Starting fasterq-dump on $file";
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/$file
            
            fasterq-dump ${file}.sra
            
            perl /projects/tomato_genome/scripts/validateHiseqPairs.pl ${file}_1.fastq ${file}_2.fastq
            
            cp ${file}_1.fastq /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/${file}_1.fastq
            cp ${file}_2.fastq /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/${file}_2.fastq 
            
            echo "finished"
            

            Execute:

            chmod u+x fasterdump.slurm
            
            sbatch fasterdump.slurm
            
            Mdavis4290 Molly Davis added a comment

            Nextflow pipeline ran successfully with the SL4 & SL5 genomes.
            Re-run Directory SL4: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4
            Re-run Directory SL5: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL5
            MultiQC report notes: No errors or warnings were present in the report. The output files are named 'SRP252265_SL4_multiqc_report.html' and 'SRP252265_SL5_multiqc_report.html'.

            Mdavis4290 Molly Davis added a comment

            Next steps:

            • Commit multiqc report and csv files for SL4 and SL5 to Splicing repo on bitbucket
            • Change sorted bam names
            • Create junction files
            • Create coverage graphs

            Mdavis4290 Molly Davis added a comment

            Launch renameBams.sh script:
            ./renameBams.sh
            Launch Scaled Coverage graphs script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch Junction files script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
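
            The `sbatch-doIt.sh` helper itself is not shown in this ticket; a plausible minimal sketch (its internals are an assumption here) is a loop that submits the given worker script once per file matching the given suffix. The dry run below uses `echo` in place of the real `sbatch` call and a temp directory in place of the results directory.

```shell
#!/bin/bash
# Hypothetical sketch of the sbatch-doIt.sh pattern: submit one job per
# file ending in $suffix, passing the file to the worker script.
suffix=".bam"
worker="bamCoverage.sh"

# Simulated results directory with two bam files.
dir=$(mktemp -d)
touch "$dir/SRR11284116$suffix" "$dir/SRR11284117$suffix"

submitted=0
for f in "$dir"/*"$suffix"; do
  echo "sbatch $worker $f"     # real script would run: sbatch "$worker" "$f"
  submitted=$((submitted + 1))
done
echo "submitted $submitted jobs"
```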

            Mdavis4290 Molly Davis added a comment

            Running into a space error on the cluster:
            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL5
            Error:

            INFO:    Converting SIF file to temporary sandbox...
              FATAL:   while extracting /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL5/nf-core-rnaseq-3.4/workflow/../singularity-images/depot.galaxyproject.org-singularity-rseqc-3.0.1--py37h516909a_1.img: root filesystem extraction failed: command error: while creating /tmp/rootfs-801489829/tmp-rootfs-086038002/lib64/ld-linux-x86-64.so.2: open /tmp/rootfs-801489829/tmp-rootfs-086038002/lib64/ld-linux-x86-64.so.2: no space left on device
            
            Work dir:
              /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL5/work/52/06413abe49d07c495e87b3291d2962
            

            Next step: Run nextflow for SL5 once the storage issue is fixed.

            Mdavis4290 Molly Davis added a comment (edited)

            URC Solution:

            It looks like you're running out of local TEMP space in /tmp. Please try setting the following environment variables in your submit script, prior to running your singularity command:
            
            export SINGULARITY_TMPDIR=/scratch/$USER
            export SINGULARITY_CACHEDIR=/scratch/$USER
            export TMPDIR=/scratch/$USER
            

            Code I used:
            export NXF_SINGULARITY_CACHEDIR=/scratch/$USER
            export NXF_SINGULARITY_TMPDIR=/scratch/$USER
            export NXF_TMPDIR=/scratch/$USER

            Mdavis4290 Molly Davis added a comment

            Picking up this ticket again and rerunning SL5 from scratch.

            Mdavis4290 Molly Davis added a comment

            Branch: https://bitbucket.org/mdavis4290/molly-2-splicing-analysis/branch/IGBF-3545

            Re-run Directory SL4: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/results/star_salmon
            Re-run Directory SL5: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL5/results/star_salmon
            Reviewer:

            • Check that files have reasonable sizes (no zero-size files, for example)
            • Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            • Check that every bam file has a corresponding "FJ.bed.gz" file
            • Check that every bam file has a corresponding "scaled.bedgraph.gz" file
            • Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
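
            The pairing checks above can be automated with a small script. The sketch below simulates one complete sample and one sample with a missing index and bedgraph; the file-naming pattern is assumed from the checklist, not confirmed against the cluster.

```shell
#!/bin/bash
# Pairing-check sketch: flag any FJ.bed.gz without a .tbi index and any
# bam without a scaled.bedgraph.gz. Runs against a simulated directory.
dir=$(mktemp -d)

# One complete sample...
touch "$dir/SRR11284116.bam" \
      "$dir/SRR11284116.FJ.bed.gz" "$dir/SRR11284116.FJ.bed.gz.tbi" \
      "$dir/SRR11284116.scaled.bedgraph.gz" "$dir/SRR11284116.scaled.bedgraph.gz.tbi"
# ...and one with its index and bedgraph deliberately missing.
touch "$dir/SRR11284117.bam" "$dir/SRR11284117.FJ.bed.gz"

problems=0
for bed in "$dir"/*.FJ.bed.gz; do
  [ -e "$bed.tbi" ] || { echo "missing index: $bed.tbi"; problems=$((problems + 1)); }
done
for bam in "$dir"/*.bam; do
  base=${bam%.bam}
  [ -e "$base.scaled.bedgraph.gz" ] || { echo "missing bedgraph for: $bam"; problems=$((problems + 1)); }
done
echo "$problems problem(s) found"
```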

            robofjoy Robert Reid added a comment

            24 bam files.
            24 bai files.

            HOWEVER!!

            /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/results/star_salmon$ ll *bedgraph.gz.tbi | wc -l

            23 index files for the bedgraphs!
            And 23 bedgraph files!

            The bam and bam index files both number 24 and appear to be the same size, so the bedgraph stage is likely where the issue lies.
            Kicking it back to Molly!
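
            One way to pinpoint which of the 24 bams lacks a bedgraph is to walk the bam list and test for the partner file. A sketch under the same assumed naming, with a simulated directory in which one bedgraph is deliberately missing:

```shell
#!/bin/bash
# Locate bams with no matching scaled.bedgraph.gz (naming is an assumption).
dir=$(mktemp -d)
touch "$dir/SRR11284116.sorted.bam" "$dir/SRR11284117.sorted.bam"
touch "$dir/SRR11284116.sorted.scaled.bedgraph.gz"   # SRR11284117's is missing

missing=""
for bam in "$dir"/*.bam; do
  base=${bam%.bam}
  [ -e "$base.scaled.bedgraph.gz" ] || missing="$missing ${bam##*/}"
done
echo "bams without a bedgraph:$missing"
```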

            Mdavis4290 Molly Davis added a comment (edited)

            Should be fixed now! Thank you for the catch!

            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP252265/nfcore-SL4/results/star_salmon

            PR: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/17

            ann.loraine Ann Loraine added a comment

            PR is merged into the "splicing analysis" repository.

            Mdavis4290 Molly Davis added a comment

            Review:

            • All quickload files are accounted for in both SL4 and SL5 directories
            • I made sure group write permissions were set: chmod -R g+w *
            • The merged files on bitbucket are correct

            Moving to Done!


              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes: 0
                Watchers: 3
