IGB / IGBF-3544

Re-run mature pollen and seedling pipeline with SL4 using data newly submitted to SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      SRP438952

      For this task, we need to confirm and sanity-check the seedling and mature pollen data that Rob recently uploaded and submitted to the Sequence Read Archive.
      If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
      For this task:

      • Check SRP on NCBI and review submission
      • Download the data onto the cluster by using the SRP name
      • Run nf-core/rnaseq pipeline
      • Run our coverage graph and junctions scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.
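The renaming itself isn't scripted in the ticket; a minimal sketch of how it could be done, assuming a hand-prepared two-column mapping file (old prefix, SRR accession). The mapping file name and format are assumptions, not part of the task:

```shell
# rename_to_srr MAPFILE: rename files per a two-column map (old_prefix SRR_accession).
# The mapping file format is an assumption -- the ticket doesn't specify one.
rename_to_srr() {
    local map_file="$1" old new f ext
    while read -r old new; do
        [ -z "$old" ] && continue              # skip blank lines
        for f in "$old".*; do
            [ -e "$f" ] || continue            # glob matched nothing for this prefix
            ext="${f#"$old".}"                 # keep the full extension (e.g. FJ.bed.gz)
            mv -n "$f" "$new.$ext"             # -n: never overwrite an existing file
        done
    done < "$map_file"
}
```

With pollen_rep1.bam and pollen_rep1.FJ.bed.gz on disk and a map line `pollen_rep1 SRR24685698`, this produces SRR24685698.bam and SRR24685698.FJ.bed.gz.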

        Attachments

          Issue Links

            Activity

            Mdavis4290 Molly Davis created issue -
            Mdavis4290 Molly Davis made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3424 [ IGBF-3424 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3498 [ IGBF-3498 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3499 [ IGBF-3499 ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment -

            Re-run directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            Prefetch SRR script:

            #!/bin/bash
            
            #SBATCH --job-name=prefetch_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=4gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            module load sra-tools/2.11.0
            # vdb-config --interactive opens a terminal UI and will hang a batch job;
            # run it once from a login shell before submitting this script.
            
            files=(
            SRR24685698
            SRR24685699
            SRR24685700
            SRR24685701
            SRR24685702
            SRR24685703
            SRR24685704
            SRR24685705
            SRR24685706
            SRR24685707
            SRR24685708
            SRR24685709
            SRR24685710
            SRR24685711
            SRR24685712
            SRR24685713
            SRR24685714
            SRR24685715
            SRR24685716
            SRR24685717
            SRR24685718
            SRR24685719
            SRR24685720
            SRR24685721
            SRR24685722
            SRR24685723
            SRR24685724
            SRR24685725
            SRR24685726
            SRR24685727
            SRR24685728
            SRR24685729
            SRR24685730
            SRR24685731
            SRR24685732
            SRR24685733
            SRR24685734
            SRR24685735
            SRR24685736
            SRR24685737
            SRR24685738
            SRR24685739
            SRR24685740
            SRR24685741
            SRR24685742
            SRR24685743
            SRR24685744
            SRR24685745
            )
            
            for f in "${files[@]}"; do echo "$f"; prefetch "$f"; done
            
            

            Execute:

            chmod u+x prefetch.slurm
            
            sbatch prefetch.slurm
            
            Mdavis4290 Molly Davis added a comment -

            Fasterq-dump script:

            #!/bin/bash
            
            #SBATCH --job-name=fastqdump_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=40gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            #SBATCH --array=1-48
            
            #setting up where to grab files from
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p"  /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/Sra_ids.txt)
            
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            module load sra-tools/2.11.0
            
            echo "Starting fasterq-dump on $file";
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/$file
            
            fasterq-dump ${file}.sra
            
            perl /projects/tomato_genome/scripts/validateHiseqPairs.pl ${file}_1.fastq ${file}_2.fastq
            
            cp ${file}_1.fastq /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/${file}_1.fastq
            cp ${file}_2.fastq /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/${file}_2.fastq 
            
            echo "finished"
            

            Execute:

            chmod u+x fasterdump.slurm
            
            sbatch fasterdump.slurm
            
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1 [ 185 ] Spring 1, Spring 2 [ 185, 186 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis added a comment - - edited

            Nextflow pipeline ran successfully with the SL4 genome.
            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            MultiQC report notes: No errors or warnings were present in the report. The pipeline did flag strandedness for SL4: with 'reverse' set, it suggested 'unstranded', but with 'unstranded' set, it suggested 'reverse'. I am keeping it at 'unstranded', which also worked for SL5. The output file is named 'SRP438952_SL4_multiqc_report.html'.

            Mdavis4290 Molly Davis added a comment -

            Next steps:

            • Commit MultiQC report to Splicing repo on Bitbucket
            • Change sorted BAM names
            • Create junction files
            • Create coverage graphs

            Mdavis4290 Molly Davis added a comment -

            Launch renameBams.sh script:
            ./renameBams.sh
            Launch Scaled Coverage graphs script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch Junction files script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
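            The sbatch-doIt.sh wrapper itself isn't included in the ticket; judging from the invocations above, it takes a file suffix and a per-file script, so a plausible sketch (the body is an assumption) is:

            ```shell
            # submit_per_file SUFFIX SCRIPT: a sketch of what ./sbatch-doIt.sh likely does --
            # submit SCRIPT as one cluster job per file matching SUFFIX. The wrapper's
            # actual body isn't shown in the ticket, so this is an assumption.
            submit_per_file() {
                local suffix="$1" script="$2" f
                for f in *"$suffix"; do
                    [ -e "$f" ] || continue          # glob matched nothing
                    echo "submitting: $script $f"
                    sbatch "$script" "$f"            # one job per input file
                done
            }
            ```

            Invoked as `submit_per_file .bam bamCoverage.sh`, it would submit one bamCoverage.sh job per BAM in the current directory.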

            Mdavis4290 Molly Davis added a comment - - edited

            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/results/star_salmon

            Reviewer:

            • Check that files have reasonable sizes (no zero-size files, for example)
            • Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            • Check that every BAM file has a corresponding "FJ.bed.gz" file
            • Check that every BAM file has a corresponding "scaled.bedgraph.gz" file
            • Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
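            These checks are easy to script; a quick sketch, with the companion-file naming (`<SRR>.FJ.bed.gz`, `<SRR>.scaled.bedgraph.gz`) assumed from the checklist:

            ```shell
            # check_outputs DIR: sanity-check a results directory per the checklist above.
            # Reports zero-size files and missing companion files; returns nonzero on any problem.
            check_outputs() {
                local dir="${1:-.}" f base want status=0
                cd "$dir" || return 1
                for f in *; do
                    [ -s "$f" ] || { echo "ZERO SIZE: $f"; status=1; }   # check 1
                done
                for f in *.bam; do
                    [ -e "$f" ] || continue
                    base="${f%.bam}"
                    # checks 2-5: each BAM needs its junction track, coverage track, and indexes
                    for want in "$base.FJ.bed.gz" "$base.FJ.bed.gz.tbi" \
                                "$base.scaled.bedgraph.gz" "$base.scaled.bedgraph.gz.tbi"; do
                        [ -e "$want" ] || { echo "MISSING: $want"; status=1; }
                    done
                done
                return $status
            }
            ```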

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Robert Reid [ robertreid ]
            robofjoy Robert Reid added a comment -

            The folder does indeed exist!

            ll *bed.gz | wc -l
            48

            We have 48 experiments coinciding with what is in the SRA.

            Running these commands:
            ll *bed.gz.tbi
            ll *bed.gz.tbi | wc -l
            ll *bed.gz | wc -l
            ll *bam | wc -l
            ll *bai | wc -l
            cat *.err
            ll *err
            ll *bam
            ll *bai
            ll *.bed.gz

            (I have an alias, ll='ls -lrt', which the commands above use.)

            A weird warning!!
            WARNING: BAM index file /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/results/star_salmon/SRR24685736.bam.bai is older than BAM /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/results/star_salmon/SRR24685736.bam

            I don't think this is anything, but it might be good to test a few of these samples in IGB to ensure everything is OK.
            More than likely the .bai file was produced so quickly that it got a timestamp prior to the actual BAM file.

            All of the BAM and BAI files are the sizes we expected, so the warning above is likely nothing!
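            If the warning ever needs silencing, refreshing the index timestamps is harmless; a small sketch:

            ```shell
            # refresh_bai_timestamps DIR: touch every .bam.bai so it is newer than its BAM,
            # which silences the "BAM index file ... is older than BAM" warning noted above.
            refresh_bai_timestamps() {
                local dir="${1:-.}" bam
                for bam in "$dir"/*.bam; do
                    [ -e "$bam.bai" ] && touch "$bam.bai"
                done
                return 0
            }
            ```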

            And everything else looks as it should.

            robofjoy Robert Reid made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            robofjoy Robert Reid made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            Mdavis4290 Molly Davis added a comment -

            Branch: https://bitbucket.org/mdavis4290/molly-2-splicing-analysis/branch/IGBF-3544
            PR: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/14
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2 [ 185, 186 ] Spring 1, Spring 2, Spring 3 [ 185, 186, 187 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment -

            Suggestions for testing:

            • Check that these new quality control reports are consistent with the original quality control reports obtained when we processed the original, pre-submission data files. The results should be the same.
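            One quick way to make that comparison: alongside the HTML report, MultiQC writes flat tables under multiqc_data/, and diffing those is easier than comparing HTML side by side (the exact file paths passed in are assumptions):

            ```shell
            # compare_multiqc OLD NEW: diff two MultiQC flat-table files
            # (e.g. multiqc_data/multiqc_general_stats.txt from each run), ignoring row order.
            compare_multiqc() {
                diff <(sort "$1") <(sort "$2") && echo "reports agree"
            }
            ```

            Any per-sample metric that shifted between the pre- and post-SRA-submission runs shows up as a diff line; identical runs print "reports agree".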
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine added a comment -

            Molly Davis - please see the above comment on how to test. I don't know whether you have already compared the files or not.

            If not, it would be good to do that now.

            The QC reports provide a great overview of a data processing run. Comparing the QC reports pre- and post-SRA submission will tell us a lot. For example, if there are big differences between the pre- and post-SRA submission files, the QC report will likely show it.

            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2, Spring 3 [ 185, 186, 187 ] Spring 1, Spring 2, Spring 3, Spring 4 [ 185, 186, 187, 188 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            Mdavis4290 Molly Davis added a comment -

            The original mature pollen and seedling data on the cluster were only run with SL5, not SL4. So the next step would be to run the original data with SL4 and then compare the re-run SL4 report with the original report.

            Mdavis4290 Molly Davis made changes -
            Status Post-merge Testing In Progress [ 10003 ] Merged Needs Testing [ 10002 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3613 [ IGBF-3613 ]
            Mdavis4290 Molly Davis added a comment -

            When I make the comparison ticket for mature pollen and seedling, I will compare the MultiQC reports then, so for now I will move this ticket to Done!

            Mdavis4290 Molly Davis made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            Mdavis4290 Molly Davis made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated:
                  Resolved: