  IGB / IGBF-3424

Re-run mature pollen and seedling pipeline with SL5 using data newly submitted to SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      SRP438952

      For this task, we need to confirm and sanity-check the seedling and mature pollen data that Rob recently uploaded and submitted to the Sequence Read Archive.
      If the data are good, we will replace all existing BAM, junction, and other files deployed on the "hotpollen" quickload site with the newly processed data.
      For this task:

      • Check the SRP on NCBI and review the submission
      • Download the data onto the cluster using the SRP accession
      • Run the nf-core/rnaseq pipeline
      • Run our coverage graph and junction scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.


            Activity

            Mdavis4290 Molly Davis added a comment (edited)

            Re-run directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL5
            Prefetch SRR script (prefetch.slurm):

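            #!/bin/bash
            
            #SBATCH --job-name=prefetch_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=4gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            
            # Work in the project directory; the fasterq-dump step expects each run as SRRxxx/SRRxxx.sra here
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952 || exit 1
            module load sra-tools/2.11.0
            # "vdb-config --interactive" opens a terminal menu and cannot run inside a batch job;
            # run it once on a login node before submitting this script.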
            
            files=(
            SRR24685698
            SRR24685699
            SRR24685700
            SRR24685701
            SRR24685702
            SRR24685703
            SRR24685704
            SRR24685705
            SRR24685706
            SRR24685707
            SRR24685708
            SRR24685709
            SRR24685710
            SRR24685711
            SRR24685712
            SRR24685713
            SRR24685714
            SRR24685715
            SRR24685716
            SRR24685717
            SRR24685718
            SRR24685719
            SRR24685720
            SRR24685721
            SRR24685722
            SRR24685723
            SRR24685724
            SRR24685725
            SRR24685726
            SRR24685727
            SRR24685728
            SRR24685729
            SRR24685730
            SRR24685731
            SRR24685732
            SRR24685733
            SRR24685734
            SRR24685735
            SRR24685736
            SRR24685737
            SRR24685738
            SRR24685739
            SRR24685740
            SRR24685741
            SRR24685742
            SRR24685743
            SRR24685744
            SRR24685745
            )
            
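            # Download each run; report failures without stopping the loop
            for f in "${files[@]}"; do
                echo "$f"
                prefetch "$f" || echo "prefetch failed for $f" >&2
            done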
            
            

            Execute:

            chmod u+x prefetch.slurm
            
            sbatch prefetch.slurm
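Before submitting the fasterq-dump array job, it can help to confirm that every accession actually downloaded. The sketch below is a hypothetical helper (check_prefetch is not part of sra-tools or of this ticket) and assumes the SRRxxx/SRRxxx.sra layout that the fasterq-dump script relies on.

```shell
# Hypothetical helper: report accessions whose prefetch output is missing or empty.
# ASSUMPTION: layout is <dir>/<SRR>/<SRR>.sra, as used by the fasterq-dump step.
check_prefetch() {
    local dir="$1"; shift
    local missing=0 acc sra
    for acc in "$@"; do
        sra="${dir}/${acc}/${acc}.sra"
        # -s is true only if the file exists and is non-empty
        if [[ ! -s "$sra" ]]; then
            echo "MISSING or empty: $sra"
            missing=$((missing + 1))
        fi
    done
    echo "${missing} of $# accessions missing"
    # Non-zero exit status when anything is missing, so it can gate the next job
    [[ "$missing" -eq 0 ]]
}

# Example (accession list read from the file the array job uses):
# check_prefetch /projects/tomato_genome/fnb/dataprocessing/SRP438952 \
#     $(cat /projects/tomato_genome/fnb/dataprocessing/SRP438952/Sra_ids.txt)
```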
            
            Mdavis4290 Molly Davis added a comment (edited)

            fasterq-dump script (fasterdump.slurm):

            #!/bin/bash
            
            #SBATCH --job-name=fastqdump_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=40gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            #SBATCH --array=1-48
            
            # Each array task handles one accession: line N of Sra_ids.txt for task N
            base=/projects/tomato_genome/fnb/dataprocessing/SRP438952
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" "$base/Sra_ids.txt")
            
            module load sra-tools/2.11.0
            
            echo "Starting fasterq-dump on $file"
            
            # prefetch stored this run as $base/$file/$file.sra
            cd "$base/$file" || exit 1
            
            # Paired-end reads are written to ${file}_1.fastq and ${file}_2.fastq
            fasterq-dump "${file}.sra"
            
            # Validate the read pairs with the lab's checking script
            perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"
            
            # Copy the FASTQ pair up to the project directory
            cp "${file}_1.fastq" "${file}_2.fastq" "$base/"
            
            echo "finished"
            

            Execute:

            chmod u+x fasterdump.slurm
            
            sbatch fasterdump.slurm
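The array job reads its accession from line ${SLURM_ARRAY_TASK_ID} of Sra_ids.txt, but the ticket does not show how that file was built. One way to produce it is sketched below, assuming the accessions are the contiguous range in the prefetch list (SRR24685698 through SRR24685745, 48 runs, matching --array=1-48).

```shell
#!/bin/bash
# Sketch: write one SRR accession per line; line N is read by array task N.
# ASSUMPTION: the runs form the contiguous range listed in the prefetch script.
first=24685698
last=24685745
for (( n = first; n <= last; n++ )); do
    echo "SRR${n}"
done > Sra_ids.txt

# The line count must equal the size of the --array range (1-48).
echo "expected $((last - first + 1)) accessions, wrote $(wc -l < Sra_ids.txt)"
```

If the accession list were not contiguous, copying the prefetch script's files=( ... ) array into the file with printf '%s\n' "${files[@]}" > Sra_ids.txt would serve the same purpose.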
            
            Mdavis4290 Molly Davis added a comment (edited)

            The Nextflow pipeline (nf-core/rnaseq) ran successfully with the SL5 genome.
            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952
            MultiQC report notes: No errors or warnings were present in the report. The output file is named 'SRP438952_multiqc_report.html'.
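The pipeline invocation itself is not recorded here. nf-core/rnaseq takes its input as a samplesheet CSV with the columns sample, fastq_1, fastq_2 and strandedness. The sketch below builds one from paired files named after their SRR accessions. Assumptions: recent nf-core/rnaseq releases validate that FASTQ paths end in .fastq.gz or .fq.gz, so the fasterq-dump output is assumed to have been gzip-compressed first; the strandedness value defaults to "auto", which only newer releases accept (older ones need forward, reverse or unstranded). The function name make_samplesheet is hypothetical.

```shell
# Hypothetical helper: write an nf-core/rnaseq samplesheet from <SRR>_1.fastq.gz /
# <SRR>_2.fastq.gz pairs. Pass an absolute directory so the CSV holds absolute paths.
make_samplesheet() {
    local fastq_dir="$1" out="$2" strandedness="${3:-auto}"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness" > "$out"
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        [[ -e "$r1" ]] || continue            # no matches: glob stays literal
        sample=$(basename "$r1" _1.fastq.gz)  # sample name = SRR accession
        r2="${fastq_dir}/${sample}_2.fastq.gz"
        if [[ ! -e "$r2" ]]; then
            echo "skipping ${sample}: no mate file" >&2
            continue
        fi
        echo "${sample},${r1},${r2},${strandedness}" >> "$out"
    done
}

# Example (reference paths and profile are placeholders, not the ones used for this run):
# make_samplesheet /projects/tomato_genome/fnb/dataprocessing/SRP438952 samplesheet.csv
# nextflow run nf-core/rnaseq -profile singularity --input samplesheet.csv \
#     --outdir results --fasta SL5.fa --gtf SL5.gtf --aligner star_salmon
```

Using the accession as the sample name keeps the pipeline's output files on SRR names, which is what the task description asks for.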

            Cluster note: I had to email OneIT at UNCC because of problems logging in to the cluster. Their response:

            We narrowed this issue down to one of our interactive nodes, str-i2, which was unusually overloaded. We rebooted it, and things seem to be back to normal. Please try your login again, and let me know if the issue persists.

            Mdavis4290 Molly Davis added a comment (edited)

            Next steps:

            • Commit the CSV and MultiQC report to the Splicing repository on Bitbucket
            • Rename the sorted BAM files
            • Create junction files
            • Create coverage graphs
            Mdavis4290 Molly Davis added a comment

            Launch the renameBams.sh script:
            ./renameBams.sh
            Launch the scaled coverage graph script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch the junction file script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
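The ticket calls sbatch-doIt.sh with a file suffix and a job script but does not include its source. The sketch below is a hypothetical guess at that pattern, not the real script: it submits the job script once per matching file in the current directory and passes the file name as the job's first argument (sbatch forwards arguments after the script name to the script). The function name submit_per_file and the SUBMIT_CMD dry-run variable are illustrative assumptions.

```shell
# Hypothetical driver in the style of "sbatch-doIt.sh SUFFIX JOB_SCRIPT".
# Set SUBMIT_CMD=echo for a dry run that only prints what would be submitted.
submit_per_file() {
    local suffix="$1" job_script="$2"
    local submitter="${SUBMIT_CMD:-sbatch}"
    local f n=0
    for f in *"$suffix"; do
        [[ -e "$f" ]] || continue     # no matching files: glob stays literal
        # Unquoted on purpose so SUBMIT_CMD may hold a command plus options
        $submitter "$job_script" "$f"
        n=$((n + 1))
    done
    echo "submitted ${n} jobs" >&2
}

# Example: one bamCoverage.sh job per BAM in the current directory
# submit_per_file .bam bamCoverage.sh >jobs.out 2>jobs.err
```

Because both launches in the comment write to jobs.out and jobs.err, the second run overwrites the first run's job IDs; separate log names (or >>) would keep both.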

            Mdavis4290 Molly Davis added a comment (edited)

            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/results/star_salmon

            Reviewer checklist:
            Check that files have reasonable sizes (no zero-size files, for example)
            Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            Check that every BAM file has a corresponding "FJ.bed.gz" file
            Check that every BAM file has a corresponding "scaled.bedgraph.gz" file
            Check that every "scaled.bedgraph.gz" file has a corresponding "scaled.bedgraph.gz.tbi" index file

            Reviewer: [~RobertReid]
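The reviewer checklist can also be run as a script. A minimal sketch, assuming the outputs share the BAM's base name (for example SRR24685698.bam next to SRR24685698.FJ.bed.gz and SRR24685698.scaled.bedgraph.gz); adjust the suffixes if the real names differ. check_outputs is a hypothetical helper, not one of the lab's scripts.

```shell
# Hypothetical checker for the reviewer checklist.
# ASSUMPTION: X.bam -> X.FJ.bed.gz and X.scaled.bedgraph.gz, each with a .tbi index.
check_outputs() {
    local dir="${1:-.}" problems=0 f bam base
    # No zero-size deliverables
    for f in "$dir"/*.bam "$dir"/*.gz "$dir"/*.tbi; do
        [[ -e "$f" ]] || continue
        if [[ ! -s "$f" ]]; then echo "zero size: $f"; problems=$((problems + 1)); fi
    done
    # Every FJ.bed.gz and scaled.bedgraph.gz file has a .tbi index
    for f in "$dir"/*.FJ.bed.gz "$dir"/*.scaled.bedgraph.gz; do
        [[ -e "$f" ]] || continue
        [[ -e "$f.tbi" ]] || { echo "no index: $f"; problems=$((problems + 1)); }
    done
    # Every BAM has both derived files
    for bam in "$dir"/*.bam; do
        [[ -e "$bam" ]] || continue
        base="${bam%.bam}"
        for f in "${base}.FJ.bed.gz" "${base}.scaled.bedgraph.gz"; do
            [[ -e "$f" ]] || { echo "missing: $f"; problems=$((problems + 1)); }
        done
    done
    echo "${problems} problem(s) found"
    [[ "$problems" -eq 0 ]]
}

# Example:
# check_outputs /projects/tomato_genome/fnb/dataprocessing/SRP438952/results/star_salmon
```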

            robofjoy Robert Reid added a comment

            Checking things.

            • Each file type has 48 files, as expected.
            • There is a .tbi file for every .gz file.
            • The .tbi index files are all similar in size to one another.
            • The FJ.bed.gz files are similar in size, at around 4 MB (±4).
            • The scaled bedgraphs are all between 30 MB and 130 MB. Similar enough!
            • There are 96 .tbi files in total, as expected (48 scaled and 48 FJ).
            • There are 96 .gz files in total, as expected (48 scaled and 48 FJ).

            All looks as expected!
            Great job.
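The review above tallies files per type and compares their sizes. A sketch that prints those numbers for a results directory is shown here; the suffix list is an assumption based on the file types named in the review, and summarize_outputs is a hypothetical helper.

```shell
# Hypothetical helper: count each output type and report its size range in bytes.
summarize_outputs() {
    local dir="${1:-.}" suffix f size count min max
    for suffix in .bam .FJ.bed.gz .FJ.bed.gz.tbi .scaled.bedgraph.gz .scaled.bedgraph.gz.tbi; do
        count=0; min=""; max=0
        for f in "$dir"/*"$suffix"; do
            [[ -e "$f" ]] || continue
            size=$(wc -c < "$f")    # byte count; works with GNU and BSD wc
            size=$((size))          # arithmetic expansion strips BSD wc padding
            count=$((count + 1))
            if [[ -z "$min" ]] || (( size < min )); then min=$size; fi
            if (( size > max )); then max=$size; fi
        done
        printf '%-24s count=%d min=%s max=%d bytes\n' "$suffix" "$count" "${min:-0}" "$max"
    done
}

# Example:
# summarize_outputs /projects/tomato_genome/fnb/dataprocessing/SRP438952/results/star_salmon
```

A glob such as *.FJ.bed.gz does not match the .tbi files, because the name must end in .gz, so each index type is counted separately.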

            Mdavis4290 Molly Davis added a comment (edited)

            Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3424
            Includes:

            • MultiQC Report
            • CSV

            [~aloraine]

            Mdavis4290 Molly Davis added a comment

            Pull Request: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/12

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes:
                0
                Watchers:
                2
