[IGBF-3424] Re-run mature pollen and seedling pipeline with SL5 using data newly submitted to SRA - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
2
Epic Link:
Support NSF pollen grant
Sprint:
Fall 1 - Sep 5, Summer 8 2023 Aug 21

Description

SRP438952

For this task, we need to confirm and sanity-check the seedling and mature pollen data that Rob recently uploaded and submitted to the Sequence Read Archive.
If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
Steps:

Check SRP on NCBI and review submission
Download the data onto the cluster by using the SRP name
Run nf-core/rnaseq pipeline
Run our coverage graph and junctions scripts on the data

Note that all files should now use their "SRR" names instead of the existing file names.
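The list of "SRR" run accessions for the download step could be pulled from the SRP accession roughly as follows. This is only a sketch: it assumes NCBI EDirect (esearch, efetch) is installed on the cluster, which this ticket does not state, and runs_from_runinfo is a hypothetical helper name. The function keeps the "Run" column of a RunInfo CSV.

```shell
#!/bin/bash
# Sketch: extract run accessions (SRR...) from an SRA RunInfo CSV read on stdin.
# runs_from_runinfo is a hypothetical helper, not part of the original workflow.
runs_from_runinfo() {
    # The first CSV column of a RunInfo table is "Run"; header and blank
    # lines do not match the SRR pattern, so they are dropped.
    awk -F',' '$1 ~ /^SRR[0-9]+$/ { print $1 }'
}

# Assumed usage (needs NCBI EDirect and network access):
#   esearch -db sra -query SRP438952 | efetch -format runinfo | runs_from_runinfo > Sra_ids.txt
```

The resulting one-accession-per-line file has the same shape as the Sra_ids.txt file read by the fasterq-dump array job in the comments.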

Attachments

Issue Links

relates to

IGBF-3498 Review SRA Submissions

Closed

IGBF-3499 Complete evaluation of RNA-Seq dataset submissions

Closed

IGBF-3544 Re-run mature pollen and seedling pipeline with SL4 using data newly submitted to SRA

Closed

Activity

Molly Davis added a comment - 29/Aug/23 3:25 PM - edited

Re-run Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL5
Prefetch SRR Script:

#! /bin/bash

#SBATCH --job-name=prefetch_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00

cd /projects/tomato_genome/fnb/dataprocessing/SRP438952
module load sra-tools/2.11.0
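# NOTE: vdb-config --interactive opens a text UI and expects a terminal; it is
# normally run once from a login shell rather than inside a batch job.
vdb-config --interactive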

files=(
SRR24685698
SRR24685699
SRR24685700
SRR24685701
SRR24685702
SRR24685703
SRR24685704
SRR24685705
SRR24685706
SRR24685707
SRR24685708
SRR24685709
SRR24685710
SRR24685711
SRR24685712
SRR24685713
SRR24685714
SRR24685715
SRR24685716
SRR24685717
SRR24685718
SRR24685719
SRR24685720
SRR24685721
SRR24685722
SRR24685723
SRR24685724
SRR24685725
SRR24685726
SRR24685727
SRR24685728
SRR24685729
SRR24685730
SRR24685731
SRR24685732
SRR24685733
SRR24685734
SRR24685735
SRR24685736
SRR24685737
SRR24685738
SRR24685739
SRR24685740
SRR24685741
SRR24685742
SRR24685743
SRR24685744
SRR24685745
)

# Download each run; prefetch writes <accession>/<accession>.sra
for f in "${files[@]}"; do
    echo "$f"
    prefetch "$f"
done

Execute:

chmod u+x prefetch.slurm

sbatch prefetch.slurm
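The fasterq-dump step reads accessions from Sra_ids.txt, one per line, and expects each run at <accession>/<accession>.sra. A minimal sketch for writing that file and listing runs whose .sra file is missing or empty is below. The function names are hypothetical, and it assumes the accessions are the contiguous range listed in the script above.

```shell
#!/bin/bash
# Sketch: write the accession list and report runs that prefetch did not download.
# Function names are hypothetical; the SRR range matches the list in the prefetch script.

write_sra_ids() {
    # Print SRR24685698 .. SRR24685745, one per line (48 accessions).
    local n
    for ((n = 24685698; n <= 24685745; n++)); do
        echo "SRR${n}"
    done
}

missing_sra() {
    # $1: file of accessions, $2: directory that prefetch downloaded into.
    local ids=$1 dir=$2 acc
    while read -r acc; do
        # -s is true only for an existing, non-empty file
        [ -s "${dir}/${acc}/${acc}.sra" ] || echo "$acc"
    done < "$ids"
}
```

Possible usage from the project directory: write_sra_ids > Sra_ids.txt, then missing_sra Sra_ids.txt . which prints nothing when every run is present.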

Molly Davis added a comment - 30/Aug/23 10:15 AM - edited

Faster Dump Script:

#! /bin/bash

#SBATCH --job-name=fastqdump_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=40gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00
#SBATCH --array=1-48

# Look up this array task's SRR accession (one per line in Sra_ids.txt)
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /projects/tomato_genome/fnb/dataprocessing/SRP438952/Sra_ids.txt)

module load sra-tools/2.11.0

echo "Starting fasterq-dump on $file"

# prefetch stored each run in a directory named after its accession
cd "/projects/tomato_genome/fnb/dataprocessing/SRP438952/$file"

# Convert the prefetched .sra file to paired FASTQ files
fasterq-dump "${file}.sra"

# Validate the paired FASTQ files
perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"

# Copy the FASTQ files up to the project directory
cp "${file}_1.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP438952/${file}_1.fastq"
cp "${file}_2.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP438952/${file}_2.fastq"

echo "finished"

Execute:

chmod u+x fasterdump.slurm

sbatch fasterdump.slurm
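For a quick independent check of the FASTQ pairs, something like the sketch below could be used. It is not validateHiseqPairs.pl (that script is not shown in this ticket); check_fastq_pair is a hypothetical helper that only compares record counts between the two mate files.

```shell
#!/bin/bash
# Sketch: basic sanity check on a pair of FASTQ files.
# This is NOT validateHiseqPairs.pl; check_fastq_pair is a hypothetical helper
# that only compares record counts.
check_fastq_pair() {
    local r1=$1 r2=$2 n1 n2
    n1=$(wc -l < "$r1") || return 1
    n2=$(wc -l < "$r2") || return 1
    # A FASTQ record is 4 lines, so line counts must be non-zero multiples of 4
    if [ "$n1" -eq 0 ] || [ $((n1 % 4)) -ne 0 ] || [ $((n2 % 4)) -ne 0 ]; then
        echo "FAIL ${r1}: line count is zero or not a multiple of 4"
        return 1
    fi
    # Paired-end files must hold the same number of records
    if [ "$n1" -ne "$n2" ]; then
        echo "FAIL ${r1}: mates have different record counts"
        return 1
    fi
    echo "OK ${r1}: $((n1 / 4)) read pairs"
}
```

Possible usage: check_fastq_pair SRR24685698_1.fastq SRR24685698_2.fastq. It does not compare read names, so it is a coarse check only.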

Molly Davis added a comment - 05/Sep/23 11:08 AM - edited

The nf-core/rnaseq Nextflow pipeline ran successfully with the SL5 genome.
Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952
MultiQC report notes: No errors or warnings were present in the report. The output file is named 'SRP438952_multiqc_report.html'.
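The samplesheet CSV for a run like this could be generated along the lines of the sketch below. The exact CSV used here is in the repository, not in this ticket, so treat this as an illustration: make_samplesheet is a hypothetical helper, the "auto" strandedness value is an assumption, and the .fastq.gz paths assume the FASTQ files were gzip-compressed first (nf-core/rnaseq samplesheets normally point at compressed files, a step this ticket does not show).

```shell
#!/bin/bash
# Sketch: build an nf-core/rnaseq samplesheet from a list of SRR accessions.
# make_samplesheet is a hypothetical helper; "auto" strandedness and the
# .fastq.gz file names are assumptions, not taken from this ticket.
make_samplesheet() {
    # $1: file of accessions (one per line), $2: directory holding the FASTQ files
    local ids=$1 fastq_dir=$2 acc
    echo "sample,fastq_1,fastq_2,strandedness"
    while read -r acc; do
        [ -n "$acc" ] || continue   # skip blank lines
        echo "${acc},${fastq_dir}/${acc}_1.fastq.gz,${fastq_dir}/${acc}_2.fastq.gz,auto"
    done < "$ids"
}
```

Possible usage: make_samplesheet Sra_ids.txt /projects/tomato_genome/fnb/dataprocessing/SRP438952 > samplesheet.csv. Using the SRR accession as the sample name keeps the output file names on their "SRR" names.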

Cluster note: I had to email OneIT at UNCC because of problems logging in to the cluster. Their response:

We narrowed this issue down to one of our interactive nodes, str-i2, which was unusually overloaded. We rebooted it, and things seem to be back to normal. Please try your login again, and let me know if the issue persists.

Molly Davis added a comment - 05/Sep/23 1:43 PM - edited

Next steps:

Commit the CSV and MultiQC report to the Splicing repo on Bitbucket
Change sorted BAM file names
Create junction files
Create coverage graphs

Molly Davis added a comment - 05/Sep/23 1:45 PM

Launch renameBams.sh script:
./renameBams.sh
Launch Scaled Coverage graphs script:
./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
Launch Junction files script:
./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
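Both sbatch-doIt.sh commands above redirect to jobs.out and jobs.err, so if they are run from the same directory the second run overwrites the logs of the first. The sbatch-doIt.sh script itself is not shown in this ticket; the sketch below is a hypothetical wrapper of the same general shape (one job per file with a given suffix). The names submit_each, SUBMIT and F are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of a per-file submit wrapper; the real sbatch-doIt.sh
# is not shown in this ticket.
# SUBMIT defaults to sbatch; set SUBMIT=echo for a dry run that only prints
# the arguments that would be passed.
submit_each() {
    # $1: file suffix (e.g. .bam), $2: job script to run on each file
    local suffix=$1 script=$2 f
    for f in *"$suffix"; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
        # Pass the file name to the job script through the environment variable F
        ${SUBMIT:-sbatch} --export=ALL,F="$f" "$script"
    done
}
```

Possible usage with separate logs per step: submit_each .bam bamCoverage.sh >bamCoverage.jobs.out 2>bamCoverage.jobs.err. The job script would need to read the file name from $F, which is an assumption of this sketch.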

Molly Davis added a comment - 06/Sep/23 3:46 PM - edited

Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/results/star_salmon

Reviewer checklist:
Check that files have reasonable sizes (no "zero" size files, for example)
Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
Check that every bam file has a corresponding "FJ.bed.gz" file
Check that every bam file has a corresponding "scaled.bedgraph.gz" file
Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
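The checklist above can be sketched as a script. This assumes that <name>.bam pairs with <name>.FJ.bed.gz and <name>.scaled.bedgraph.gz, which is a naming assumption not confirmed in this ticket; check_outputs is a hypothetical helper. It is driven by the list of BAM files, so a stray FJ.bed.gz or bedgraph file with no BAM would not be reported.

```shell
#!/bin/bash
# Sketch of the reviewer checklist as a script. Assumes <name>.bam pairs with
# <name>.FJ.bed.gz and <name>.scaled.bedgraph.gz (a naming assumption).
check_outputs() {
    # $1: results directory; prints one line per problem, returns 1 if any were found
    local dir=$1 bam base f problems=0
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue
        base=${bam%.bam}
        for f in "$bam" "$base.FJ.bed.gz" "$base.FJ.bed.gz.tbi" \
                 "$base.scaled.bedgraph.gz" "$base.scaled.bedgraph.gz.tbi"; do
            # -s fails for files that are missing or have zero size
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                problems=$((problems + 1))
            fi
        done
    done
    [ "$problems" -eq 0 ]
}
```

Possible usage: check_outputs /projects/tomato_genome/fnb/dataprocessing/SRP438952/results/star_salmon && echo "all outputs present".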

Reviewer: [~RobertReid]

Robert Reid added a comment - 11/Sep/23 2:14 PM

Checking things.

All of the file types have 48 total files, as expected.
There is a tbi file for every gz file.
The tbi index files are all similar in size to one another.
The FJ.bed.gz files are similar in size at around 4MB +- 4.
The scaled bedgraphs are all 30MB to 130MB in size. Similar enough!
There is a total of 96 tbi files as expected (48 scaled and 48 FJ)
There is a total of 96 gz files as expected (48 scaled and 48 FJ)

All looks as expected!
Great job.
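Counts and size ranges like the ones above could be gathered with a small helper such as the sketch below. summarize_suffix is a hypothetical name, and this is not necessarily how the numbers in this comment were obtained.

```shell
#!/bin/bash
# Sketch: count files with a given suffix and report the smallest and largest
# size in bytes. summarize_suffix is a hypothetical helper.
summarize_suffix() {
    # $1: directory, $2: suffix such as .FJ.bed.gz
    local dir=$1 suffix=$2 f size count=0 min="" max=""
    for f in "$dir"/*"$suffix"; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        size=$((size))            # normalise wc output to a plain integer
        count=$((count + 1))
        if [ -z "$min" ] || [ "$size" -lt "$min" ]; then min=$size; fi
        if [ -z "$max" ] || [ "$size" -gt "$max" ]; then max=$size; fi
    done
    echo "${suffix}: count=${count} min_bytes=${min:-0} max_bytes=${max:-0}"
}
```

Possible usage: loop over .bam, .FJ.bed.gz, .FJ.bed.gz.tbi, .scaled.bedgraph.gz and .scaled.bedgraph.gz.tbi and call summarize_suffix on the star_salmon directory for each; every count should come out as 48 for this dataset.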

Molly Davis added a comment - 11/Sep/23 2:53 PM - edited

Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3424
Includes:

MultiQC Report
CSV

[~aloraine]

Molly Davis added a comment - 18/Sep/23 9:58 AM

Pull Request: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/12

People

Assignee:

Molly Davis

Reporter:

Molly Davis

Votes:

0

Watchers:

2

Dates

Created:

29/Aug/23 11:36 AM

Updated:

18/Jan/24 2:22 PM

Resolved:

18/Sep/23 11:09 AM