IGB / IGBF-3544

Re-run mature pollen and seedling pipeline with SL4 using data newly submitted to SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      SRP438952

      For this task, we need to confirm and sanity-check the seedling and mature pollen data that Rob recently uploaded and submitted to the Sequence Read Archive.
      If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
      For this task:

      • Check SRP on NCBI and review submission
      • Download the data onto the cluster by using the SRP name
      • Run nf-core/rnaseq pipeline
      • Run our coverage graph and junctions scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.
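The renaming itself isn't scripted in the ticket; a minimal sketch of how it could be done, assuming a hand-prepared two-column mapping file (old prefix, SRR accession). The mapping file name and format are assumptions, not part of the task:

```shell
# rename_to_srr MAPFILE: rename files per a two-column map (old_prefix SRR_accession).
# The mapping file format is an assumption -- the ticket doesn't specify one.
rename_to_srr() {
    local map_file="$1" old new f ext
    while read -r old new; do
        [ -z "$old" ] && continue              # skip blank lines
        for f in "$old".*; do
            [ -e "$f" ] || continue            # glob matched nothing for this prefix
            ext="${f#"$old".}"                 # keep the full extension (e.g. FJ.bed.gz)
            mv -n "$f" "$new.$ext"             # -n: never overwrite an existing file
        done
    done < "$map_file"
}
```

With pollen_rep1.bam and pollen_rep1.FJ.bed.gz on disk and a map line `pollen_rep1 SRR24685698`, this produces SRR24685698.bam and SRR24685698.FJ.bed.gz.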

        Attachments

          Issue Links

            Activity

            Mdavis4290 Molly Davis created issue -
            Mdavis4290 Molly Davis made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3424 [ IGBF-3424 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3498 [ IGBF-3498 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3499 [ IGBF-3499 ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment -

            Re-run directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            Prefetch SRR script:

            #!/bin/bash
            
            #SBATCH --job-name=prefetch_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=4gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            module load sra-tools/2.11.0
            # vdb-config --interactive opens a terminal UI and will hang a batch job;
            # run it once from a login shell before submitting this script.
            
            files=(
            SRR24685698
            SRR24685699
            SRR24685700
            SRR24685701
            SRR24685702
            SRR24685703
            SRR24685704
            SRR24685705
            SRR24685706
            SRR24685707
            SRR24685708
            SRR24685709
            SRR24685710
            SRR24685711
            SRR24685712
            SRR24685713
            SRR24685714
            SRR24685715
            SRR24685716
            SRR24685717
            SRR24685718
            SRR24685719
            SRR24685720
            SRR24685721
            SRR24685722
            SRR24685723
            SRR24685724
            SRR24685725
            SRR24685726
            SRR24685727
            SRR24685728
            SRR24685729
            SRR24685730
            SRR24685731
            SRR24685732
            SRR24685733
            SRR24685734
            SRR24685735
            SRR24685736
            SRR24685737
            SRR24685738
            SRR24685739
            SRR24685740
            SRR24685741
            SRR24685742
            SRR24685743
            SRR24685744
            SRR24685745
            )
            
            for f in "${files[@]}"; do echo "$f"; prefetch "$f"; done
            
            

            Execute:

            chmod u+x prefetch.slurm
            
            sbatch prefetch.slurm
            
            Mdavis4290 Molly Davis added a comment -

            Fasterq-dump script:

            #!/bin/bash
            
            #SBATCH --job-name=fastqdump_SRR
            #SBATCH --partition=Orion
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --mem=40gb
            #SBATCH --output=%x_%j.out
            #SBATCH --time=24:00:00
            #SBATCH --array=1-48
            
            #setting up where to grab files from
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p"  /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/Sra_ids.txt)
            
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            module load sra-tools/2.11.0
            
            echo "Starting fasterq-dump on $file";
            
            cd /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/$file
            
            fasterq-dump ${file}.sra
            
            perl /projects/tomato_genome/scripts/validateHiseqPairs.pl ${file}_1.fastq ${file}_2.fastq
            
            cp ${file}_1.fastq /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/${file}_1.fastq
            cp ${file}_2.fastq /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/${file}_2.fastq 
            
            echo "finished"
            

            Execute:

            chmod u+x fasterdump.slurm
            
            sbatch fasterdump.slurm
            
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1 [ 185 ] Spring 1, Spring 2 [ 185, 186 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis added a comment - - edited

            Nextflow pipeline ran successfully with the SL4 genome.
            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4
            MultiQC report notes: No errors or warnings were present in the report. The pipeline did flag strandedness for SL4: with 'reverse' set, it suggested 'unstranded', but with 'unstranded' set, it suggested 'reverse'. I am keeping it at 'unstranded', which also worked for SL5. The output file is named 'SRP438952_SL4_multiqc_report.html'.

            Mdavis4290 Molly Davis added a comment -

            Next steps:

            • Commit MultiQC report to Splicing repo on Bitbucket
            • Change sorted BAM names
            • Create junction files
            • Create coverage graphs

            Mdavis4290 Molly Davis added a comment -

            Launch renameBams.sh script:
            ./renameBams.sh
            Launch Scaled Coverage graphs script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch Junction files script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
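            The sbatch-doIt.sh wrapper itself isn't included in the ticket; judging from the invocations above, it takes a file suffix and a per-file script, so a plausible sketch (the body is an assumption) is:

            ```shell
            # submit_per_file SUFFIX SCRIPT: a sketch of what ./sbatch-doIt.sh likely does --
            # submit SCRIPT as one cluster job per file matching SUFFIX. The wrapper's
            # actual body isn't shown in the ticket, so this is an assumption.
            submit_per_file() {
                local suffix="$1" script="$2" f
                for f in *"$suffix"; do
                    [ -e "$f" ] || continue          # glob matched nothing
                    echo "submitting: $script $f"
                    sbatch "$script" "$f"            # one job per input file
                done
            }
            ```

            Invoked as `submit_per_file .bam bamCoverage.sh`, it would submit one bamCoverage.sh job per BAM in the current directory.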

            Mdavis4290 Molly Davis added a comment - - edited

            Directory: /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/results/star_salmon

            Reviewer:

            • Check that files have reasonable sizes (no zero-size files, for example)
            • Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            • Check that every BAM file has a corresponding "FJ.bed.gz" file
            • Check that every BAM file has a corresponding "scaled.bedgraph.gz" file
            • Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
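            These checks are easy to script; a quick sketch, with the companion-file naming (`<SRR>.FJ.bed.gz`, `<SRR>.scaled.bedgraph.gz`) assumed from the checklist:

            ```shell
            # check_outputs DIR: sanity-check a results directory per the checklist above.
            # Reports zero-size files and missing companion files; returns nonzero on any problem.
            check_outputs() {
                local dir="${1:-.}" f base want status=0
                cd "$dir" || return 1
                for f in *; do
                    [ -s "$f" ] || { echo "ZERO SIZE: $f"; status=1; }   # check 1
                done
                for f in *.bam; do
                    [ -e "$f" ] || continue
                    base="${f%.bam}"
                    # checks 2-5: each BAM needs its junction track, coverage track, and indexes
                    for want in "$base.FJ.bed.gz" "$base.FJ.bed.gz.tbi" \
                                "$base.scaled.bedgraph.gz" "$base.scaled.bedgraph.gz.tbi"; do
                        [ -e "$want" ] || { echo "MISSING: $want"; status=1; }
                    done
                done
                return $status
            }
            ```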

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Robert Reid [ robertreid ]
            robofjoy Robert Reid added a comment -

            The folder does indeed exist!

            ll *bed.gz | wc -l
            48

            We have 48 experiments coinciding with what is in the SRA.

            Running these commands:
            ll *bed.gz.tbi
            ll *bed.gz.tbi | wc -l
            ll *bed.gz | wc -l
            ll *bam | wc -l
            ll *bai | wc -l
            cat *.err
            ll *err
            ll *bam
            ll *bai
            ll *.bed.gz

            (I have an alias, ll='ls -lrt', which the commands above use.)

            A weird warning!!
            WARNING: BAM index file /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/results/star_salmon/SRR24685736.bam.bai is older than BAM /projects/tomato_genome/fnb/dataprocessing/SRP438952/nfcore-SL4/results/star_salmon/SRR24685736.bam

            I don't think this is anything, but it might be good to test a few of these samples in IGB to ensure everything is OK.
            More than likely the .bai file was produced so quickly that it got a timestamp prior to the actual BAM file.

            All of the BAM and BAI files are the sizes we expected, so the warning above is likely nothing!
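            If the warning ever needs silencing, refreshing the index timestamps is harmless; a small sketch:

            ```shell
            # refresh_bai_timestamps DIR: touch every .bam.bai so it is newer than its BAM,
            # which silences the "BAM index file ... is older than BAM" warning noted above.
            refresh_bai_timestamps() {
                local dir="${1:-.}" bam
                for bam in "$dir"/*.bam; do
                    [ -e "$bam.bai" ] && touch "$bam.bai"
                done
                return 0
            }
            ```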

            And everything else looks as it should.

            robofjoy Robert Reid made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            robofjoy Robert Reid made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            Mdavis4290 Molly Davis added a comment -

            Branch: https://bitbucket.org/mdavis4290/molly-2-splicing-analysis/branch/IGBF-3544
            PR: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/14
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2 [ 185, 186 ] Spring 1, Spring 2, Spring 3 [ 185, 186, 187 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment -

            Suggestions for testing:

            • Check that these new quality control reports are consistent with the original quality control reports obtained when we processed the original, pre-submission data files. The results should be the same.
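            One quick way to make that comparison: alongside the HTML report, MultiQC writes flat tables under multiqc_data/, and diffing those is easier than comparing HTML side by side (the exact file paths passed in are assumptions):

            ```shell
            # compare_multiqc OLD NEW: diff two MultiQC flat-table files
            # (e.g. multiqc_data/multiqc_general_stats.txt from each run), ignoring row order.
            compare_multiqc() {
                diff <(sort "$1") <(sort "$2") && echo "reports agree"
            }
            ```

            Any per-sample metric that shifted between the pre- and post-SRA-submission runs shows up as a diff line; identical runs print "reports agree".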
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine added a comment -

            Molly Davis - please see the above comment on how to test. I don't know whether you have already compared the files or not.

            If not, it would be good to do that now.

            The QC reports provide a great overview of a data processing run. Comparing the QC reports pre- and post-SRA submission will tell us a lot. For example, if there are big differences between the pre- and post-SRA submission files, the QC report will likely show it.

            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1, Spring 2, Spring 3 [ 185, 186, 187 ] Spring 1, Spring 2, Spring 3, Spring 4 [ 185, 186, 187, 188 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            Mdavis4290 Molly Davis added a comment -

            The original mature pollen and seedling data on the cluster were only run with SL5, not SL4. So the next step would be to run the original data with SL4 and then compare the re-run SL4 report with the original report.

            Mdavis4290 Molly Davis made changes -
            Status Post-merge Testing In Progress [ 10003 ] Merged Needs Testing [ 10002 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3613 [ IGBF-3613 ]
            Mdavis4290 Molly Davis added a comment -

            When I make the comparison ticket for mature pollen and seedling, I will compare the MultiQC reports then, so for now I will move this ticket to Done!

            Mdavis4290 Molly Davis made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            Mdavis4290 Molly Davis made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated:
                  Resolved: