[IGBF-3688] Rerun nextflow with ARE 120 minute Muday data - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
3
Epic Link:
Support NSF pollen grant
Sprint:
Spring 7, Spring 8

Description

SRP499796

Directory: /projects/tomato_genome/fnb/dataprocessing/SRP499796

SL4 and SL5 need to be run with this data set.

For this task, we need to confirm and sanity-check the ARE 120 minute flavonoid data that Rob recently uploaded and submitted to the Sequence Read Archive.
If the data are good, we will replace all existing BAM, junction, and other files deployed in the "hotpollen" quickload site with the newly processed data.
For this task:

Check the SRP record on NCBI and review the submission.
Download the data onto the cluster using the SRP accession.
Run the nf-core/rnaseq pipeline.
Run our coverage graph and junction scripts on the data.

Note that all files should now use their "SRR" names instead of the existing file names.
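Since samples should be named by their SRR accessions, the pipeline's input file can be generated straight from the accession list. This is a minimal sketch, not the file actually committed for this ticket. It assumes the run configuration file (SRP499796.csv) uses the nf-core/rnaseq samplesheet columns sample,fastq_1,fastq_2,strandedness. It also assumes the FASTQ files were gzipped to <SRR>_1.fastq.gz and <SRR>_2.fastq.gz, because nf-core/rnaseq expects gzipped input, and that the library is unstranded. The helper name make_samplesheet is hypothetical. This is the pipeline input, not the IGB quickload sample sheet.

```shell
#!/bin/bash
# Sketch: write an nf-core/rnaseq samplesheet with one row per SRR accession.
# Assumptions (not from the ticket): FASTQ files are gzipped as <SRR>_1.fastq.gz and
# <SRR>_2.fastq.gz inside fastq_dir, and every library is unstranded.
# make_samplesheet is a hypothetical helper name.
make_samplesheet() {
    local ids_file=$1   # one SRR accession per line (e.g. Sra_ids.txt)
    local fastq_dir=$2  # directory holding the FASTQ files
    local out_csv=$3    # samplesheet to write (e.g. SRP499796.csv)
    local srr

    echo "sample,fastq_1,fastq_2,strandedness" > "$out_csv"
    # "|| [ -n ... ]" keeps a final line that lacks a trailing newline
    while read -r srr || [ -n "$srr" ]; do
        [ -z "$srr" ] && continue   # skip blank lines
        # The SRR accession itself becomes the sample name
        echo "${srr},${fastq_dir}/${srr}_1.fastq.gz,${fastq_dir}/${srr}_2.fastq.gz,unstranded" >> "$out_csv"
    done < "$ids_file"
}

# Example (on the cluster):
# make_samplesheet Sra_ids.txt /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4 SRP499796.csv
```

The same sheet works for the SL4 and SL5 runs, because only the reference genome differs between them.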

Attachments

Issue Links

relates to

IGBF-3710 Make sample sheet for ARE 120 min data SRP499796

Closed

IGBF-3686 Resubmit ARE 120 Minute

Closed

Activity

Ann Loraine added a comment - 26/Apr/24 4:09 PM

I reviewed the files as follows:

Checked that I could open the MultiQC reports in my web browser. I was able to open and review both.
Checked that the data are reported as "unstranded" in the run configuration file SRP499796.csv. They were.
Checked that RSeQC reported approximately equal numbers of sense and antisense reads (with respect to the gene models provided to the pipeline). It did.
However, I noticed there is no sample sheet in the repository for this data set. We will need one to set up the data in IGB quickload. I made a ticket for it with more details.

Testing passes. Moving to DONE.
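The "unstranded" check on SRP499796.csv can also be scripted. Below is a minimal sketch. It assumes a comma-separated file whose header names a strandedness column, as in the nf-core/rnaseq samplesheet format. The helper name check_unstranded is hypothetical. The sketch only checks what the file declares. It does not replace the RSeQC sense/antisense comparison, which tests the reads themselves.

```shell
#!/bin/bash
# Sketch: confirm every sample in an nf-core/rnaseq-style samplesheet is marked "unstranded".
# Assumes a comma-separated file whose header row names a "strandedness" column.
# Exit status: 0 = all unstranded, 1 = some rows differ, 2 = no strandedness column.
check_unstranded() {
    local csv=$1
    awk -F, '
        NR == 1 {
            # Find the strandedness column by its header name
            for (i = 1; i <= NF; i++) if ($i == "strandedness") col = i
            if (!col) { missing = 1; exit }
            next
        }
        NF == 0 { next }                       # ignore blank lines
        $col != "unstranded" { print "not unstranded: " $1; bad++ }
        END {
            if (missing) { print "no strandedness column" > "/dev/stderr"; exit 2 }
            exit (bad > 0)
        }
    ' "$csv"
}

# Example: check_unstranded SRP499796.csv && echo "all samples unstranded"
```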

Ann Loraine added a comment - 24/Apr/24 11:39 AM

PR is merged. Moving to "ready for testing."

To test, review the files. If no problems observed, move forward to "Done."

Molly Davis added a comment - 23/Apr/24 3:09 PM - edited

Branch: https://bitbucket.org/mdavis4290/molly5-flavonoid-rnaseq/branch/IGBF-3688
PR: https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/44

ARE-120min-analysis/SRP499796.csv
ARE-120min-analysis/SRP499796_SL4_multiqc_report.html
ARE-120min-analysis/SRP499796_SL5_multiqc_report.html

Robert Reid added a comment - 23/Apr/24 11:26 AM

The SL4 Folder
The TSV files look complete.
All 24 bedgraph files are about 45 MB each.
All 24 bed files are about 4.5 MB each.
All .tbi index files are ~70 KB; there are 24 for the bedgraphs and 24 for the bed files.
All 24 BAM files are 2.8 GB each.

The SL5 Folder
The TSV files look complete; the ~36K line count looks correct for SL5.
All 24 bedgraph files are about 45 MB each.
All 24 bed files are about 4.5 MB each.
All .tbi index files are ~70 KB; there are 24 for the bedgraphs and 24 for the bed files.
All 24 BAM files are 2.8 GB each.

This looks correct! Passing it back to Molly.
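The per-type counts above can be reproduced by running a short script in each results folder. Below is a minimal sketch. It assumes the coverage and junction files end in .scaled.bedgraph.gz and .FJ.bed.gz, the suffixes named in the reviewer checklist, and that each .tbi index adds ".tbi" to its data file's name. The helper name summarize_outputs is hypothetical. The script reports how many files of each type exist and how many are empty. Use ls -lh for the actual sizes and wc -l for the TSV line counts.

```shell
#!/bin/bash
# Sketch: count output files by suffix (top level of one folder) and flag empty ones.
# Suffixes are assumed from the reviewer checklist; adjust if the scripts name files differently.
summarize_outputs() {
    local dir=$1
    local suffix total empty
    for suffix in .bam .scaled.bedgraph.gz .scaled.bedgraph.gz.tbi .FJ.bed.gz .FJ.bed.gz.tbi; do
        # "*.FJ.bed.gz" does not match "*.FJ.bed.gz.tbi", so each type is counted once
        total=$(find "$dir" -maxdepth 1 -type f -name "*${suffix}" | wc -l)
        # -size 0 matches only zero-byte files
        empty=$(find "$dir" -maxdepth 1 -type f -name "*${suffix}" -size 0 | wc -l)
        printf '%s\tfiles=%d\tempty=%d\n' "$suffix" "$total" "$empty"
    done
}

# Example: summarize_outputs /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/results/star_salmon
```

For this data set, every line should report files=24 and empty=0.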

Molly Davis added a comment - 22/Apr/24 3:10 PM - edited

Directories:
/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/results/star_salmon
/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL5/results/star_salmon
Reviewer:
Check that files have reasonable sizes (for example, no zero-size files)
Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
Check that every bam file has a corresponding "FJ.bed.gz" file
Check that every bam file has a corresponding "scaled.bedgraph.gz" file
Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"
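The reviewer checklist above can be automated for each star_salmon folder. Below is a minimal sketch. It assumes each derived file shares its BAM's base name, for example <base>.bam, <base>.FJ.bed.gz, and <base>.scaled.bedgraph.gz. The coverage and junction scripts may actually name files differently, so adjust how the base name is derived if needed. The helper name check_outputs is hypothetical.

```shell
#!/bin/bash
# Sketch of the reviewer checklist for one results folder.
# Assumption (not confirmed by the ticket): derived files are named
# <base>.FJ.bed.gz and <base>.scaled.bedgraph.gz, where <base> is the BAM name minus ".bam".
check_outputs() {
    local dir=$1
    local problems=0 empties bam base f

    # 1. No zero-size files
    empties=$(find "$dir" -maxdepth 1 -type f -size 0)
    if [ -n "$empties" ]; then
        echo "$empties" | sed 's/^/empty file: /'
        problems=$((problems + $(echo "$empties" | wc -l)))
    fi

    # 2 and 5. Every FJ.bed.gz and scaled.bedgraph.gz has a .tbi index
    for f in "$dir"/*.FJ.bed.gz "$dir"/*.scaled.bedgraph.gz; do
        [ -e "$f" ] || continue            # the glob matched nothing
        if [ ! -e "$f.tbi" ]; then
            echo "missing index: $f.tbi"
            problems=$((problems + 1))
        fi
    done

    # 3 and 4. Every BAM has a junctions file and a coverage graph
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue
        base=${bam%.bam}
        for f in "$base.FJ.bed.gz" "$base.scaled.bedgraph.gz"; do
            if [ ! -e "$f" ]; then
                echo "missing: $f"
                problems=$((problems + 1))
            fi
        done
    done

    # Succeed only if no problems were found
    [ "$problems" -eq 0 ]
}

# Example: check_outputs /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL5/results/star_salmon
```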

4 older comments

Molly Davis added a comment - 12/Apr/24 3:06 PM - edited

Re-run Directory: /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4

Prefetch SRR Script:


#! /bin/bash

#SBATCH --job-name=prefetch_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00

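cd /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4
module load sra-tools/2.11.0
# vdb-config --interactive opens a text-mode menu that needs a terminal, so it cannot
# complete inside a batch job; run it once on a login node before submitting this script.
# vdb-config --interactive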

files=(
SRR28558218
SRR28558219
SRR28558220
SRR28558221
SRR28558222
SRR28558223
SRR28558224
SRR28558225
SRR28558226
SRR28558227
SRR28558228
SRR28558229
SRR28558230
SRR28558231
SRR28558232
SRR28558233
SRR28558234
SRR28558235
SRR28558236
SRR28558237
SRR28558238
SRR28558239
SRR28558240
SRR28558241
)

# Download each run; prefetch saves it as <accession>/<accession>.sra
for f in "${files[@]}"; do
    echo "$f"
    prefetch "$f"
done

Execute:

chmod u+x prefetch.slurm

sbatch prefetch.slurm
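The fasterq-dump array job reads its accessions from Sra_ids.txt, one per line, but the ticket does not show how that file was made. Below is a minimal sketch to run in the nfcore-SL4 directory after prefetch finishes. For each accession, it checks for a non-empty <SRR>/<SRR>.sra file, which is the layout the fasterq-dump script changes into. It then writes the accessions that passed to the list file. The helper name write_sra_ids is hypothetical.

```shell
#!/bin/bash
# Sketch: verify prefetch output and write the accession list for the fasterq-dump array job.
# Expects prefetch's layout under the current directory: <accession>/<accession>.sra
write_sra_ids() {
    local out=$1; shift          # output file; remaining arguments are accessions
    local srr missing=0
    : > "$out"                   # start with an empty list
    for srr in "$@"; do
        if [ -s "$srr/$srr.sra" ]; then
            echo "$srr" >> "$out"
        else
            echo "missing or empty: $srr/$srr.sra" >&2
            missing=$((missing + 1))
        fi
    done
    # Fail if any accession was not downloaded
    [ "$missing" -eq 0 ]
}
```

Usage (bash, from the nfcore-SL4 directory): write_sra_ids Sra_ids.txt SRR285582{18..41}. The brace expansion produces the same 24 accessions listed in the prefetch script, one per line, which matches --array=1-24.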

fasterq-dump Script:

#! /bin/bash

#SBATCH --job-name=fastqdump_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=40gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00
#SBATCH --array=1-24

# Read this array task's accession: line N of Sra_ids.txt, where N is the array index (1-24)
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/Sra_ids.txt)

cd /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4
module load sra-tools/2.11.0

echo "Starting fasterq-dump on $file"

# prefetch saved the run as <accession>/<accession>.sra
cd "/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/$file"

# Split the paired-end run into the mate files ${file}_1.fastq and ${file}_2.fastq
fasterq-dump "${file}.sra"

# Validate the read pairs with the project's validateHiseqPairs.pl script
perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"

# Copy both mate files up to the run directory
cp "${file}_1.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/${file}_1.fastq"
cp "${file}_2.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4/${file}_2.fastq"

echo "finished"

Execute:

chmod u+x fasterdump.slurm

sbatch fasterdump.slurm
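After the array job finishes, a quick independent check of the copied FASTQ files can complement validateHiseqPairs.pl, whose exact checks the ticket does not describe. Below is a minimal sketch. For each accession in Sra_ids.txt, it confirms that <SRR>_1.fastq and <SRR>_2.fastq exist and are non-empty. It also confirms that each file's line count is a multiple of 4, since a FASTQ record is four lines, and that both mates hold the same number of reads. The helper name check_fastq_pairs is hypothetical.

```shell
#!/bin/bash
# Sketch: per accession, confirm both mate files exist and hold the same number of 4-line records.
check_fastq_pairs() {
    local ids_file=$1 dir=$2
    local srr r1 r2 n1 n2 bad=0
    while read -r srr || [ -n "$srr" ]; do
        [ -z "$srr" ] && continue   # skip blank lines
        r1="$dir/${srr}_1.fastq"
        r2="$dir/${srr}_2.fastq"
        if [ ! -s "$r1" ] || [ ! -s "$r2" ]; then
            echo "$srr: missing or empty mate file"
            bad=$((bad + 1))
            continue
        fi
        n1=$(wc -l < "$r1")
        n2=$(wc -l < "$r2")
        if [ $((n1 % 4)) -ne 0 ] || [ $((n2 % 4)) -ne 0 ]; then
            echo "$srr: line count not a multiple of 4"
            bad=$((bad + 1))
        elif [ "$n1" -ne "$n2" ]; then
            echo "$srr: mate read counts differ ($((n1 / 4)) vs $((n2 / 4)))"
            bad=$((bad + 1))
        else
            echo "$srr: $((n1 / 4)) read pairs"
        fi
    done < "$ids_file"
    # Succeed only if every accession passed
    [ "$bad" -eq 0 ]
}

# Example: check_fastq_pairs Sra_ids.txt /projects/tomato_genome/fnb/dataprocessing/SRP499796/nfcore-SL4
```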

People

Assignee:

Molly Davis

Reporter:

Molly Davis

Votes:

0

Watchers:

3

Dates

Created:

09/Apr/24 3:20 PM

Updated:

26/Apr/24 4:09 PM

Resolved:

26/Apr/24 4:09 PM