Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 1
- Epic Link:
- Sprint: Fall 7, Spring 1, Spring 2, Spring 3, Spring 4
Description
SRP460750
Directory: /projects/tomato_genome/fnb/dataprocessing/SRP460750/
Only SL5 was rerun with the SRA data; SL4 needs to be run with these data as well.
For this task, we need to confirm and sanity-check the Muday time-course data that Rob recently uploaded and submitted to the Sequence Read Archive (SRA).
If the data are good, we will replace all of the existing BAM, junctions, and other files deployed on the "hotpollen" quickload site with newly processed data.
For this task:
- Check the SRP accession on NCBI and review the submission
- Download the data onto the cluster using the SRP accession
- Run the nf-core/rnaseq pipeline
- Run our coverage graph and junctions scripts on the data
Note that all files should now use their "SRR" names instead of the existing file names.
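As a quick first sanity check before the full reprocessing, the mate files of each paired-end run must contain the same number of reads (four FASTQ lines per read). A minimal sketch, using synthetic demo files rather than the real SRR data:

```shell
# Minimal paired-end sanity check: both mate files must hold the same number
# of reads (4 FASTQ lines per read). The demo files here are synthetic.
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

printf '@r1\nACGT\n+\nIIII\n' > demo_1.fastq
printf '@r1\nTGCA\n+\nIIII\n' > demo_2.fastq

if [ "$(count_reads demo_1.fastq)" -eq "$(count_reads demo_2.fastq)" ]; then
  echo "pair OK"
fi
```

The validateHiseqPairs.pl script referenced in the activity log below performs a stricter check that also verifies read-name pairing.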
Activity
Re-run Directory: /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4
Prefetch SRR Script:
#!/bin/bash
#SBATCH --job-name=prefetch_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00

cd /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4
module load sra-tools/2.11.0
# Note: vdb-config --interactive opens an interactive UI and will not work
# inside a batch job; run it once on a login node before submitting.
files=(
  SRR25478240 SRR25478241 SRR25478242 SRR25478243 SRR25478244 SRR25478245 SRR25478246 SRR25478247
  SRR25478248 SRR25478249 SRR25478250 SRR25478251 SRR25478252 SRR25478253 SRR25478254 SRR25478255
  SRR25478256 SRR25478257 SRR25478258 SRR25478259 SRR25478260 SRR25478261 SRR25478262 SRR25478263
  SRR25478264 SRR25478265 SRR25478266 SRR25478267 SRR25478268 SRR25478269 SRR25478270 SRR25478271
  SRR25478272 SRR25478273 SRR25478274 SRR25478275 SRR25478276 SRR25478277 SRR25478278 SRR25478279
  SRR25478280 SRR25478281 SRR25478282 SRR25478283 SRR25478284 SRR25478285 SRR25478286 SRR25478287
  SRR25478288 SRR25478289 SRR25478290 SRR25478291 SRR25478292 SRR25478293 SRR25478294 SRR25478295
  SRR25478296 SRR25478297 SRR25478298 SRR25478299 SRR25478300 SRR25478301 SRR25478302 SRR25478303
  SRR25478304 SRR25478305 SRR25478306 SRR25478307 SRR25478308 SRR25478309 SRR25478310 SRR25478311
)
for f in "${files[@]}"; do
  echo "$f"
  prefetch "$f"
done
Execute:
chmod u+x prefetch.slurm
sbatch prefetch.slurm
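The array job below reads one accession per line from Sra_ids.txt. That list can be generated from the contiguous run range used in the prefetch script (a sketch; the range SRR25478240-SRR25478311, 72 runs, is taken from the script above):

```shell
# Generate Sra_ids.txt (one SRR accession per line) from the contiguous run
# range used in the prefetch script: SRR25478240..SRR25478311 (72 runs).
for i in $(seq 25478240 25478311); do
  echo "SRR${i}"
done > Sra_ids.txt

wc -l < Sra_ids.txt
```

The line count must match the --array=1-72 range in the fasterq-dump job, since each array task indexes one line of this file.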
Fasterq-dump Script:
#!/bin/bash
#SBATCH --job-name=fastqdump_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=40gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00
#SBATCH --array=1-72

# Pick the accession for this array task from the ID list (one SRR per line).
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/Sra_ids.txt)
cd /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4
module load sra-tools/2.11.0

echo "Starting fasterq-dump on ${file}"
cd "/projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/${file}"
fasterq-dump "${file}.sra"
# Confirm the mate files are properly paired before copying them up.
perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"
cp "${file}_1.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/${file}_1.fastq"
cp "${file}_2.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/${file}_2.fastq"
echo "finished"
Execute:
chmod u+x fasterdump.slurm
sbatch fasterdump.slurm
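With the FASTQ pairs in place, the next step in the task list is the nf-core/rnaseq pipeline. A sketch of building its input samplesheet from the dumped pairs (the samplesheet.csv name and "auto" strandedness value are assumptions; the column layout follows the nf-core/rnaseq input format):

```shell
# Build an nf-core/rnaseq samplesheet from the SRR FASTQ pairs in the current
# directory. Strandedness "auto" is an assumption; set it to match the prep.
shopt -s nullglob
echo "sample,fastq_1,fastq_2,strandedness" > samplesheet.csv
for f1 in SRR*_1.fastq; do
  acc="${f1%_1.fastq}"
  echo "${acc},${PWD}/${acc}_1.fastq,${PWD}/${acc}_2.fastq,auto" >> samplesheet.csv
done
```

Using the SRR accession as the sample name here also satisfies the note above that all deployed files should carry their SRR names.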
Testing:
Moving to done!