Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 3
Epic Link:
Sprint: Fall 6, Fall 7, Spring 1
Description
Re-run the mark 2022 time series data, SRA accession SRP441343, for both the SL4 and SL5 genomes.
For this task, we need to confirm and sanity-check the mark 2022 time series data that Rob uploaded and submitted to the Sequence Read Archive.
If the data are good, we will replace all the existing files (BAM, junctions, etc.) deployed in the "hotpollen" quickload site with the newly processed data.
For this task:
- Check SRP on NCBI and review submission
- Download the data onto the cluster by using the SRP name
- Run nf-core/rnaseq pipeline
- Run our coverage graph and junctions scripts on the data
Note that all files should now use their "SRR" names instead of the existing file names.
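As a rough sketch of how the SRR naming could carry through to the nf-core/rnaseq step, the function below writes a samplesheet with one row per accession. The column names follow the nf-core/rnaseq samplesheet format; everything else is an assumption for illustration: the function name make_samplesheet, gzip-compressed FASTQ files, and the "auto" strandedness value (older pipeline releases may require forward, reverse, or unstranded instead).

```shell
#!/bin/bash
# Sketch only: build an nf-core/rnaseq samplesheet from a list of SRR accessions.
# Assumptions (not from the ticket): FASTQ files are gzip-compressed and
# strandedness is left to "auto".
make_samplesheet() {
    local ids_file=$1    # text file with one SRR accession per line
    local fastq_dir=$2   # directory holding <SRR>_1.fastq.gz and <SRR>_2.fastq.gz
    local srr
    echo "sample,fastq_1,fastq_2,strandedness"
    while IFS= read -r srr || [ -n "$srr" ]; do
        [ -z "$srr" ] && continue    # skip blank lines
        echo "${srr},${fastq_dir}/${srr}_1.fastq.gz,${fastq_dir}/${srr}_2.fastq.gz,auto"
    done < "$ids_file"
}

# Example call (hypothetical output file name):
#   make_samplesheet Sra_ids.txt "$PWD" > samplesheet.csv
```

Using the accession as the sample name keeps the pipeline outputs named by SRR, which is what the note above asks for.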
Activity
Re-run Directory: /projects/tomato_genome/fnb/dataprocessing/SRP441343
SL4: /projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4
SL5: /projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL5
Prefetch SRR Script:
#!/bin/bash
#SBATCH --job-name=prefetch_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00

cd /projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4
module load sra-tools/2.11.0
vdb-config --interactive

files=(
SRR24836276 SRR24836277 SRR24836278 SRR24836279 SRR24836280 SRR24836281
SRR24836282 SRR24836283 SRR24836284 SRR24836285 SRR24836286 SRR24836287
SRR24836288 SRR24836289 SRR24836290 SRR24836291 SRR24836292 SRR24836293
SRR24836294 SRR24836295 SRR24836296 SRR24836297 SRR24836298 SRR24836299
SRR24836300 SRR24836301 SRR24836302 SRR24836303 SRR24836304 SRR24836305
SRR24836306 SRR24836307 SRR24836308 SRR24836309 SRR24836310 SRR24836311
SRR24836312 SRR24836313 SRR24836314 SRR24836315 SRR24836316 SRR24836317
SRR24836318 SRR24836319 SRR24836320 SRR24836321 SRR24836322 SRR24836323
SRR24836324 SRR24836325 SRR24836326 SRR24836327 SRR24836328 SRR24836329
)

for f in "${files[@]}"; do
    echo "$f"
    prefetch "$f"
done
Execute:
chmod u+x prefetch.slurm
sbatch prefetch.slurm
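Before moving on, it may be worth confirming that every accession was actually downloaded. The sketch below assumes prefetch wrote each run to <accession>/<accession>.sra under the working directory, which is the layout the later fasterq-dump step relies on. The function name check_prefetch is hypothetical.

```shell
#!/bin/bash
# Sketch only: report accessions whose .sra file is missing or empty.
# Assumes the <base_dir>/<SRR>/<SRR>.sra layout that the fasterq-dump step expects.
check_prefetch() {
    local base_dir=$1    # directory where prefetch was run
    shift                # remaining arguments are SRR accessions
    local srr missing=0
    for srr in "$@"; do
        if [ ! -s "${base_dir}/${srr}/${srr}.sra" ]; then
            echo "missing: ${srr}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing is missing
    [ "$missing" -eq 0 ]
}

# Example call from inside the prefetch script, after the download loop:
#   check_prefetch "$PWD" "${files[@]}"
```

This only checks that a non-empty file exists; it does not validate the contents of the download.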
fasterq-dump Script:
#!/bin/bash
#SBATCH --job-name=fastqdump_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=40gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00
#SBATCH --array=1-54

# Pick the accession for this array task (line N of Sra_ids.txt for task N)
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4/Sra_ids.txt)

cd /projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4
module load sra-tools/2.11.0

echo "Starting fasterq-dump on $file"
cd "/projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4/$file"

# Convert the prefetched .sra file to paired FASTQ files, then validate the pairs
fasterq-dump "${file}.sra"
perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"

# Copy the FASTQ files up to the run directory
cp "${file}_1.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4/${file}_1.fastq"
cp "${file}_2.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP441343/nfcore-SL4/${file}_2.fastq"
echo "finished"
Execute:
chmod u+x fasterdump.slurm
sbatch fasterdump.slurm
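The array job reads one accession per line from Sra_ids.txt, but the ticket does not show how that file was made. Since the 54 accessions form a contiguous range (SRR24836276 through SRR24836329), one possible way to generate it is sketched below. The function name write_sra_ids is hypothetical.

```shell
#!/bin/bash
# Sketch only: print a contiguous range of SRR accessions, one per line.
# Assumes the accessions differ only in their numeric suffix.
write_sra_ids() {
    local first=$1   # numeric part of the first accession
    local last=$2    # numeric part of the last accession
    local i
    for ((i = first; i <= last; i++)); do
        echo "SRR${i}"
    done
}

# Example call, matching the accessions listed in the prefetch script:
#   write_sra_ids 24836276 24836329 > Sra_ids.txt
```

The number of lines in the file should match the upper bound of the --array range (54 here), since each array task selects the line matching its task ID.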
Testing:
Next step: prepare the data to be moved from the cluster to the IGB Quickload site. Refer to IGBF-3499.
Moving ticket to done!