Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 1
- Epic Link:
- Sprint: Fall 7, Spring 1, Spring 2, Spring 3, Spring 4
Description
SRP460750
Directory: /projects/tomato_genome/fnb/dataprocessing/SRP460750/
Only SL5 was rerun with the SRA data; SL4 needs to be run with these data as well.
For this task, we need to confirm and sanity-check the Muday time-course data that Rob recently uploaded and submitted to the Sequence Read Archive (SRA).
If the data are good, we will replace all of the existing BAM, junctions, and other files deployed on the "hotpollen" quickload site with newly processed data.
For this task:
- Check the SRP accession on NCBI and review the submission
- Download the data onto the cluster using the SRP accession
- Run the nf-core/rnaseq pipeline
- Run our coverage graph and junctions scripts on the data
Note that all files should now use their "SRR" names instead of the existing file names.
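As a quick first sanity check before the full reprocessing, the mate files of each paired-end run must contain the same number of reads (four FASTQ lines per read). A minimal sketch, using synthetic demo files rather than the real SRR data:

```shell
# Minimal paired-end sanity check: both mate files must hold the same number
# of reads (4 FASTQ lines per read). The demo files here are synthetic.
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

printf '@r1\nACGT\n+\nIIII\n' > demo_1.fastq
printf '@r1\nTGCA\n+\nIIII\n' > demo_2.fastq

if [ "$(count_reads demo_1.fastq)" -eq "$(count_reads demo_2.fastq)" ]; then
  echo "pair OK"
fi
```

The validateHiseqPairs.pl script referenced in the activity log below performs a stricter check that also verifies read-name pairing.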
Activity
Re-run Directory: /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4
Prefetch SRR Script:
#!/bin/bash
#SBATCH --job-name=prefetch_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00

cd /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4
module load sra-tools/2.11.0
# Note: vdb-config --interactive opens an interactive UI and will not work
# inside a batch job; run it once on a login node before submitting.
files=(
  SRR25478240 SRR25478241 SRR25478242 SRR25478243 SRR25478244 SRR25478245 SRR25478246 SRR25478247
  SRR25478248 SRR25478249 SRR25478250 SRR25478251 SRR25478252 SRR25478253 SRR25478254 SRR25478255
  SRR25478256 SRR25478257 SRR25478258 SRR25478259 SRR25478260 SRR25478261 SRR25478262 SRR25478263
  SRR25478264 SRR25478265 SRR25478266 SRR25478267 SRR25478268 SRR25478269 SRR25478270 SRR25478271
  SRR25478272 SRR25478273 SRR25478274 SRR25478275 SRR25478276 SRR25478277 SRR25478278 SRR25478279
  SRR25478280 SRR25478281 SRR25478282 SRR25478283 SRR25478284 SRR25478285 SRR25478286 SRR25478287
  SRR25478288 SRR25478289 SRR25478290 SRR25478291 SRR25478292 SRR25478293 SRR25478294 SRR25478295
  SRR25478296 SRR25478297 SRR25478298 SRR25478299 SRR25478300 SRR25478301 SRR25478302 SRR25478303
  SRR25478304 SRR25478305 SRR25478306 SRR25478307 SRR25478308 SRR25478309 SRR25478310 SRR25478311
)
for f in "${files[@]}"; do
  echo "$f"
  prefetch "$f"
done
Execute:
chmod u+x prefetch.slurm
sbatch prefetch.slurm
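The array job below reads one accession per line from Sra_ids.txt. That list can be generated from the contiguous run range used in the prefetch script (a sketch; the range SRR25478240-SRR25478311, 72 runs, is taken from the script above):

```shell
# Generate Sra_ids.txt (one SRR accession per line) from the contiguous run
# range used in the prefetch script: SRR25478240..SRR25478311 (72 runs).
for i in $(seq 25478240 25478311); do
  echo "SRR${i}"
done > Sra_ids.txt

wc -l < Sra_ids.txt
```

The line count must match the --array=1-72 range in the fasterq-dump job, since each array task indexes one line of this file.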
Fasterq-dump Script:
#!/bin/bash
#SBATCH --job-name=fastqdump_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=40gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00
#SBATCH --array=1-72

# Pick the accession for this array task from the ID list (one SRR per line).
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/Sra_ids.txt)
cd /projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4
module load sra-tools/2.11.0

echo "Starting fasterq-dump on ${file}"
cd "/projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/${file}"
fasterq-dump "${file}.sra"
# Confirm the mate files are properly paired before copying them up.
perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"
cp "${file}_1.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/${file}_1.fastq"
cp "${file}_2.fastq" "/projects/tomato_genome/fnb/dataprocessing/SRP460750/nfcore-SL4/${file}_2.fastq"
echo "finished"
Execute:
chmod u+x fasterdump.slurm
sbatch fasterdump.slurm
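With the FASTQ pairs in place, the next step in the task list is the nf-core/rnaseq pipeline. A sketch of building its input samplesheet from the dumped pairs (the samplesheet.csv name and "auto" strandedness value are assumptions; the column layout follows the nf-core/rnaseq input format):

```shell
# Build an nf-core/rnaseq samplesheet from the SRR FASTQ pairs in the current
# directory. Strandedness "auto" is an assumption; set it to match the prep.
shopt -s nullglob
echo "sample,fastq_1,fastq_2,strandedness" > samplesheet.csv
for f1 in SRR*_1.fastq; do
  acc="${f1%_1.fastq}"
  echo "${acc},${PWD}/${acc}_1.fastq,${PWD}/${acc}_2.fastq,auto" >> samplesheet.csv
done
```

Using the SRR accession as the sample name here also satisfies the note above that all deployed files should carry their SRR names.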
Testing:
Moving to done!