Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Sprint: Fall 7 2022 Nov 21, Fall 8 2022 Dec 5, Spring 1 2023 Dec 26, Spring 2 2023 Jan 16, Spring 3 2023 Feb 1, Spring 4 2023 Feb 21
Activity
SRP100604 and SRP268884 have been uploaded and fastq files created.
SRA links:
https://www.ncbi.nlm.nih.gov/sra?term=SRP100604
https://www.ncbi.nlm.nih.gov/sra?term=SRP268884
Code Example
prefetch.slurm
#!/bin/bash
#SBATCH --job-name=prefetch_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00

cd /nobackup/tomato_genome/alt_splicing/SRP100604
module load sra-tools/2.11.0
# Note: vdb-config --interactive opens a text UI; in a non-interactive
# batch job it may exit without changing the configuration.
vdb-config --interactive

# Run accessions to download
files=(
SRR5279858
SRR5279875
SRR5279883
SRR5280323
SRR5280370
SRR5280382
SRR5280383
SRR5280392
SRR5282476
SRR5282478
SRR5282480
SRR5282481
)

# Download each run with prefetch
for f in "${files[@]}"; do
    echo "$f"
    prefetch "$f"
done
fasterdump.slurm
#!/bin/bash
#SBATCH --job-name=fastqdump_SRR
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=40gb
#SBATCH --output=%x_%j.out
#SBATCH --time=24:00:00
#SBATCH --array=1-12

# Pick the run accession for this array task (line N of Sra_ids.txt)
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /nobackup/tomato_genome/alt_splicing/SRP100604/Sra_ids.txt)

cd /nobackup/tomato_genome/alt_splicing/SRP100604
module load sra-tools/2.11.0

# Convert the prefetched .sra file to FASTQ inside the run's own directory
echo "Starting fasterq-dump on $file"
cd "/nobackup/tomato_genome/alt_splicing/SRP100604/$file"
fasterq-dump "${file}.sra"

# Run the project's pair-validation script on the two mate files
perl /projects/tomato_genome/scripts/validateHiseqPairs.pl "${file}_1.fastq" "${file}_2.fastq"

# Copy the FASTQ files up to the project directory
cp "${file}_1.fastq" "/nobackup/tomato_genome/alt_splicing/SRP100604/${file}_1.fastq"
cp "${file}_2.fastq" "/nobackup/tomato_genome/alt_splicing/SRP100604/${file}_2.fastq"
echo "finished"
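The array job reads one run accession per line from Sra_ids.txt, which is not shown in this ticket. A minimal sketch of how such a file could be written, and how an array task ID maps to a line in it, assuming the same 12 accessions as in prefetch.slurm. The function name write_ids and the temporary file are hypothetical and only for illustration:

```shell
#!/bin/bash
# Hypothetical helper: write one accession per line to the given file.
# $1 = output file, remaining arguments = run accessions
write_ids() {
    local out="$1"
    shift
    printf '%s\n' "$@" > "$out"
}

# A temporary file stands in for Sra_ids.txt in this sketch.
ids_file="$(mktemp)"
write_ids "$ids_file" \
    SRR5279858 SRR5279875 SRR5279883 SRR5280323 \
    SRR5280370 SRR5280382 SRR5280383 SRR5280392 \
    SRR5282476 SRR5282478 SRR5282480 SRR5282481

# Same lookup as in fasterdump.slurm: task N reads line N.
# Slurm sets SLURM_ARRAY_TASK_ID in a real array job; it is fixed here for illustration.
SLURM_ARRAY_TASK_ID=3
file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" "$ids_file")
echo "$file"
```

With 12 lines in the file, the range in --array=1-12 covers each accession exactly once.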
Comments on results
Directory: /nobackup/tomato_genome/alt_splicing/SRP100604
SRP100604: Some SRR runs were single stranded rather than double stranded, so _1.fastq and _2.fastq files could not be made for them.
Affected SRR runs:
SRR5282476
SRR5282478
SRR5282480
SRR5282481
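One way to confirm which runs lack a mate file is to look at what fasterq-dump wrote. In its default split mode, a run with two reads per spot yields ${id}_1.fastq and ${id}_2.fastq, while a run with one read per spot yields a single ${id}.fastq. A minimal sketch, assuming the outputs sit in one directory per run as in fasterdump.slurm. The function name run_layout is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: report the layout of one run from its FASTQ files.
# $1 = directory holding the fasterq-dump output, $2 = run accession
run_layout() {
    local dir="$1" id="$2"
    if [ -f "$dir/${id}_1.fastq" ] && [ -f "$dir/${id}_2.fastq" ]; then
        echo "paired"      # both mate files are present
    elif [ -f "$dir/${id}.fastq" ]; then
        echo "single"      # only one FASTQ file was written
    else
        echo "missing"     # no FASTQ output found for this accession
    fi
}
```

For example, `run_layout /nobackup/tomato_genome/alt_splicing/SRP100604/SRR5282476 SRR5282476` reports the layout of the first run in the list.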
Directory: /nobackup/tomato_genome/alt_splicing/SRP268884
SRP268884: All runs produced double stranded _1.fastq and _2.fastq files.
Next Step: Run the nf-core/rnaseq Nextflow pipeline on SRP268884.
Question: Should we still use all of SRP100604 even though it contains single stranded SRR files, or use only the double stranded files it contains?
[~aloraine]
aloraine's answer to the above query: Go ahead and use all the available data in SRP100604. I believe that nextflow is able to handle this complication intelligently. I think you can omit the "second" file name in the "samples" file for single end runs. (Please note that "single strand" is not the same thing as "single end" - make sure that we are talking about the same thing before proceeding.)
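The suggestion to omit the second file name can be sketched for the nf-core/rnaseq samplesheet, whose columns are sample, fastq_1, fastq_2 and strandedness; a single-end run leaves fastq_2 empty. This sketch makes several assumptions that are not confirmed in this ticket: the FASTQ files have been gzip-compressed into one directory (the pipeline expects .fastq.gz or .fq.gz), single-end runs are named ${id}.fastq.gz, and the pipeline version in use accepts "auto" for strandedness. The function names samplesheet_row and write_samplesheet are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print one samplesheet row for a run.
# $1 = directory with gzipped FASTQ files, $2 = run accession,
# $3 = strandedness value (assumption: defaults to "auto")
samplesheet_row() {
    local dir="$1" id="$2" strand="${3:-auto}"
    if [ -f "$dir/${id}_1.fastq.gz" ] && [ -f "$dir/${id}_2.fastq.gz" ]; then
        # Paired-end run: both file columns are filled in.
        printf '%s,%s,%s,%s\n' "$id" "$dir/${id}_1.fastq.gz" "$dir/${id}_2.fastq.gz" "$strand"
    elif [ -f "$dir/${id}.fastq.gz" ]; then
        # Single-end run: the fastq_2 column is left empty.
        printf '%s,%s,,%s\n' "$id" "$dir/${id}.fastq.gz" "$strand"
    else
        echo "no FASTQ files found for $id in $dir" >&2
        return 1
    fi
}

# Hypothetical helper: print the header, then one row per accession.
# $1 = directory, remaining arguments = run accessions
write_samplesheet() {
    local dir="$1" id
    shift
    echo "sample,fastq_1,fastq_2,strandedness"
    for id in "$@"; do
        samplesheet_row "$dir" "$id"
    done
}
```

A call such as `write_samplesheet "$fastq_dir" SRR5279858 SRR5282476 > samplesheet.csv` would then mix paired-end and single-end rows in one file, where fastq_dir is a placeholder for wherever the compressed files live.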
All files have been created and transferred to RENCI for hosting. Moving to DONE.