Details
Type: Task
Status: Closed
Priority: Minor
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 1
Epic Link:
Sprint: Spring 8
Description
To confirm that our sequence files match what the SRA has, I came up with a simple check. We take the first 10 bases of a read from a given SRA run and look for them across all the sequence files.
For speed, I head each fastq.gz file to get only the first 5 sequences (20 lines, at 4 lines per FASTQ record).
Use zcat so the files stay compressed on disk.
Grep for the sequence of interest, anchored to the start of the line.
for f in *gz; do echo "$f"; zcat < "$f" | head -n 20 | grep "^CTGGCTTTTC"; done
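Probability dictates that we should find just 1 result: a 10-base prefix has 4^10 (about one million) possible values, so a chance match within the first 5 reads of an unrelated file is unlikely.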
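The same check can be wrapped in a small function that prints only the files that match, which is easier to read than scanning a long list of file names. This is a rough sketch, not the command that was actually run: `find_prefix` is a hypothetical helper name, `gzip -dc` is swapped in for `zcat <` as an assumed equivalent, and the `awk` step (keep only the sequence line of each 4-line record) is an addition so that a quality line cannot match by accident.

```shell
#!/usr/bin/env bash
# Sketch: list the files whose first few reads start with a given prefix.
# find_prefix is a hypothetical helper, not part of the original ticket.
find_prefix() {
  local prefix="$1"; shift
  local n_reads=5   # same depth as `head -n 20` (4 lines per FASTQ record)
  local hits=0 f
  for f in "$@"; do
    # gzip -dc streams the decompressed data; the file stays compressed on disk.
    # awk keeps line 2 of every 4-line record, i.e. the sequence line only.
    if gzip -dc "$f" 2>/dev/null | head -n $((n_reads * 4)) \
        | awk 'NR % 4 == 2' | grep -q "^${prefix}"; then
      echo "$f"
      hits=$((hits + 1))
    fi
  done
  # Summary goes to stderr so stdout holds only the matching file names.
  echo "files matched: ${hits}" >&2
}
```

Usage would be along the lines of `find_prefix CTGGCTTTTC *gz`. If the sample labels are right, it should print exactly one file name.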
Issue Links
blocks: IGBF-3683 Update SRA to use the correct sample codes for Muday lab time course data (Closed)
When successful, the results will look something like this:
.....
V.28.45.9_R2.fastq.gz
V.28.75.7_R1.fastq.gz
V.28.75.7_R2.fastq.gz
V.28.75.8_R1.fastq.gz
V.28.75.8_R2.fastq.gz
V.28.75.9_R1.fastq.gz
V.28.75.9_R2.fastq.gz
V.34.15.7_R1.fastq.gz
V.34.15.7_R2.fastq.gz
V.34.15.8_R1.fastq.gz
V.34.15.8_R2.fastq.gz
V.34.15.9_R1.fastq.gz
V.34.15.9_R2.fastq.gz
V.34.30.7_R1.fastq.gz
V.34.30.7_R2.fastq.gz
V.34.30.8_R1.fastq.gz
V.34.30.8_R2.fastq.gz
V.34.30.9_R1.fastq.gz
CTGGCTTTTCAGATTTCTCATCCCTGTATGCTTTTCTTCGAGGTGGAGACACCTTCGGCACCTTGTCCACTACATCAGCTGAACTTTGCAAATTGGTTGTCGAGTACAGTTTCTGACCAGCTGGAATGCTGTACGCATTCTTCACCTCAA
V.34.30.9_R2.fastq.gz
V.34.45.7_R1.fastq.gz
V.34.45.7_R2.fastq.gz
V.34.45.8_R1.fastq.gz
V.34.45.8_R2.fastq.gz
V.34.45.9_R1.fastq.gz
V.34.45.9_R2.fastq.gz
V.34.75.7_R1.fastq.gz
V.34.75.7_R2.fastq.gz
V.34.75.8_R1.fastq.gz
V.34.75.8_R2.fastq.gz
V.34.75.9_R2.fastq.gz
We see just the one result, and it matches the SRA title and ID for this sample. (And this is one of the 16 samples that got switched!)
Will spot-check a few more.