Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 4
- Sprint: Spring 9, Spring 10, Summer 1, Summer 2, Summer 3, Summer 4
Description
Download the data sets identified in the linked ticket into the "data processing" directory on the cluster.
Instructions:
1) For each "SRP" accession (the identifier that uniquely identifies a data set), create a folder named after the accession. For example, if the accession is SRP123456, name the folder SRP123456. Make sure the folder is group-writable so that other people in the group can add files to it. (By default, it will likely already be group-executable and group-readable, but please check just in case.)
2) Use the "run selector" tool to make a metadata file listing all the "runs" (SRR IDs) plus library names, sample names, etc. for each "SRP" accession. Add the file to the data set's folder.
3) Download fastq files and save them to a subfolder named "fastq" within each "SRP" folder.
4) Check that the data downloaded correctly. Talk with Molly and Rob Reid about how best to do that. Also, do some research (Google, Biostars, etc.) for fast and easy ways to check that the download process did not fail. Another possibility: run FastQC on the data and compare the output to the metadata file from the run selector tool? If you notice any problems, fix them.
5) Compress all the fastq files using gzip.
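A few of the steps above can be scripted. The following is a minimal Python sketch, not the workflow actually used for this ticket. It rests on several assumptions: the run selector metadata file is comma-separated with a "Run" column, fastq file names begin with the run ID followed by "." or "_", and the function names (make_dataset_dir, runs_missing_fastq, gzip_is_intact) are hypothetical helpers invented for illustration.

```python
import csv
import gzip
import stat
import zlib
from pathlib import Path


def make_dataset_dir(root, accession):
    """Create <root>/<accession>/fastq and open both folders to the group."""
    fastq_dir = Path(root) / accession / "fastq"
    fastq_dir.mkdir(parents=True, exist_ok=True)
    for folder in (fastq_dir.parent, fastq_dir):
        mode = stat.S_IMODE(folder.stat().st_mode)
        # Add group read, write, and execute bits; leave other bits unchanged.
        folder.chmod(mode | stat.S_IRWXG)
    return fastq_dir.parent


def runs_missing_fastq(metadata_csv, fastq_dir):
    """Return run IDs from the metadata file that have no file in fastq_dir.

    Assumes a comma-separated file with a "Run" column (an assumption about
    the run selector export) and file names such as SRR1_1.fastq.gz.
    """
    with open(metadata_csv, newline="") as handle:
        runs = [row["Run"] for row in csv.DictReader(handle)]
    names = [p.name for p in Path(fastq_dir).iterdir()]
    return [
        run for run in runs
        if not any(n.startswith(run + ".") or n.startswith(run + "_") for n in names)
    ]


def gzip_is_intact(path, chunk_size=1 << 20):
    """Read a gzip file to the end; truncated or corrupt files return False."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

A missing run or a failed gzip check only shows that something went wrong for that file. It does not confirm that the read counts match the archive, so it complements rather than replaces the checks discussed in step 4.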
Activity
Field | Original Value | New Value |
---|---|---|
Epic Link | | IGBF-1395 [ 17470 ] |
Sprint | Spring 8 [ 192 ] | Spring 9 [ 193 ] |
Assignee | | Paige Kulzer [ pkulzer ] |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | To-Do [ 10305 ] |
Sprint | Spring 9 [ 193 ] | Spring 9, Spring 10 [ 193, 194 ] |
Rank | | Ranked higher |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Assignee | Paige Kulzer [ pkulzer ] | Ann Loraine [ aloraine ] |
Summary | Download tardigrade RNA-Seq onto cluster | Download tardigrade RNA-Seq data onto cluster |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Needs 1st Level Review [ 10005 ] |
Sprint | Spring 9, Spring 10 [ 193, 194 ] | Spring 9, Spring 10, Summer 1 [ 193, 194, 195 ] |
Rank | | Ranked higher |
Sprint | Spring 9, Spring 10, Summer 1 [ 193, 194, 195 ] | Spring 9, Spring 10, Summer 1, Summer 2 [ 193, 194, 195, 196 ] |
Rank | | Ranked higher |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | To-Do [ 10305 ] |
Sprint | Spring 9, Spring 10, Summer 1, Summer 2 [ 193, 194, 195, 196 ] | Spring 9, Spring 10, Summer 1, Summer 2, Summer 3 [ 193, 194, 195, 196, 197 ] |
Rank | | Ranked higher |
Sprint | Spring 9, Spring 10, Summer 1, Summer 2, Summer 3 [ 193, 194, 195, 196, 197 ] | Spring 9, Spring 10, Summer 1, Summer 2, Summer 3, Summer 4 [ 193, 194, 195, 196, 197, 198 ] |
Rank | | Ranked higher |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
Resolution | Done [ 10000 ] | |
Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |
The "data processing" directory on the cluster can be found here: /projects/tomato_genome/fnb/dataprocessing
Most scripts can be found here: /projects/tomato_genome/fnb/scripts/flavonoid-rnaseq/src
Here is the entire workflow for processing the tardigrade data:
Based on the mastersheet in the linked ticket, there should be 10 "SRP" folders at the end of this process (11 if we're including the experiment with only an ERR accession).
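One way to confirm the folder count is a short Python sketch like the one below. It is illustrative only: srp_folders and folder_count_matches are hypothetical names, and the sketch assumes that the directory passed in holds only the tardigrade data sets. If the "data processing" directory also contains folders from other projects, the list would need filtering against the mastersheet. The ERR-only experiment is not counted because the name of its folder is not given here.

```python
import re
from pathlib import Path

# Folder names are expected to be SRP accessions, e.g. SRP123456.
SRP_PATTERN = re.compile(r"^SRP\d+$")


def srp_folders(root):
    """Return the sorted names of subfolders of root that look like SRP accessions."""
    return sorted(
        p.name for p in Path(root).iterdir()
        if p.is_dir() and SRP_PATTERN.match(p.name)
    )


def folder_count_matches(root, expected=10):
    """Check that the number of SRP folders equals the expected count."""
    return len(srp_folders(root)) == expected
```

For example, folder_count_matches("/projects/tomato_genome/fnb/dataprocessing") would return True only if exactly 10 SRP-named folders are present there.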