Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 3
- Epic Link:
- Sprint: Fall 1, Fall 2
Description
GOAL: Prepare a new table built from the salmon counts produced in ticket 3772.
The first step is to make a new counts table with the SolyIDs added back in. We associate the SolyIDs with the de novo contigs via a previous BLAT alignment. This task is that first step.
We take the 4 tables and a SolyID table produced previously (ticket # I don't recall).
Make a Python script that reads in all the data and builds one large table in which each row is a SolyID gene and each column is an experiment.
The columns will need good labels!
Attachments
Activity
| Field | Original Value | New Value |
|---|---|---|
| Epic Link | IGBF-2993 [ 21429 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Assignee | Robert Reid [ robertreid ] | Brandon Bendickson [ bbendick ] |
| Summary | Blend 4 salmon count tables into 1 and add SolyIds back | Adding SolyIds back to the NEXTFLOW de novo results via a Python Script |
| Description | GOAL: Prep a new table that harnesses the results from Ticket 3772, The salmon counts. We take the 4 tables and a Soly ID table produced previously (ticket # I don't recall) Make a python script that will read in all the data, and make a large table where each row is a SolyId gene, each column is an experiment. Will need good column labels! | GOAL: Prep a new table that harnesses the results from Ticket 3772, The salmon counts. The first step is to make a new counts table but we add SolyIDs back in. We associate the Soly Ids with the de novo contigs via a previous BLAT alignment. This task is the We take the 4 tables and a Soly ID table produced previously (ticket # I don't recall) Make a python script that will read in all the data, and make a large table where each row is a SolyId gene, each column is an experiment. Will need good column labels! |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Assignee | Brandon Bendickson [ bbendick ] | Robert Reid [ robertreid ] |
| Assignee | Robert Reid [ robertreid ] | Brandon Bendickson [ bbendick ] |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | To-Do [ 10305 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Status | In Progress [ 3 ] | To-Do [ 10305 ] |
| Sprint | Fall 1 [ 202 ] | Fall 1, Fall 2 [ 202, 203 ] |
| Rank | Ranked higher |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
| Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
| Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
| Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
| Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
| Resolution | | Done [ 10000 ] |
| Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |
This will be a two-task process; both tasks involve writing Python scripts.
This task is step 1: adding a SolyID to a salmon counts table using our BLAT results from many steps ago.
We run this script repeatedly, once for each plant variety.
1. We need the BLAT result .fna file, in which we aligned the rnaSPAdes contigs to SL5 with BLAT.
It can be found at this location:
/projects/tomato_genome/fnb/dataprocessing/brandon_work/mal/malintka-spades/spades_blat/blat-SL5-CDS-malintka-bestLongHit.fna
We read this file into a dict with the NODE ID as the key and the SolyID as the value.
2. We need the salmon gene count file for the same variety:
/projects/tomato_genome/fnb/dataprocessing/brandon_work/NEXTFLOW/start_fresh/Mal-run-2/results-3.14.0/star_salmon/salmon.merged.gene_counts.tsv
The first column in the table is the NODE ID; we ignore the 2nd column and keep all of the remaining columns of read counts.
We read a line, parse it, and check whether the ID in the 1st column is in our dict from above.
If so, we write out a line using the SolyID as the first column, followed by all of the remaining fields.
In the end we write out a table in which each row has a SolyID and all of its gene counts.
We then rerun the script pointed at the next plant variety (MAL, etc.).
After that we move to the next phase, merging the 4 tables into 1 (a new ticket that is not yet created).
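A minimal sketch of steps 1 and 2, not the script that was actually merged. It rests on assumptions the ticket does not confirm: each FASTA header in the .fna file is taken to look like `>NODE_... Solyc...` (contig ID first, SolyID second, separated by whitespace), and the counts file is taken to start with a header row. All function names are made up for illustration. If the real headers differ, only `parse_header` should need to change.

```python
import csv


def parse_header(header):
    """Return (node_id, soly_id) from one FASTA header line, or None.

    ASSUMPTION: headers look like '>NODE_1_length_900_cov_3.1 Solyc01g005000.1'.
    The real layout of the bestLongHit .fna file may differ.
    """
    fields = header[1:].split()
    if len(fields) < 2:
        return None
    return fields[0], fields[1]


def load_node_to_soly(fna_lines):
    """Build the NODE ID -> SolyID dict from the header lines of a FASTA file."""
    node_to_soly = {}
    for line in fna_lines:
        if line.startswith(">"):
            pair = parse_header(line.strip())
            if pair is not None:
                node_to_soly[pair[0]] = pair[1]
    return node_to_soly


def relabel_counts(count_lines, node_to_soly):
    """Yield output rows: a header row, then one row per NODE ID found in the dict.

    Column 1 is the NODE ID, column 2 is dropped, the rest are read counts.
    ASSUMPTION: the first line of the counts file is a header row.
    """
    reader = csv.reader(count_lines, delimiter="\t")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return
    yield ["SolyID"] + header[2:]
    for row in reader:
        if not row:
            continue
        soly_id = node_to_soly.get(row[0])
        if soly_id is not None:
            yield [soly_id] + row[2:]


def main(fna_path, counts_path, out_path):
    """Run one plant variety: read both inputs, write the relabelled table."""
    with open(fna_path) as fna:
        node_to_soly = load_node_to_soly(fna)
    with open(counts_path, newline="") as counts, \
            open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(relabel_counts(counts, node_to_soly))
```

Calling `main(...)` once per variety, with that variety's .fna and salmon.merged.gene_counts.tsv paths, gives one relabelled table each. Contigs with no BLAT hit are dropped, and two contigs that hit the same gene produce two rows with the same SolyID.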
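For the follow-up merge, a rough sketch of how the per-variety tables could be blended into one table with a row per SolyID and a column per experiment. The ticket does not say how to handle repeated SolyIDs or genes missing from a variety. This sketch sums repeated SolyIDs and fills missing genes with 0; both are assumptions. The `variety_sample` column labels and the function name are also made up here.

```python
from collections import defaultdict


def merge_tables(tables):
    """Merge per-variety tables into one list of rows.

    `tables` maps a variety label (e.g. "MAL") to a list of rows: row 0 is the
    header ["SolyID", sample1, ...], later rows are [soly_id, count1, ...].
    """
    columns = []                # merged column labels, in input order
    merged = defaultdict(dict)  # soly_id -> {column label: summed count}
    for variety, rows in tables.items():
        header, data = rows[0], rows[1:]
        # Prefix each sample name with its variety so labels stay unique.
        labels = [f"{variety}_{sample}" for sample in header[1:]]
        columns.extend(labels)
        for row in data:
            counts = merged[row[0]]
            for label, value in zip(labels, row[1:]):
                # ASSUMPTION: rows sharing a SolyID are summed.
                counts[label] = counts.get(label, 0.0) + float(value)
    out = [["SolyID"] + columns]
    for soly_id in sorted(merged):
        # ASSUMPTION: a gene absent from a variety gets a count of 0.
        out.append([soly_id] + [merged[soly_id].get(c, 0.0) for c in columns])
    return out
```

The variety prefix is one way to meet the "good column labels" requirement, since sample names alone may collide across the 4 tables.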