Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 1
- Epic Link:
- Sprint: Spring 8
Description
The problem is that our current KEGG annotation has only about 4,000 KEGG IDs, which map to about 8,700 SL4 genes.
Since there are 34,000 genes in total, we are leaving many genes out of consideration.
However, there is a KEGG API mapping file that associates NCBI IDs for tomato with KEGG IDs (see attached).
It looks like this:
100037489 NP_001234263
100037490 NP_001234268
100037491 NP_001234273
100037492 NP_001234279
100037493 NP_001234283
100037494 NP_001234289
100037498 NP_001234310
100037495 NP_001234471
100037496 XP_004230843
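A minimal sketch of reading this file, assuming it is whitespace-delimited, that column 1 holds the KEGG ID, and that column 2 holds the NCBI ID (the ticket only states that the NCBI IDs are in column 2). The function name `read_kegg_mapping` is hypothetical:

```python
import io

def read_kegg_mapping(handle):
    """Return {ncbi_id: kegg_id} from a two-column, whitespace-delimited file."""
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        # Assumption: column 1 is the KEGG ID, column 2 is the NCBI ID.
        kegg_id, ncbi_id = fields[0], fields[1]
        mapping[ncbi_id] = kegg_id
    return mapping

# Three rows copied from the sample above stand in for the attached file.
sample = io.StringIO(
    "100037489 NP_001234263\n"
    "100037490 NP_001234268\n"
    "100037496 XP_004230843\n"
)
mapping = read_kegg_mapping(sample)
accessions = sorted(mapping)  # the column 2 IDs to download
```

In practice the `io.StringIO` object would be replaced by an open file handle on the attached mapping file.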
Ultimately, we need a normalized Salmon counts table whose first column holds the KEGG IDs.
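One way the target table could be assembled is sketched below. It assumes the counts are already normalized (the ticket does not say which normalization to use), that the counts table is tab-delimited with the gene ID in its first column, and that genes without a KEGG ID are kept and marked `NA`. The function name `add_kegg_column`, the column names, and the gene IDs and values are all hypothetical:

```python
import csv
import io

def add_kegg_column(counts_tsv, gene_to_kegg, missing="NA"):
    """Prepend a kegg_id column to a tab-delimited counts table."""
    reader = csv.reader(counts_tsv, delimiter="\t")
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["kegg_id"] + header)
    for row in reader:
        # row[0] is assumed to be the gene ID; look up its KEGG ID.
        writer.writerow([gene_to_kegg.get(row[0], missing)] + row)
    return out.getvalue()

# Made-up, already-normalized values for two hypothetical genes.
counts = io.StringIO(
    "gene_id\tsample1\tsample2\n"
    "geneA\t10.5\t12.0\n"
    "geneB\t0.0\t3.2\n"
)
gene_to_kegg = {"geneA": "100037489"}
table = add_kegg_column(counts, gene_to_kegg)
rows = table.splitlines()
```

Keeping unmapped genes with an `NA` marker makes it easy to count how many genes still lack a KEGG ID; dropping them instead would be a one-line change.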
Tasks for this ticket:
Download the sequences for all of the NCBI IDs in column 2.
BLAST the Soly genes against these sequences and make a new table with the top hit for each gene.
Summarize how many hits and how many misses we get.
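The top-hit and hit/miss steps can be sketched as follows, assuming the BLAST results are in the standard 12-column tabular format (`-outfmt 6`, where column 1 is the query, column 2 the subject, and column 12 the bit score) and that "top hit" means the highest bit score per query. The function names, gene IDs, and scores are hypothetical illustrations, not real results:

```python
def best_hits(blast_lines):
    """Keep the highest-bit-score hit per query from BLAST tabular lines."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def summarize(all_queries, best):
    """Count queries with at least one hit (hits) and with none (misses)."""
    hits = sum(1 for q in all_queries if q in best)
    return {"hits": hits, "misses": len(all_queries) - hits}

# Made-up BLAST rows for illustration only.
blast_rows = [
    "geneA\tNP_001234263\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-80\t300.0",
    "geneA\tNP_001234268\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-110\t400.0",
    "geneB\tXP_004230843\t85.0\t150\t22\t1\t1\t150\t1\t150\t1e-50\t210.0",
]
best = best_hits(blast_rows)
summary = summarize(["geneA", "geneB", "geneC"], best)
```

Here `geneC` has no row in the BLAST output, so it counts as a miss. An E-value or identity cutoff could be added inside `best_hits` if weak hits should not count.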
Attachments
Activity
| Field | Original Value | New Value |
|---|---|---|
| Epic Link | | IGBF-2993 [ 21429 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Attachment | | soly_mapping.tsv [ 18694 ] |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Attachment | | best_hits_kegg.tsv [ 18695 ] |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
| Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
| Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
| Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
| Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
| Resolution | Done [ 10000 ] | |
| Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |