  1. IGB
  2. IGBF-4215

Mapping Soly IDs to KEGG by way of RefSeq / NCBI via BLAST

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      The problem is that, as our KEGG mapping currently stands, we only have 4,000 KEGG IDs, which map to about 8,700 SL4 genes. Since there are 34,000 genes, we are leaving MANY genes out of consideration.

      However, there is a KEGG API mapping file that associates NCBI IDs for tomato with KEGG IDs (see attached).

      It looks like this (column 1 is the KEGG ID, column 2 is the NCBI ID):
      100037489 NP_001234263
      100037490 NP_001234268
      100037491 NP_001234273
      100037492 NP_001234279
      100037493 NP_001234283
      100037494 NP_001234289
      100037498 NP_001234310
      100037495 NP_001234471
      100037496 XP_004230843
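      A minimal sketch of reading this two-column file into a lookup table and collecting the column 2 accessions for download. The function name `read_kegg_ncbi_map` is hypothetical, and the prefix stripping assumes the raw file may carry database prefixes such as `sly:`, which the excerpt above does not show.

```python
import io

def read_kegg_ncbi_map(handle):
    """Return {ncbi_id: kegg_id} from a two-column, whitespace-separated file."""
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        kegg_id, ncbi_id = fields[0], fields[1]
        # Assumption: IDs may carry a "db:" prefix; keep the part after the last colon.
        mapping[ncbi_id.split(":")[-1]] = kegg_id.split(":")[-1]
    return mapping

# Three rows copied from the excerpt above, used here as stand-in input.
sample = io.StringIO(
    "100037489 NP_001234263\n"
    "100037490 NP_001234268\n"
    "100037496 XP_004230843\n"
)
kegg_by_ncbi = read_kegg_ncbi_map(sample)
accessions = sorted(kegg_by_ncbi)  # the column 2 IDs to download
```

      In practice the handle would come from opening the attached `kegg-api-sly.txt`, and `accessions` could be written one per line for a batch download.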

      Ultimately we need a Salmon counts table that is NORMALIZED and has the KEGG IDs in the first column.
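      As a rough sketch of the target layout only: the code below prepends a KEGG ID column to a tab-separated counts table. It assumes the table is already normalized (the normalization method is not specified here), that the gene ID is in the first column, and that a gene-to-KEGG lookup already exists. The function name, the `kegg_id` header, the `NA` placeholder, and the sample gene names are all hypothetical.

```python
import csv
import io

def add_kegg_column(counts_handle, kegg_by_gene, out_handle, missing="NA"):
    """Write the counts table with a KEGG ID column prepended as column 1."""
    reader = csv.reader(counts_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow(["kegg_id"] + header)
    for row in reader:
        if not row:
            continue  # skip empty lines
        # Genes without a KEGG mapping get the placeholder value.
        writer.writerow([kegg_by_gene.get(row[0], missing)] + row)

# Hypothetical normalized counts for two genes in two samples.
counts = io.StringIO(
    "gene_id\tsample1\tsample2\n"
    "geneA\t10.5\t3.2\n"
    "geneC\t0.0\t1.1\n"
)
out = io.StringIO()
add_kegg_column(counts, {"geneA": "100037489"}, out)
result = out.getvalue()
```

      Rows without a KEGG ID are kept rather than dropped so that the number of unmapped genes stays visible; filtering them out afterwards is a one-line change.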

      For this task:
      Download all of the NCBI IDs from column 2.
      BLAST the Soly genes against these IDs and make a new table with the top hit for each gene.
      Summarize how many hits we get and how many misses.
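      The top-hit and hit/miss steps could be sketched as follows, assuming the BLAST search was run with tabular output (`-outfmt 6`), where column 1 is the query ID, column 2 the subject ID, and column 12 the bit score. "Top hit" is taken here to mean the highest bit score per query, which is an assumption. The function names and the sample gene names are hypothetical.

```python
import csv
import io

def best_hits(blast_handle):
    """Return {query_id: (subject_id, bitscore)}, keeping the highest bit score per query."""
    best = {}
    for row in csv.reader(blast_handle, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comment lines and short or malformed rows
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def summarize(all_queries, best):
    """Return (hits, misses) over the full list of query gene IDs."""
    hits = sum(1 for q in all_queries if q in best)
    return hits, len(all_queries) - hits

# Hypothetical tabular BLAST output: geneA has two hits, geneB one, geneC none.
blast_out = io.StringIO(
    "geneA\tNP_001234268\t60.0\t180\t72\t2\t5\t184\t3\t182\t1e-40\t150\n"
    "geneA\tNP_001234263\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "geneB\tXP_004230843\t75.0\t150\t37\t1\t1\t150\t1\t150\t1e-60\t220\n"
)
top = best_hits(blast_out)
hits, misses = summarize(["geneA", "geneB", "geneC"], top)
```

      Subject IDs in real output may carry a version suffix (for example `NP_001234263.1`), so they may need trimming before being matched against column 2 of the mapping file.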

        Attachments

        1. soly_mapping.tsv
          1021 kB
        2. kegg-api-sly.txt
          592 kB
        3. best_hits_kegg.tsv
          1.43 MB

            People

            • Assignee: Brandon Bendickson (bbendick)
            • Reporter: Robert Reid (robofjoy)
