IGBF-3733 (project: IGB)

Parse Trinity/Blat results to identify contigs that can be annotated

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      GOAL: Devise a strategy to annotate the Trinity results generated in IGBF-3647.

      We have MANY contigs (170K or more per variety).

      Step 1: How many contigs align to a corresponding Soly gene (via our BLAT results)?
      Step 2: How many Trinity contigs fail to have a match?
      Step 3: How many contigs are promiscuous, matching many genes?
      Step 4: How many Soly IDs have only one Trinity contig match (reciprocal best hit)?

      From these results we will decide whether these associations are suitable for use as annotations.
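A minimal sketch of Steps 1 to 4, assuming the BLAT hits have already been reduced to (contig ID, Soly gene ID) pairs and that the full set of contig IDs is known. The function name `summarize_matches` and the result keys are hypothetical and are not taken from identifyBlatMatches.py. Repeated alignments between the same contig and gene count once, and Step 4 is taken literally as "genes hit by exactly one contig"; a reciprocal-best-hit filter is a stricter, separate criterion.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarize_matches(all_contigs: Set[str],
                      pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Answer Steps 1-4 from (contig_id, gene_id) BLAT hit pairs (hypothetical helper).

    Multiple alignments between the same contig and gene are counted once.
    """
    genes_per_contig: Dict[str, Set[str]] = defaultdict(set)
    contigs_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for contig, gene in pairs:
        genes_per_contig[contig].add(gene)
        contigs_per_gene[gene].add(contig)

    # Membership tests on a defaultdict do not create new keys.
    matched = {c for c in all_contigs if c in genes_per_contig}
    return {
        "step1_contigs_with_gene_match": len(matched),
        "step2_contigs_without_match": len(all_contigs - matched),
        # Promiscuous: the contig matches more than one distinct gene.
        "step3_promiscuous_contigs": sum(
            1 for c in matched if len(genes_per_contig[c]) > 1),
        # Genes matched by exactly one distinct contig.
        "step4_genes_with_one_contig": sum(
            1 for cs in contigs_per_gene.values() if len(cs) == 1),
    }
```

On real data, `pairs` would come from PSL columns 10 (qName) and 14 (tName), assuming the contigs were the BLAT query and the Soly gene sequences the target.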


            Activity

            Robert Reid (robofjoy) added a comment (edited):

            The Python script that does this is on the cluster at:

            /projects/tomato_genome/fnb/dataprocessing/trinity/identifyBlatMatches.py

            The script first reads the Trinity contigs file to collect the identifier of every sequence.
            It then reads the BLAT results and parses each alignment into an object.
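The two reading steps could look roughly like the sketch below. It is not the code in identifyBlatMatches.py: `PslHit`, `read_fasta_ids`, and `parse_psl` are hypothetical names. It assumes the first word of each Trinity FASTA header is the contig ID (the same name BLAT reports), and that the BLAT output is standard 21-column PSL, possibly preceded by the psLayout header that BLAT writes unless `-noHead` is given.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class PslHit:
    """One BLAT alignment, i.e. one data line of a PSL file (hypothetical class)."""
    matches: int
    mismatches: int
    rep_matches: int
    q_num_insert: int
    t_num_insert: int
    q_name: str  # column 10: query name (assumed to be a Trinity contig)
    q_size: int
    t_name: str  # column 14: target name (assumed to be a Soly gene ID)
    t_size: int


def read_fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs: the first word after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def parse_psl(lines: Iterable[str]) -> List[PslHit]:
    """Parse PSL data lines into PslHit objects, skipping header lines."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data lines have 21 tab-separated columns and start with an integer.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        hits.append(PslHit(
            matches=int(fields[0]),
            mismatches=int(fields[1]),
            rep_matches=int(fields[2]),
            q_num_insert=int(fields[4]),
            t_num_insert=int(fields[6]),
            q_name=fields[9],
            q_size=int(fields[10]),
            t_name=fields[13],
            t_size=int(fields[14]),
        ))
    return hits
```

Usage would be along the lines of `with open("blat-heinz.psl") as fh: hits = parse_psl(fh)`; the Trinity FASTA path is not given in this ticket.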

            It then walks through these objects to count, for every contig, how many BLAT hits it has.
            How many contigs lack an association with a SOLyID?
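The walk-through can be expressed as a per-contig hit count that keeps contigs with zero hits, plus a histogram of those counts. This is a minimal sketch with hypothetical function names, assuming `hit_query_names` holds the qName of every PSL row (one entry per alignment) and `all_contigs` holds every ID from the Trinity FASTA.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def hits_per_contig(all_contigs: Set[str],
                    hit_query_names: Iterable[str]) -> Dict[str, int]:
    """Count BLAT hits for every contig, including contigs with zero hits."""
    counts = Counter(hit_query_names)
    return {contig: counts.get(contig, 0) for contig in all_contigs}


def hit_count_histogram(per_contig: Dict[str, int]) -> Dict[int, int]:
    """Map 'number of hits' to 'number of contigs with that many hits'."""
    return dict(sorted(Counter(per_contig.values()).items()))
```

The number of contigs lacking a SOLyID association is then `hit_count_histogram(per_contig).get(0, 0)`.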

            At a quick glance, many Heinz contigs have no hits. Counting distinct query names (PSL column 10, qName) gives 54109, well short of the 170K+ contigs:
            /projects/tomato_genome/fnb/dataprocessing/trinity/hei/blat$ awk '{ print $10 }' blat-heinz.psl | sort | uniq | wc -l
            54109
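The same count in Python, as a sketch with a hypothetical function name. Unlike the awk one-liner, it only reads PSL data lines, so header lines (BLAT writes a psLayout header unless run with `-noHead`) are never counted if the file has them.

```python
from typing import Iterable, Set


def distinct_query_names(lines: Iterable[str]) -> Set[str]:
    """Return the distinct qName values (PSL column 10) from PSL data lines."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header, separator, and blank lines: data lines have
        # 21 columns and begin with the integer 'matches' field.
        if len(fields) >= 21 and fields[0].isdigit():
            names.add(fields[9])
    return names
```

Usage: `with open("blat-heinz.psl") as fh: print(len(distinct_query_names(fh)))`.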

            Some contigs also have multiple partners: the PSL file has 112437 lines, roughly twice the number of distinct contigs.

            rreid2@str-i2:/projects/tomato_genome/fnb/dataprocessing/trinity/hei/blat$ wc -l blat-heinz.psl
            112437 blat-heinz.psl
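For contigs with several partners, one way to pick a single association is the reciprocal best hit mentioned in Step 4: keep a (contig, gene) pair only if the gene is the contig's top-scoring hit and the contig is the gene's top-scoring hit. The sketch below scores each alignment with the nucleotide form of the UCSC `pslScore` formula (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert). The tuple layout and function names are assumptions, and ties keep the first hit seen.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hit layout:
# (q_name, t_name, matches, mismatches, rep_matches, q_num_insert, t_num_insert)
Hit = Tuple[str, str, int, int, int, int, int]


def psl_score(matches: int, mismatches: int, rep_matches: int,
              q_num_insert: int, t_num_insert: int) -> int:
    """Nucleotide PSL score, following the UCSC pslScore formula."""
    return matches + (rep_matches >> 1) - mismatches - q_num_insert - t_num_insert


def reciprocal_best_hits(hits: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return (contig, gene) pairs where each is the other's top-scoring hit."""
    best_for_contig: Dict[str, Tuple[int, str]] = {}
    best_for_gene: Dict[str, Tuple[int, str]] = {}
    for q, t, m, mm, rep, qins, tins in hits:
        score = psl_score(m, mm, rep, qins, tins)
        # Strict '>' means the first hit seen wins a tie.
        if q not in best_for_contig or score > best_for_contig[q][0]:
            best_for_contig[q] = (score, t)
        if t not in best_for_gene or score > best_for_gene[t][0]:
            best_for_gene[t] = (score, q)
    return {
        (contig, gene)
        for contig, (_, gene) in best_for_contig.items()
        if best_for_gene[gene][1] == contig
    }
```

Counting the genes that appear in the returned set gives one reading of Step 4 that is stricter than "genes hit by exactly one contig".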

            Robert Reid (robofjoy) added a comment:

            Since BLAT runs quickly, let's test it to make sure the simple BLAT script works correctly.
            Assigning to Brandon to run these. 1 of 4 varieties is complete; we are waiting on the remaining Trinity runs to finish.

            Once they finish, we can close this ticket, and a new one will be created to implement Ann's suggestions for making the BLAT results viewable in IGB.

            Robert Reid (robofjoy) added a comment:

            The BLAT runs succeeded, but the results are moot for now: we are going back one step to build new contigs with RNA-SPADES2. A new BLAT ticket will be created once that step is complete.


              People

              • Assignee:
                Brandon Bendickson (bbendick)
                Reporter:
                Robert Reid (robofjoy)
              • Votes:
                0
                Watchers:
                2
