Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Sprint: Spring 10, Summer 1
Description
GOAL: Devise a strategy to annotate the Trinity results generated in IGBF-3647.
We have MANY contigs (170K or more per variety).
Step 1: How many contigs align to a corresponding Soly gene (via our blat results)?
Step 2: How many Trinity contigs fail to have a match?
Step 3: How many contigs are promiscuous and match many genes?
Step 4: How many Soly IDs have only 1 Trinity contig match? (reciprocal best hit)
From this we will decide if these associations are suitable to be used as annotations.
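The four tallies above can be sketched in a few lines of Python. This is only an illustration, not the actual script: the function name `summarize_matches` and the input shape (a list of contig IDs plus a list of contig/gene pairs) are assumptions. Step 4 is read here as "genes hit by exactly one contig", with a stricter one-to-one count added alongside it.

```python
from collections import defaultdict


def summarize_matches(contig_ids, hits):
    """Tally contig-to-gene associations.

    contig_ids: iterable of all Trinity contig identifiers (hypothetical input).
    hits: iterable of (contig_id, gene_id) pairs, one per blat alignment.
    """
    genes_by_contig = defaultdict(set)
    contigs_by_gene = defaultdict(set)
    for contig, gene in hits:
        genes_by_contig[contig].add(gene)
        contigs_by_gene[gene].add(contig)

    all_contigs = set(contig_ids)
    # Step 1: contigs with at least one gene match.
    matched = all_contigs.intersection(genes_by_contig)
    # Step 3: contigs that match more than one distinct gene.
    promiscuous = {c for c in matched if len(genes_by_contig[c]) > 1}
    # Step 4: genes matched by exactly one contig.
    single_contig_genes = {g for g, cs in contigs_by_gene.items() if len(cs) == 1}
    # Stricter reading of step 4: that single contig also matches only this gene.
    one_to_one = {
        g for g in single_contig_genes
        if len(genes_by_contig[next(iter(contigs_by_gene[g]))]) == 1
    }
    return {
        "matched": len(matched),
        "unmatched": len(all_contigs) - len(matched),  # Step 2
        "promiscuous": len(promiscuous),
        "single_contig_genes": len(single_contig_genes),
        "one_to_one": len(one_to_one),
    }
```

Counting distinct genes per contig (rather than raw alignment lines) keeps a contig with several alignments to the same gene from being flagged as promiscuous.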
Issue Links
- blocks: IGBF-3732 Prepare Google slide deck outlining reference free pipeline + results (Kelsey) [Closed]
The Python script for this is located on the cluster here:
/projects/tomato_genome/fnb/dataprocessing/trinity/identifyBlatMatches.py
The script first reads the Trinity contigs file to collect the identifier of every sequence.
It then reads the blat results and parses them into an object.
It walks through these to count, for every contig, how many blat hits we get.
How many contigs lack an association with a Soly ID?
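A minimal sketch of the two reading steps, for illustration only; it is not the contents of identifyBlatMatches.py. The function names are hypothetical, and it assumes the Trinity contigs were the blat query, so the contig ID is the PSL qName (column 10) and the Soly ID is the tName (column 14). Header lines, if present, are skipped by requiring 21 tab-separated fields with a numeric first field.

```python
def read_fasta_ids(lines):
    """Return the identifier (first word after '>') of every FASTA record."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def read_psl_pairs(lines):
    """Yield (query_name, target_name) for every alignment line of a PSL file."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional PSL header and any malformed or blank lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[9], fields[13]  # qName, tName


def contigs_without_hits(contig_ids, pairs):
    """Return the contigs that never appear as a query in the blat results."""
    with_hits = {query for query, _ in pairs}
    return [c for c in contig_ids if c not in with_hits]
```

Both readers accept any iterable of lines, so they can be fed an open file handle directly, for example `read_psl_pairs(open("blat-heinz.psl"))`.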
At a quick glance in Heinz, we see that many contigs have no hits. Counting the distinct query names (column 10 of the PSL file):
/projects/tomato_genome/fnb/dataprocessing/trinity/hei/blat$ awk '{ print $10 }' blat-heinz.psl | sort | uniq | wc -l
54109
Some queries also have multiple blat hits, since the file has more lines than distinct query names:
rreid2@str-i2:/projects/tomato_genome/fnb/dataprocessing/trinity/hei/blat$ wc -l blat-heinz.psl
112437 blat-heinz.psl
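Taken together, the two counts suggest roughly two alignments per query with a hit on average (112437 / 54109 ≈ 2.08), assuming the PSL file has no header lines inflating either number. An average does not show how the hits are spread, so a histogram of hits per query is more informative. A small sketch, with a hypothetical function name, that takes the column 10 values as input:

```python
from collections import Counter


def hit_count_histogram(query_names):
    """Map 'number of alignments' to 'number of queries with that many alignments'.

    query_names: iterable with one query name per PSL alignment line.
    """
    hits_per_query = Counter(query_names)
    return Counter(hits_per_query.values())
```

For example, a result of `{1: 40000, 2: 9000, ...}` would mean 40000 queries have a single alignment and 9000 have two (illustrative numbers, not real results).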