Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Sprint: Spring 10, Summer 1
Description
GOAL: Devise a strategy to annotate the Trinity results generated in IGBF-3647.
We have MANY contigs (170K or more per variety).
Step 1: How many contigs align to a corresponding Soly gene (via our BLAT results)?
Step 2: How many Trinity contigs fail to have a match?
Step 3: How many contigs are promiscuous, i.e. match many genes?
Step 4: How many Soly IDs have only one Trinity contig match (reciprocal best hit)?
From this we will decide whether these associations are suitable to use as annotations.
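A minimal sketch of how Steps 1 to 4 could be tallied once the BLAT results are reduced to (contig, Soly ID) pairs. The function name `summarize_matches` and the `min_partners` cutoff for calling a contig "promiscuous" are hypothetical, since the ticket does not define a threshold. Step 4 is counted simply as "Soly IDs matched by exactly one contig"; the reciprocal-best-hit refinement is not applied here.

```python
from collections import defaultdict


def summarize_matches(contig_ids, pairs, min_partners=2):
    """Tally Steps 1-4 from (contig_id, soly_id) alignment pairs.

    contig_ids: every Trinity contig ID in the assembly.
    pairs: iterable of (contig_id, soly_id) tuples taken from the BLAT hits.
    min_partners: hypothetical cutoff for calling a contig "promiscuous".
    """
    genes_per_contig = defaultdict(set)
    contigs_per_gene = defaultdict(set)
    for contig, soly in pairs:
        genes_per_contig[contig].add(soly)
        contigs_per_gene[soly].add(contig)

    all_contigs = set(contig_ids)
    matched = all_contigs & set(genes_per_contig)
    return {
        # Step 1: contigs aligning to at least one Soly gene
        "matched": len(matched),
        # Step 2: contigs with no match at all
        "unmatched": len(all_contigs - matched),
        # Step 3: contigs hitting min_partners or more distinct Soly genes
        "promiscuous": sum(
            1 for c in matched if len(genes_per_contig[c]) >= min_partners
        ),
        # Step 4 (simplified): Soly IDs matched by exactly one contig
        "single_contig_genes": sum(
            1 for cs in contigs_per_gene.values() if len(cs) == 1
        ),
    }
```

For example, with contigs `c1` to `c4` and the illustrative pairs `(c1, soly_A)`, `(c2, soly_A)`, `(c2, soly_B)`, `(c3, soly_C)`, this returns 3 matched, 1 unmatched, 1 promiscuous (`c2`), and 2 single-contig genes (`soly_B`, `soly_C`). Using sets of partners rather than raw row counts keeps repeated hits between the same contig and gene from inflating Step 3.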
Attachments
Issue Links
- blocks: IGBF-3732 Prepare Google slide deck outlining reference free pipeline + results(Kelsey) [Closed]
Activity
| Field | Original Value | New Value |
|---|---|---|
| Epic Link | | IGBF-2993 [ 21429 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Assignee | Robert Reid [ robertreid ] | Brandon Benkick [ bbendick ] |
| Sprint | Spring 10 [ 194 ] | Spring 10, Summer 1 [ 194, 195 ] |
| Rank | | Ranked higher |
| Description | GOAL: Devise a strategy to annotate the Trinity results generated in We have MANY contigs (170K or more per variety). Step 1 would be to identify how many contigs align to a corresponding soly gene (via our blat results). Step 2: How many trinity contigs fail to have a match? Step 3: How many contigs are promiscuous and have many gene matches? Step 4: How many Soly Ids have only 1 Trinity Contig match? (best hit recprical) From this we will decide if these associations are suitable to be used as annotations. | GOAL: Devise a strategy to annotate the Trinity results generated in We have MANY contigs (170K or more per variety). Step 1 would be to identify how many contigs align to a corresponding soly gene (via our blat results). Step 2: How many trinity contigs fail to have a match? Step 3: How many contigs are promiscuous and have many gene matches? Step 4: How many Soly Ids have only 1 Trinity Contig match? (best hit recprical) From this we will decide if these associations are suitable to be used as annotations. |
| Comment | | I found out that Trinity v2.13 has a bug that prevents the creation of the output FASTA files. I rewrote the script to use Trinity v2.14.0, but that job is still in the queue on Draco. In the meantime I am doing a test run on Orion with Trinity v2.14.0 and will run the others on Orion if this one completes successfully. The script is located at /projects/tomato_genome/fnb/dataprocessing/brandon_work/nag/temp_trinity/trinity-nag.slurm. Patch notes for Trinity updates: https://github.com/trinityrnaseq/trinityrnaseq/releases |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
| Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
| Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
| Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
| Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
| Resolution | | Done [ 10000 ] |
| Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |
The Python script that does this is located on the cluster at:
/projects/tomato_genome/fnb/dataprocessing/trinity/identifyBlatMatches.py
The script first reads the Trinity contigs file to collect the identifier of every sequence.
It then reads the BLAT results and parses each hit into an object.
Finally, it walks through these hits to count how many BLAT hits each contig gets.
How many contigs lack an association with a Soly ID?
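The script itself is not reproduced in this ticket, so the following is only a sketch of the parsing steps described above, not its actual code. It assumes the Trinity contigs were the BLAT query (PSL column 10, qName) and the Soly sequences the target (column 14, tName); the names `PslHit`, `read_fasta_ids`, `parse_psl`, and `hits_per_contig` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class PslHit(NamedTuple):
    """The PSL columns used here (1-based column numbers in comments)."""
    matches: int     # column 1: matching bases
    mismatches: int  # column 2: mismatching bases
    q_name: str      # column 10: query name (assumed to be the Trinity contig)
    t_name: str      # column 14: target name (assumed to be the Soly ID)


def read_fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def parse_psl(lines: Iterable[str]) -> List[PslHit]:
    """Parse PSL data rows, skipping the psLayout header if present."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 tab-separated columns and start with an integer.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        hits.append(PslHit(int(fields[0]), int(fields[1]), fields[9], fields[13]))
    return hits


def hits_per_contig(contig_ids: Iterable[str], hits: Iterable[PslHit]) -> Dict[str, int]:
    """Count hits per contig; contigs without any hit are reported as 0."""
    counts = dict.fromkeys(contig_ids, 0)
    for hit in hits:
        counts[hit.q_name] = counts.get(hit.q_name, 0) + 1
    return counts
```

Both readers accept any iterable of lines, so an open file handle can be passed directly. Seeding the counts with every contig ID is what makes contigs without a hit visible as zeros rather than silently missing.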
At a quick glance, many Heinz contigs have no hits: only 54,109 distinct names appear in column 10 (qName, the query name) of the PSL file.
/projects/tomato_genome/fnb/dataprocessing/trinity/hei/blat$ awk '{ print $10 }' blat-heinz.psl | sort | uniq | wc -l
54109
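For reference, a Python equivalent of the awk pipeline above, which counts distinct values in column 10 of the PSL file. The function name is hypothetical. One caveat that follows from the pipeline itself: if the PSL file was written with its default header (that is, without BLAT's `-noHead` option), the header lines also contribute a few values, such as an empty string, to the count.

```python
from typing import Iterable


def count_distinct_column10(lines: Iterable[str]) -> int:
    """Mirror `awk '{ print $10 }' FILE | sort | uniq | wc -l`."""
    values = set()
    for line in lines:
        fields = line.split()  # awk's default splitting on runs of whitespace
        # awk prints an empty line when a record has fewer than 10 fields,
        # and uniq counts that empty line as one more distinct value.
        values.add(fields[9] if len(fields) >= 10 else "")
    return len(values)
```

Assuming the contigs are the query, subtracting this count from the number of contig IDs in the Trinity FASTA gives the Step 2 figure (contigs with no hit).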
Some contigs also have multiple BLAT partners: the PSL file contains 112,437 lines, roughly twice the 54,109 distinct names counted above.
rreid2@str-i2:/projects/tomato_genome/fnb/dataprocessing/trinity/hei/blat$ wc -l blat-heinz.psl
112437 blat-heinz.psl
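To see which contigs account for the extra hits, and to approximate the "best hit reciprocal" idea from Step 4, something like the following could be used. Two assumptions, both hypothetical: each hit is scored as matches minus mismatches (a simplification, not BLAT's own score), and because only one BLAT direction is available, "reciprocal" is approximated as a mutual best pair within this single PSL file, meaning the contig's top-scoring Soly ID also has that contig as its top-scoring partner. A true reciprocal best hit would need a second BLAT run with query and target swapped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (contig_id, soly_id, score) triples; score = matches - mismatches is an assumption.
Hit = Tuple[str, str, int]


def partners_per_contig(hits: Iterable[Hit]) -> Dict[str, Set[str]]:
    """Distinct Soly IDs hit by each contig (several rows may share a pair)."""
    partners = defaultdict(set)
    for contig, soly, _score in hits:
        partners[contig].add(soly)
    return dict(partners)


def mutual_best_pairs(hits: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs where each side is the other's top-scoring partner.

    Ties keep the first hit seen, because only a strictly higher score replaces it.
    """
    best_for_contig: Dict[str, Tuple[int, str]] = {}
    best_for_soly: Dict[str, Tuple[int, str]] = {}
    for contig, soly, score in hits:
        if contig not in best_for_contig or score > best_for_contig[contig][0]:
            best_for_contig[contig] = (score, soly)
        if soly not in best_for_soly or score > best_for_soly[soly][0]:
            best_for_soly[soly] = (score, contig)
    return {
        (contig, soly)
        for contig, (_score, soly) in best_for_contig.items()
        if best_for_soly[soly][1] == contig
    }
```

With the illustrative hits `(c1, A, 280)`, `(c2, A, 150)`, `(c2, B, 200)`, `(c3, B, 90)`, only `c2` has more than one partner, and the mutual best pairs are `(c1, A)` and `(c2, B)`; `c3` is dropped because `B` prefers `c2`.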