Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Sprint: Fall 3 2022 Sep 26, Fall 4 2022 Oct 10, Fall 5 2022 Oct 24
Description
Annotate the SL5 genes with RNA processing-related functions.
Create a table that reports the functions for each of the "SL" identifiers (transcript or gene name is fine).
Also, there can be a many-to-many relationship! For example, one gene or transcript (column 1) could have multiple functional assignments (column 2).
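One way to hold a many-to-many mapping is a long-format table with one row per (SL identifier, function) pair, so a gene with three functions simply gets three rows. The sketch below assumes such a two-column, tab-separated file (the layout and the function name `collapse_functions` are assumptions, not a fixed format for this task) and shows how a one-row-per-identifier view could be derived from it when needed.

```shell
#!/usr/bin/env bash
# Sketch only: TABLE is an assumed two-column TSV, one row per
# (SL identifier, function) pair. Duplicate pairs are ignored.
# Prints one row per identifier, in first-seen order, with its
# distinct functions joined by ";".
collapse_functions() {
  awk -F'\t' '
    !(($1, $2) in seen) {
      seen[$1, $2] = 1
      if ($1 in fns) {
        fns[$1] = fns[$1] ";" $2        # another function for a known ID
      } else {
        fns[$1] = $2                    # first function for this ID
        order[++n] = $1                 # remember first-seen order
      }
    }
    END { for (i = 1; i <= n; i++) print order[i] "\t" fns[order[i]] }
  ' "$1"
}
```

Keeping the long format as the primary table means no semicolon-separated lists have to be split again later; the collapsed view is only for reading.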
Predicted:
- SR protein annotations (most likely to be alternatively spliced in response to a treatment)
- RNA-binding proteins
- Protein components of the spliceosome
- snRNP RNAs (if possible)
References:
- Implementing a Rational and Consistent Nomenclature for Serine/Arginine-Rich Protein Splicing Factors (SR Proteins) in Plants: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965536/
- Gene Ontology (GO) classifications could be used; these would probably need to be transferred from previous annotation releases.
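If GO terms are carried over from an earlier annotation release, the transfer is itself a many-to-many join: several old gene IDs may map to one SL5 ID and vice versa. A minimal sketch, assuming two hypothetical tab-separated files: an ID mapping (old ID, SL5 ID) and the old release's GO annotations (old ID, GO term). How the ID mapping itself is produced is outside this sketch, and `transfer_go` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only. transfer_go MAPPING OLD_GO
#   MAPPING: assumed TSV of (old_id, sl5_id) pairs
#   OLD_GO:  assumed TSV of (old_id, GO term) pairs
# Prints deduplicated (sl5_id, GO term) pairs.
transfer_go() {
  awk -F'\t' '
    FILENAME == ARGV[1] {
      # First file (OLD_GO): collect every GO term per old ID.
      if ($1 in go) go[$1] = go[$1] SUBSEP $2
      else go[$1] = $2
      next
    }
    ($1 in go) {
      # Second file (MAPPING): copy each old ID term onto its SL5 ID.
      n = split(go[$1], terms, SUBSEP)
      for (i = 1; i <= n; i++) {
        key = $2 "\t" terms[i]
        if (!(key in out)) { out[key] = 1; print key }
      }
    }
  ' "$2" "$1"
}
```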
Issue Links
- blocks: IGBF-3230 Visualize alternative splicing events (Closed)
SR Proteins
From Figure 1 of: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965536/
Figure 1 shows the protein and gene symbol names of the Arabidopsis SR family.
To get a list of these genes, the figure image was loaded into Google Images and converted to text (OCR).
The text was then copied into a Google Sheet and edited.
SR splicing factors Sheet:
https://docs.google.com/spreadsheets/d/1ScfcUkmf74G6eOoGGDV-3ZpQ0iqbOiVqF78SyEiPSoM/edit?usp=sharing
*A few entries were misread during the OCR step (Google Lens / Google Images) and were corrected manually.
Now we have a list of arabid gene IDs (and rice IDs too)!
Finding the relevant GO terms
To find the relevant GO terms, each of the 17 arabid IDs was entered into TAIR's online GO term tool (via the PANTHER database).
GO TOOL:
https://www.arabidopsis.org/tools/go_term_enrichment.jsp
Each ID returned a suitable GO term. The GO term and the link to the PANTHER result were recorded in the SR splicing factors Google Sheet
(https://docs.google.com/spreadsheets/d/1ScfcUkmf74G6eOoGGDV-3ZpQ0iqbOiVqF78SyEiPSoM/edit?usp=sharing).
Tomato genes that match
For 8 of the 17 arabid SR genes, a matching Solanum lycopersicum (Solyc) ID has already been assigned, though from a genome version other than SL5. See column SolyID.
These are included in the spreadsheet; the other 9 do not have a match.
Finding matches via Reciprocal Best Hit Blasts
Goal: see what high-confidence matches we can get by BLASTing the 17 arabid genes against the tomato SL5 protein sequences.
Pull the arabid sequences from TAIR and save them as FASTA on the cluster.
Blast all soly protein sequences against these 17 arabid sequences (tblastn).
Blast the 17 arabid sequences against the soly proteins (blastx).
Identify the best matches (reciprocal best hits).
Identify 1:1, 1:many and many:1 relationships.
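The reciprocal best hit (RBH) step can be sketched with awk over BLAST tabular output (`-outfmt 6`, where column 1 is the query, column 2 the subject and column 12 the bitscore). A pair counts as an RBH when each sequence is the other's top-scoring hit. Because RBH pairs are 1:1 by construction, the 1:many and many:1 classification needs a broader pair list, for example all hits passing an e-value cutoff; `classify_pairs` labels whatever pair list it is given. The function names and the use of bitscore for ranking are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: assumes BLAST tabular output (-outfmt 6) with the default
# 12 columns; column 12 is the bitscore.

# best_hits FILE: print "query<TAB>best subject" per query, choosing the row
# with the highest bitscore (ties keep the first row seen).
best_hits() {
  awk -F'\t' '
    !($1 in best) || $12 + 0 > best[$1] {
      best[$1] = $12 + 0
      hit[$1] = $2
    }
    END { for (q in hit) print q "\t" hit[q] }
  ' "$1" | LC_ALL=C sort
}

# reciprocal_best_hits A_VS_B B_VS_A: print "a<TAB>b" for each pair where b is
# the best hit of a in A_VS_B and a is the best hit of b in B_VS_A.
reciprocal_best_hits() {
  awk -F'\t' '
    FILENAME == ARGV[1] { back[$1] = $2; next }   # best hits of B against A
    ($2 in back) && back[$2] == $1 { print $1 "\t" $2 }
  ' <(best_hits "$2") <(best_hits "$1")
}

# classify_pairs PAIRS: the first two columns of PAIRS are (a, b). Each
# distinct pair is labeled 1:1, 1:many (a has several partners, each of which
# pairs only with a), many:1 (the reverse) or many:many.
classify_pairs() {
  cut -f1,2 "$1" | LC_ALL=C sort -u | awk -F'\t' '
    { a[NR] = $1; b[NR] = $2; na[$1]++; nb[$2]++ }
    END {
      for (i = 1; i <= NR; i++) {
        if (na[a[i]] == 1 && nb[b[i]] == 1) label = "1:1"
        else if (nb[b[i]] == 1) label = "1:many"
        else if (na[a[i]] == 1) label = "many:1"
        else label = "many:many"
        print a[i] "\t" b[i] "\t" label
      }
    }'
}
```

For example, `reciprocal_best_hits arabid_vs_tomato.tsv tomato_vs_arabid.tsv > rbh.tsv` (hypothetical file names), where the first file holds arabid queries against tomato and the second the reverse search, would list arabid/tomato RBH pairs.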
Step 1:
The arabid SR sequence IDs:
At1g09140
At1g02840
At3g49430
At4g02430
At1g23860
At4g31580
At2g24590
At5g64200
At5g18810
At3g55460
At3g13570
At1g55310
At3g53500
At2g37340
At2g46610
At3g61860
At4g25500
At5g52040
These IDs were submitted to TAIR's bulk sequence download tool:
https://www.arabidopsis.org/tools/bulk/sequences/
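Because a few IDs were garbled by OCR, it may be worth checking the ID list and the downloaded FASTA before building databases. A minimal sketch, assuming the IDs are saved one per line in a text file and the TAIR download is a FASTA file (file names are up to the user; the helper names are hypothetical): `invalid_agi_ids` flags lines that do not match the AGI locus pattern (AT, chromosome 1 to 5, C or M, G, five digits), and `missing_ids` reports IDs that appear in no FASTA header. Header matching is a case-insensitive substring search, an assumption about how the records are labeled.

```shell
#!/usr/bin/env bash
# Sketch only.

# invalid_agi_ids IDS_FILE: print lines that are not AGI locus codes
# (case-insensitive, e.g. At1g09140 or AT1G09140).
invalid_agi_ids() {
  grep -viE '^AT[1-5CM]G[0-9]{5}$' "$1" || true   # grep exits 1 if all valid
}

# missing_ids IDS_FILE FASTA: print IDs that occur in no FASTA header line.
missing_ids() {
  awk '
    FILENAME == ARGV[1] {
      # First file (FASTA): keep all header lines, upper-cased.
      if (/^>/) headers = headers "\n" toupper($0)
      next
    }
    NF && index(headers, toupper($1)) == 0 { print $1 }
  ' "$2" "$1"
}
```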
Cluster Location: /nobackup/tomato_genome/alt_splicing/SR-proteins
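Make BLAST databases on the cluster:
module load blast
# nucleotide database from the arabid SR gene sequences
makeblastdb -in SRgenes-arabid.fna -input_type fasta -dbtype nucl
# protein database from the SR protein sequences
makeblastdb -in SRproteins.faa -input_type fasta -dbtype prot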
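The two searches in the plan could then be run with tabular output so the hits can be parsed downstream. A minimal sketch, assuming a tomato SL5 protein FASTA named `SL5_proteins.faa` (hypothetical; substitute the real file), an e-value cutoff of 1e-5 (an assumed, adjustable threshold) and hypothetical output file names. Program choice follows the BLAST+ conventions: tblastn takes protein queries against a translated nucleotide database, and blastx takes translated nucleotide queries against a protein database.

```shell
#!/usr/bin/env bash
# Sketch only. SL5_proteins.faa and the output names are hypothetical;
# the 1e-5 e-value cutoff is an assumption.
TOMATO_PROTEINS="SL5_proteins.faa"
EVALUE="1e-5"

run_reciprocal_blasts() {
  # Protein database for the tomato proteins, the target of the blastx search.
  makeblastdb -in "$TOMATO_PROTEINS" -input_type fasta -dbtype prot

  # tblastn: tomato protein queries vs. the nucleotide database
  # built from SRgenes-arabid.fna.
  tblastn -query "$TOMATO_PROTEINS" -db SRgenes-arabid.fna \
    -evalue "$EVALUE" -outfmt 6 -out tomato_vs_arabid.tblastn.tsv

  # blastx: arabid SR gene nucleotide queries vs. the tomato protein database.
  blastx -query SRgenes-arabid.fna -db "$TOMATO_PROTEINS" \
    -evalue "$EVALUE" -outfmt 6 -out arabid_vs_tomato.blastx.tsv
}
```

After `module load blast`, calling `run_reciprocal_blasts` from /nobackup/tomato_genome/alt_splicing/SR-proteins would write one tabular result file per direction.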