Details
Type: Task
Status: To-Do
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Description
Follow these directions from Christian Grove (cgrove@caltech.edu, help@wormbase.org) to obtain the best annotations for C. elegans:
snip
Hi Ann,
You can find the latest genome assemblies for C. elegans on our C. elegans page (discoverable under the WormBase "Directory" menu):
https://wormbase.org/species/c_elegans#32--10
As can be seen on the UCSC genome browser site:
https://genome.ucsc.edu/cgi-bin/hgGateway
and in our recent user guide book chapter:
https://www.ncbi.nlm.nih.gov/pubmed/29761466
The ce11 genome corresponds to the "WBcel235" genome assembly, which is our current assembly as indicated on our WB C. elegans page. You should then be able to download the GFF3 for that assembly from that same page in the "Downloads" widget. I would then recommend processing the GFF file to just pull out features coming from WormBase (column 2 = "WormBase"). Once you unzip the GFF3 file, you can run the following on the command line:
awk 'BEGIN {FS="\t"; OFS="\t"} {if ($2 == "WormBase") print}' c_elegans.PRJNA13758.WS274.annotations.gff3 > WormBase_elegans_GFF3_lines.txt
The remaining feature types (column 3) are then:
CDS
antisense_RNA
exon
five_prime_UTR
gene
intron
lincRNA
mRNA
miRNA
miRNA_primary_transcript
ncRNA
nc_primary_transcript
piRNA
pre_miRNA
pseudogenic_rRNA
pseudogenic_tRNA
pseudogenic_transcript
rRNA
scRNA
snRNA
snoRNA
tRNA
three_prime_UTR
You can filter the lines based on what you're looking for, specifically.
I hope that helps. Let us know if you have any questions.
Best,
Chris Grove
WormBase
end snip
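The filtering steps in the email can be wrapped in a few small shell functions. This is only a sketch: the function names (wormbase_only, feature_types, by_type) are hypothetical and not from WormBase, and it assumes a standard tab-delimited GFF3 file with the source in column 2 and the feature type in column 3, as the email describes.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any WormBase tooling.

# Keep only lines whose source (column 2) is "WormBase".
wormbase_only() {
    awk 'BEGIN {FS="\t"; OFS="\t"} $2 == "WormBase"' "$1"
}

# List the distinct feature types (column 3) in an already filtered file.
feature_types() {
    cut -f3 "$1" | sort -u
}

# Keep only lines of a single feature type (column 3), e.g. "gene".
by_type() {
    awk -v ftype="$2" 'BEGIN {FS="\t"; OFS="\t"} $3 == ftype' "$1"
}
```

For example, running wormbase_only on c_elegans.PRJNA13758.WS274.annotations.gff3 and redirecting to WormBase_elegans_GFF3_lines.txt reproduces the command from the email. Running feature_types on that output should then list the column 3 values quoted above, and by_type WormBase_elegans_GFF3_lines.txt gene would keep just the gene lines.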