[IGBF-1401] Add Chlorocebus sabaeus ("green monkey") to IGB quickload - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
- Intermediate

Story Points:
1
Sprint:
Winter 2018 Sprint 3, Spring 2019 Sprint 1, Spring 2019 Sprint 2, Spring 2019 Sprint 3

Description

A user would like us to add green monkey (Chlorocebus sabaeus) to IGB Quickload.

We should get the sequence data (as usual) from UCSC but get the reference gene models from the GFF available from NCBI as he mentioned below. It would be more convenient to get the gene models from UCSC as usual, but I don't see them listed in the table browser. I think what may have happened is that NCBI has annotated the genome but UCSC has not yet imported the annotations into their database.

Attachments

Issue Links

relates to

HELP-306 New reference genome (Chlorocebus sabaeus)

Closed

IGBF-1522 Write program to convert EnsGene table bed12 to bed14

Closed

IGBF-1523 Add vervet (green monkey) to IGB species.txt and synonyms.txt files

Closed

Activity

Ivory Blakley (Inactive) added a comment - 10/Sep/18 3:47 PM - edited

genome sequence:

url: http://hgdownload.cse.ucsc.edu/goldenPath/chlSab2/bigZips/
file: chlSab2.2bit

Genome version name:

C_sabeus_Mar_2014

synonyms:

Chlorocebus_sabeus 1.1
chlSab2
Vervet Genomics Consortium GCA_000409795.2

Ivory Blakley (Inactive) added a comment - 11/Sep/18 1:30 PM - edited

The sequence file and genome.txt are set up and viewable here:
http://18.222.191.240/Quickload_IGBF-1401_C.sabaeus/

In the species list, this genome is appearing as C_sabeus rather than Chlorocebus sabaeus.
I'm not sure why; tracking that.
Update: resolved by adding a species.txt file.
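
For reference, a Quickload genome.txt lists each sequence name and its length, separated by a tab. The following is a minimal sketch of producing that content, assuming a UCSC-style chrom.sizes listing (name, then size) is available as text. The helper name `chrom_sizes_to_genome_txt` is hypothetical, and the sample names and sizes are made up for illustration, not taken from chlSab2.

```python
def chrom_sizes_to_genome_txt(chrom_sizes_text):
    """Return genome.txt content: one 'name<TAB>length' line per sequence."""
    rows = []
    for line in chrom_sizes_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, size = line.split()[:2]
        rows.append((name, int(size)))
    # Largest sequences first, so main chromosomes precede small scaffolds.
    rows.sort(key=lambda row: row[1], reverse=True)
    return "".join(f"{name}\t{size}\n" for name, size in rows)


# Made-up sample input; real sizes would come from the assembly itself.
sample = "chrA\t1000\nscaffold_1\t50\nchrB\t2000\n"
genome_txt = chrom_sizes_to_genome_txt(sample)
print(genome_txt, end="")
```

Sorting by size is a convenience chosen for this sketch, not a documented requirement.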

Ivory Blakley (Inactive) added a comment - 12/Sep/18 11:25 AM - edited

I'm having a hard time finding gene descriptions.

What I currently have set up on the EC2 instance is just to test the sequence and quickload files. The annotations file is the all_mRNAs bed file from UCSC.

Ann Loraine added a comment - 08/Jan/19 3:10 PM - edited

Previously, we did not have good documentation describing how to deploy a UCSC-managed genome assembly to Quickload. Now we have better documentation. Let's try this again – using the new documentation!

For this, we need to:

- Deploy the latest green monkey assembly
- Notify the user (see related HELP issue for contact info) when it is deployed and ready.

*Note*: We also may need to add this genome assembly and species to IGB's internal species.txt and synonyms.txt files to ensure the species Latin and common names are displayed correctly.

We also need to check that the documentation covers this aspect.

Ann Loraine added a comment - 08/Jan/19 3:11 PM

Please see related issue HELP-306 for some history related to this issue.

Jill Jenkins (Inactive) added a comment - 08/Jan/19 5:28 PM

Created directory in shared Dropbox
Added .2bit
Created genome.txt
Downloaded annotations from UCSC (track: Ensembl Genes)
Will research options for BED conversion

Jill Jenkins (Inactive) added a comment - 11/Jan/19 4:37 PM - edited

Done:
BED format file was added.

To do:
Write script to merge BED and Ensembl table
Find name to fill field 13; options: #1 gene symbol, #2 Ensembl ID
Attributes: Gene Stable ID, Gene Description, Gene Name
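
The choice for field 13 could be sketched as below: prefer the gene symbol and fall back to the Ensembl gene ID when the symbol is blank. This is only an illustration under assumptions. The helper names `pick_name` and `to_bed14` are hypothetical, and the layout (12 BED fields plus a name in field 13 and a description in field 14) is inferred from the to-do list above.

```python
def pick_name(gene_symbol, gene_id):
    """Prefer the gene symbol; fall back to the gene ID when it is blank."""
    symbol = gene_symbol.strip()
    return symbol if symbol else gene_id.strip()


def to_bed14(bed12_fields, gene_symbol, gene_id, description):
    """Append a name (field 13) and a description (field 14) to 12 BED fields."""
    if len(bed12_fields) != 12:
        raise ValueError(f"expected 12 BED fields, got {len(bed12_fields)}")
    return bed12_fields + [pick_name(gene_symbol, gene_id), description.strip()]


# Made-up record: IDs and coordinates are placeholders, not real annotations.
bed12 = ["chrA", "0", "100", "TX1", "0", "+", "0", "100", "0", "1", "100,", "0,"]
with_symbol = to_bed14(bed12, "SYM1", "GENE1", "example description")
without_symbol = to_bed14(bed12, "", "GENE1", "example description")
print(with_symbol[12], without_symbol[12])
```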

Jill Jenkins (Inactive) added a comment - 17/Jan/19 4:14 PM

I have been stuck on trying to merge these files for days. Can you please review and see what I am doing wrong? I do not want to write a function, just a quick-and-dirty script to merge. I will paste what I have been playing around with. I am not getting any output.

with open('C_sabeus_mart.txt') as martfile, open('AGM_subset.ensGene.bed') as bedfile:
    for line in martfile:
        toks = line.split('\t')
        id1 = toks[0]
        description = toks[1]
        for line in bedfile:
            toks1 = line.split('\t')
            id2 = toks1[3]
            if id1 == id2:
                d[toks1] = [id1, description]

Ann Loraine added a comment - 17/Jan/19 4:18 PM

Suggestion:

Open and read the mart file; put the data into memory (a dictionary whose keys are the transcript IDs that match what's in the bed file)
Open the bed file and read it line by line
For each line in the bed file, use field 4 (transcript ID) to look up the same ID in the mart dictionary
Output the original line plus two extra fields obtained from the mart file
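
The steps above can be sketched as follows. This is a minimal illustration, not the script that was eventually used. It assumes the mart table is tab-separated with the transcript ID in the first column followed by two annotation columns, and it writes "NA" for transcripts missing from the mart table. The function names and the in-memory sample records are hypothetical stand-ins for the real files.

```python
import io


def load_mart(mart_file):
    """Read the mart table into a dict keyed by transcript ID."""
    annotations = {}
    for line in mart_file:
        toks = line.rstrip("\n").split("\t")
        if len(toks) < 3:
            continue  # skip blank or short lines
        # Assumed column order: transcript ID, name, description.
        annotations[toks[0]] = (toks[1], toks[2])
    return annotations


def merge_bed_with_mart(bed_file, annotations, out_file):
    """Write each BED line plus two extra fields looked up by field 4."""
    for line in bed_file:
        toks = line.rstrip("\n").split("\t")
        if len(toks) < 4:
            continue  # not a usable BED record
        # Field 4 (index 3) holds the transcript ID; unmatched IDs get "NA".
        name, description = annotations.get(toks[3], ("NA", "NA"))
        out_file.write("\t".join(toks + [name, description]) + "\n")


# Made-up records standing in for the real mart and BED files.
mart = io.StringIO("TX1\tGENE1\tfirst gene\nTX2\tGENE2\tsecond gene\n")
bed = io.StringIO("chrA\t0\t100\tTX1\t0\t+\nchrA\t200\t300\tTX9\t0\t-\n")
out = io.StringIO()
merge_bed_with_mart(bed, load_mart(mart), out)
print(out.getvalue(), end="")
```

Loading the mart table once avoids re-reading the BED file inside a nested loop: a file handle is exhausted after its first full pass, so an inner loop over it yields nothing on later iterations. With real files, the `io.StringIO` objects would be replaced by handles from `open(...)`.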

Ann Loraine added a comment - 17/Jan/19 4:31 PM

Added tips on how to write the code - hopefully it helps!

Jill Jenkins (Inactive) added a comment - 14/Feb/19 1:55 PM

There are corresponding ens gene IDs for each ens transcript stable ID; however, not all ens gene IDs are showing in the tool tip. When I check them against the ensGene.bed14 file, they are present. I have re-executed the script and outcome is the same.

Jill Jenkins (Inactive) added a comment - 14/Feb/19 1:56 PM - edited

The inconsistent behavior is due to field 14 of the BED detail file being incomplete, caused by a mapping issue in the mart + Ensembl merge. Working on a resolution now.
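
A quick way to spot such records is to scan the merged file for lines where field 13 or field 14 is empty or a placeholder. The sketch below is an assumption-laden illustration: `find_incomplete` is a hypothetical helper, the placeholder values are guesses, and the sample records are made up.

```python
def find_incomplete(bed14_lines, placeholders=("", "NA")):
    """Return transcript IDs whose field 13 or 14 is missing or a placeholder."""
    incomplete = []
    for line in bed14_lines:
        toks = line.rstrip("\n").split("\t")
        if len(toks) < 4:
            continue  # not a BED record
        if len(toks) < 14 or toks[12] in placeholders or toks[13] in placeholders:
            incomplete.append(toks[3])
    return incomplete


# Made-up records: the second one has an empty field 14.
base = ["chrA", "0", "100", "TX", "0", "+", "0", "100", "0", "1", "100,", "0,"]
good = "\t".join(base[:3] + ["TX1"] + base[4:] + ["GENE1", "first gene"])
bad = "\t".join(base[:3] + ["TX2"] + base[4:] + ["GENE2", ""])
print(find_incomplete([good, bad]))
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines.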

Jill Jenkins (Inactive) added a comment - 15/Feb/19 4:40 PM

Resolved inconsistent behavior, tested in IGB, functions pass.

People

Assignee:

Jill Jenkins (Inactive)

Reporter:

Ivory Blakley (Inactive)

Votes:

0

Watchers:

3

Dates

Created:

10/Sep/18 11:52 AM

Updated:

21/Nov/19 2:36 PM

Resolved:

18/Mar/19 8:06 PM