IGB / IGBF-1401

Add Chlorocebus sabaeus ("green monkey") to IGB quickload

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Story Points: 1
    • Sprint: Winter 2018 Sprint 3, Spring 2019 Sprint 1, Spring 2019 Sprint 2, Spring 2019 Sprint 3

      Description

      A user would like us to add green monkey (Chlorocebus sabaeus) to IGB Quickload.

      We should get the sequence data from UCSC, as usual, but take the reference gene models from the GFF available from NCBI, as the user suggested. It would be more convenient to get the gene models from UCSC as well, but I don't see them listed in the Table Browser. I think NCBI has annotated the genome but UCSC has not yet imported the annotations into its database.

            Activity

            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            genome sequence:

            • url: http://hgdownload.cse.ucsc.edu/goldenPath/chlSab2/bigZips/
            • file: chlSab2.2bit

            Genome version name:

            • C_sabeus_Mar_2014

            synonyms:

            • Chlorocebus_sabeus 1.1
            • chlSab2
            • Vervet Genomics Consortium GCA_000409795.2

            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            The sequence file, and genome.txt, are set up and viewable here:
            http://18.222.191.240/Quickload_IGBF-1401_C.sabaeus/

            In the species list, this genome appears as C_sabeus rather than Chlorocebus sabaeus.
            I'm not sure why; tracking that.
            --> Resolved by adding a species.txt file.

            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            I'm having a hard time finding gene descriptions.

            What I currently have set up on the EC2 instance is just to test the sequence and quickload files. The annotations file is the all_mRNAs bed file from UCSC.

            ann.loraine Ann Loraine added a comment - - edited

            Previously, we did not have good documentation describing how to deploy a UCSC-managed genome assembly to Quickload. Now we have better documentation. Let's try this again – using the new documentation!

            For this, we need to:

            • Deploy the latest green monkey assembly
            • Notify user (see related HELP issue for contact info) when it is deployed and ready.

            Note: We may also need to add this genome assembly and species to IGB's internal species.txt and synonyms.txt files to ensure the species' Latin and common names are displayed correctly.

            We also need to check that the documentation covers this aspect.

            ann.loraine Ann Loraine added a comment -

            Please see related issue HELP-306 for some history related to this issue.

            Jill Jill Jenkins (Inactive) added a comment -

            Created a directory in the shared Dropbox
            Added the .2bit file
            Created genome.txt
            Downloaded annotations from UCSC (track: Ensembl Genes)
            Will research options for BED conversion
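
One option for the BED conversion can be sketched in Python. This is only a sketch under stated assumptions: it assumes the Ensembl Genes track was exported from the UCSC Table Browser in genePred layout (a leading bin column, then name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The function name `genepred_to_bed12` and the example transcript ID are made up for illustration. UCSC also distributes a `genePredToBed` command-line utility, which may be the simpler route.

```python
def genepred_to_bed12(line, has_bin=True):
    """Convert one tab-delimited genePred row into a BED12 row."""
    fields = line.rstrip("\n").split("\t")
    if has_bin:
        fields = fields[1:]  # drop the leading bin column
    name, chrom, strand = fields[0], fields[1], fields[2]
    tx_start, tx_end = int(fields[3]), int(fields[4])
    cds_start, cds_end = fields[5], fields[6]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma
    exon_starts = [int(x) for x in fields[8].split(",") if x]
    exon_ends = [int(x) for x in fields[9].split(",") if x]
    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    # BED block starts are relative to the transcript start
    block_starts = [start - tx_start for start in exon_starts]
    bed = [chrom, str(tx_start), str(tx_end), name, "0", strand,
           cds_start, cds_end, "0", str(len(exon_starts)),
           ",".join(map(str, block_sizes)) + ",",
           ",".join(map(str, block_starts)) + ","]
    return "\t".join(bed)


# Made-up example row (not real annotation data)
example = "\t".join(["585", "ENSCSAT00000000001", "chr1", "+", "100", "500",
                     "150", "450", "2", "100,300,", "200,500,"])
print(genepred_to_bed12(example))
```

Any columns after exonEnds (score, gene ID, and so on) are ignored here; they could be carried along if the gene ID is needed for the later merge.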

            Jill Jill Jenkins (Inactive) added a comment - - edited

            Done:
            BED format file was added.

            To do:
            Write a script to merge the BED file and the Ensembl table
            Choose a name to fill field 13. Options: (1) gene symbol, (2) Ensembl ID
            Attributes: Gene Stable ID, Gene Description, Gene Name
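
The choice for field 13 could be handled with a simple fallback: use the gene symbol when one exists, otherwise the Ensembl gene ID. The sketch below assumes field 14 holds the gene description. The function name `bed_detail_fields`, the `missing` placeholder value, and the IDs in the example are all hypothetical.

```python
def bed_detail_fields(gene_id, gene_name, description, missing="NA"):
    """Return (field 13, field 14) for one transcript.

    Field 13 prefers the gene symbol and falls back to the Ensembl gene ID.
    Field 14 is the description, or a placeholder when none is available.
    """
    symbol = gene_name.strip()
    field13 = symbol if symbol else gene_id.strip()
    field14 = description.strip() or missing
    return field13, field14


# Made-up values for illustration only
print(bed_detail_fields("ENSCSAG00000000001", "GENE1", "example description"))
print(bed_detail_fields("ENSCSAG00000000002", "", ""))
```

Filling both fields for every row, even with a placeholder, keeps the column count constant across the file.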

            Jill Jill Jenkins (Inactive) added a comment -

            I have been stuck on trying to merge these files for days. Can you please review and see what I am doing wrong? I do not want to write a function, just a quick-and-dirty script to merge. I will paste what I have been playing around with. I am not getting any output.

            with open('C_sabeus_mart.txt') as martfile, open('AGM_subset.ensGene.bed') as bedfile:
                for line in martfile:
                    toks = line.split('\t')
                    id1 = toks[0]
                    description = toks[1]
                    for line in bedfile:
                        toks1 = line.split('\t')
                        id2 = toks1[3]
                        if id1 == id2:
                            d[toks1] = [id1, description]

            ann.loraine Ann Loraine added a comment -

            Suggestion:

            • Open and read mart file; put data into memory (dictionary where keys are transcript id that matches what's in the bed file)
            • Open bed file, read line by line
            • For each line in the bed file, use field 4 (transcript id) to look up same in mart dictionary
            • Output original line plus two extra fields obtained from the mart file
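
The steps above can be sketched in Python as follows. This is a minimal sketch, not the script that was eventually used. It assumes the mart export is tab-delimited with three columns (transcript ID, gene name, gene description), which may not match the real file. The function names and sample rows are hypothetical. Loading the mart file into a dictionary first also avoids looping over the BED file handle inside another loop, which only works on the first pass because a file handle is exhausted after one read.

```python
def load_mart(mart_lines):
    """Map transcript ID -> (gene name, description)."""
    lookup = {}
    for line in mart_lines:
        toks = line.rstrip("\n").split("\t")
        if len(toks) < 3:
            continue  # skip blank or malformed rows
        lookup[toks[0]] = (toks[1], toks[2])
    return lookup


def merge_bed(bed_lines, lookup, missing="NA"):
    """Yield each BED line with two extra fields taken from the mart lookup."""
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        transcript_id = line.split("\t")[3]  # BED field 4
        # Arbitrary fallback: reuse the transcript ID when there is no match
        name, description = lookup.get(transcript_id, (transcript_id, missing))
        yield "\t".join([line, name, description])


# Made-up rows standing in for the mart and BED files
mart_rows = ["ENSCSAT00000000001\tGENE1\texample description\n"]
bed_rows = [
    "chr1\t100\t500\tENSCSAT00000000001\t0\t+\t150\t450\t0\t2\t100,200,\t0,200,\n",
    "chr1\t900\t950\tENSCSAT00000000002\t0\t-\t900\t950\t0\t1\t50,\t0,\n",
]
merged = list(merge_bed(bed_rows, load_mart(mart_rows)))
for row in merged:
    print(row)
```

With real data, the two lists would be replaced by open file handles (for example `open('C_sabeus_mart.txt')` and `open('AGM_subset.ensGene.bed')`), and the merged rows written to an output file instead of printed.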
            ann.loraine Ann Loraine added a comment -

            Added tips on how to write the code - hopefully it helps!

            Jill Jill Jenkins (Inactive) added a comment -

            Every Ensembl transcript stable ID has a corresponding Ensembl gene ID; however, not all of the gene IDs show up in the tooltip. When I check them against the ensGene.bed14 file, they are present. I have re-run the script and the outcome is the same.

            Jill Jill Jenkins (Inactive) added a comment - - edited

            The inconsistent behavior is due to field 14 of the BED detail file being incomplete, caused by a mapping issue in the mart + Ensembl merge. Working on a resolution now.
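
A quick check for this kind of problem is to scan the merged file for rows where field 14 is missing or empty. The sketch below is only one way to do that. The function name `find_incomplete_rows` and the sample rows are hypothetical, and it assumes a tab-delimited file with 14 fields per row.

```python
def find_incomplete_rows(bed_lines, expected_fields=14):
    """Return (line number, transcript ID) for rows lacking a final field."""
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        toks = line.rstrip("\n").split("\t")
        too_short = len(toks) < expected_fields
        if too_short or not toks[expected_fields - 1].strip():
            transcript_id = toks[3] if len(toks) > 3 else ""
            problems.append((number, transcript_id))
    return problems


# Made-up rows: complete, missing field 14, and empty field 14
base = ["chr1", "100", "500", "ENSCSAT00000000001", "0", "+",
        "150", "450", "0", "2", "100,200,", "0,200,"]
rows = [
    "\t".join(base + ["GENE1", "example description"]) + "\n",
    "\t".join(base + ["GENE1"]) + "\n",
    "\t".join(base + ["GENE1", ""]) + "\n",
]
for number, transcript_id in find_incomplete_rows(rows):
    print(number, transcript_id)
```

Passing an open file handle instead of the list would run the same check against the real merged file.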

            Jill Jill Jenkins (Inactive) added a comment -

            Resolved the inconsistent behavior. Tested in IGB; functions pass.


              People

              • Assignee:
                Jill Jill Jenkins (Inactive)
                Reporter:
                ieclabau Ivory Blakley (Inactive)
              • Votes:
                0
                Watchers:
                3
