IGBF-1401

Add Chlorocebus sabaeus ("green monkey") to IGB quickload

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Story Points: 1
    • Sprint: Winter 2018 Sprint 3, Spring 2019 Sprint 1, Spring 2019 Sprint 2, Spring 2019 Sprint 3

      Description

      A user would like us to add green monkey (Chlorocebus sabaeus) to IGB Quickload.

      We should get the sequence data (as usual) from UCSC but get the reference gene models from the GFF available from NCBI as he mentioned below. It would be more convenient to get the gene models from UCSC as usual, but I don't see them listed in the table browser. I think what may have happened is that NCBI has annotated the genome but UCSC has not yet imported the annotations into their database.

            Activity

            ieclabau Ivory Blakley (Inactive) created issue -
            ieclabau Ivory Blakley (Inactive) made changes -
            Field Original Value New Value
            Rank Ranked higher
            ieclabau Ivory Blakley (Inactive) made changes -
            Sprint Fall 2018 Sprint 2 [ 52 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Rank Ranked lower
            ieclabau Ivory Blakley (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            Genome sequence:

            • url: http://hgdownload.cse.ucsc.edu/goldenPath/chlSab2/bigZips/
            • file: chlSab2.2bit

            Genome version name:

            • C_sabeus_Mar_2014

            Synonyms:

            • Chlorocebus_sabeus 1.1
            • chlSab2
            • Vervet Genomics Consortium GCA_000409795.2
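For reference, the names listed above could be turned into one synonyms.txt entry along these lines. This is only a sketch: it assumes the file is tab-delimited with all synonyms for one genome version on a single line and the IGB-preferred name first, and the helper name `synonyms_line` is made up for illustration.

```python
def synonyms_line(preferred, synonyms):
    """Build one tab-delimited line: preferred name first, then the synonyms.

    Assumption: one genome version per line, names separated by tabs.
    Blank names and repeats are dropped.
    """
    names = []
    for name in [preferred, *synonyms]:
        name = name.strip()
        if name and name not in names:
            names.append(name)
    return "\t".join(names)


# Names copied from the comment above.
line = synonyms_line(
    "C_sabeus_Mar_2014",
    [
        "Chlorocebus_sabeus 1.1",
        "chlSab2",
        "Vervet Genomics Consortium GCA_000409795.2",
    ],
)
print(line)
```

The result would still need to be checked against an existing entry in the synonyms.txt file that IGB actually reads.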
            mason Mason Meyer (Inactive) made changes -
            Link This issue relates to HELP-306 [ HELP-306 ]
            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            The sequence file and genome.txt are set up and viewable here:
            http://18.222.191.240/Quickload_IGBF-1401_C.sabaeus/

            In the species list, this genome appears as C_sabeus rather than Chlorocebus sabaeus.
            I'm not sure why; tracking that.
            --> Resolved by adding a species.txt file.
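A rough illustration of why a species.txt entry changes the displayed name. This is a sketch of the lookup idea, not IGB's actual code: it assumes each species.txt line holds the Latin name, the common name, and one or more genome version prefixes separated by tabs, and that a version name with no matching prefix falls back to the prefix itself. All function names here are hypothetical.

```python
import re


def version_prefix(version_name):
    """Strip a trailing _Mon_YYYY date, e.g. C_sabeus_Mar_2014 -> C_sabeus."""
    return re.sub(r"_[A-Z][a-z]{2}_\d{4}$", "", version_name)


def parse_species(text):
    """Map version prefix -> (Latin name, common name).

    Assumed layout per line: Latin name, common name, prefix(es), tab-separated.
    """
    table = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        for prefix in fields[2:]:
            table[prefix] = (fields[0], fields[1])
    return table


def display_name(version_name, table):
    """Return the Latin name if the prefix is known, else the bare prefix."""
    prefix = version_prefix(version_name)
    latin, _common = table.get(prefix, (prefix, ""))
    return latin


species_txt = "Chlorocebus sabaeus\tgreen monkey\tC_sabeus\n"

without_entry = display_name("C_sabeus_Mar_2014", {})
with_entry = display_name("C_sabeus_Mar_2014", parse_species(species_txt))
print(without_entry)  # C_sabeus
print(with_entry)     # Chlorocebus sabaeus
```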

            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            I'm having a hard time finding gene descriptions.

            What I currently have set up on the EC2 instance is just to test the sequence and quickload files. The annotations file is the all_mRNAs bed file from UCSC.

            ieclabau Ivory Blakley (Inactive) made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            ann.loraine Ann Loraine made changes -
            Story Points 1
            Sprint Fall 2018 Sprint 2 [ 52 ]
            Labels Intermediate
            Assignee Ivory Blakley [ ieclabau ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Fall 2018 Sprint 3 [ 53 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Fall 2018 Sprint 3 [ 53 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked lower
            ieclabau Ivory Blakley (Inactive) made changes -
            Sprint Fall 2018 Sprint 3 [ 53 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Rank Ranked higher
            ieclabau Ivory Blakley (Inactive) made changes -
            Sprint Fall 2018 Sprint 3 [ 53 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment - - edited

            Previously, we did not have good documentation describing how to deploy a UCSC-managed genome assembly to Quickload. Now we have better documentation. Let's try this again – using the new documentation!

            For this, we need to:

            • Deploy the latest green monkey assembly
            • Notify user (see related HELP issue for contact info) when it is deployed and ready.

            *Note*: We also may need to add this genome assembly and species to IGB's internal species.txt and synonyms.txt files to ensure the species Latin and common names are displayed correctly.

            We also need to check that the documentation covers this aspect.

            ann.loraine Ann Loraine made changes -
            Status Open [ 1 ] Open [ 1 ]
            ann.loraine Ann Loraine added a comment -

            Please see related issue HELP-306 for some history related to this issue.

            ann.loraine Ann Loraine made changes -
            Link This issue relates to HELP-306 [ HELP-306 ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 2018 Sprint 3 [ 58 ]
            Assignee Jill Jenkins [ jill ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Jill Jill Jenkins (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            Jill Jill Jenkins (Inactive) added a comment -

            Created directory in shared Dropbox
            Added .2bit
            Created genome.txt
            Downloaded annotations from UCSC (track: Ensembl Genes)
            Will research options for BED conversion
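One possible route for the BED conversion, sketched in Python. It assumes the Ensembl Genes download is in UCSC's genePred table layout (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...), which may not match the file actually downloaded. The function name and the sample row are made up for illustration.

```python
def genepred_to_bed12(row, has_bin=True):
    """Convert one tab-delimited genePred row to a BED12 line.

    Assumption: UCSC table layout, with a leading 'bin' column when has_bin is True.
    """
    fields = row.rstrip("\r\n").split("\t")
    if has_bin:
        fields = fields[1:]  # drop the bin column
    name, chrom, strand = fields[0], fields[1], fields[2]
    tx_start, tx_end = int(fields[3]), int(fields[4])
    cds_start, cds_end = fields[5], fields[6]
    # exonStarts and exonEnds are comma-separated with a trailing comma
    exon_starts = [int(x) for x in fields[8].split(",") if x]
    exon_ends = [int(x) for x in fields[9].split(",") if x]
    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    block_starts = [start - tx_start for start in exon_starts]  # relative to txStart
    bed = [
        chrom, str(tx_start), str(tx_end), name,
        "0",            # score
        strand,
        cds_start, cds_end,
        "0",            # itemRgb
        str(len(block_sizes)),
        ",".join(map(str, block_sizes)) + ",",
        ",".join(map(str, block_starts)) + ",",
    ]
    return "\t".join(bed)


# Made-up row for illustration only, not real green monkey data.
example_row = (
    "585\tTX_EXAMPLE_1\tchr1\t+\t1000\t2000\t1100\t1900\t2\t"
    "1000,1500,\t1300,2000,\t0\tGENE_EXAMPLE_1\tcmpl\tcmpl\t0,1,"
)
bed_line = genepred_to_bed12(example_row)
print(bed_line)
```

The transcript ID lands in field 4 of the output, which is the field later used to look up gene information.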

            Jill Jill Jenkins (Inactive) added a comment - - edited

            Done:
            BED format file was added.

            To do:
            Write script to merge BED and Ensembl table
            Find name to fill field 13; options: (1) gene symbol, (2) Ensembl (ENS) ID
            Attributes: Gene Stable ID, Gene Description, Gene Name

            Jill Jill Jenkins (Inactive) added a comment -

            I have been stuck trying to merge these files for days. Can you please review and see what I am doing wrong? I do not want to write a function, just a quick-and-dirty script to merge. I will paste what I have been playing around with. I am not getting any output.

            with open('C_sabeus_mart.txt') as martfile, open('AGM_subset.ensGene.bed') as bedfile:
                for line in martfile:
                    toks = line.split('\t')
                    id1 = toks[0]
                    description = toks[1]
                    for line in bedfile:
                        toks1 = line.split('\t')
                        id2 = toks1[3]
                        if id1 == id2:
                            d[toks1] = [id1, description]

            Jill Jill Jenkins (Inactive) made changes -
            Assignee Jill Jenkins [ jill ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Suggestion:

            • Open and read the mart file; put the data into memory (a dictionary whose keys are the transcript IDs that match what's in the bed file)
            • Open the bed file and read it line by line
            • For each line in the bed file, use field 4 (transcript ID) to look up the same ID in the mart dictionary
            • Output the original line plus two extra fields obtained from the mart file
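The steps above could be sketched roughly as follows. Two things likely kept the pasted snippet from producing output: the inner loop uses up the bed file on the first pass, so later mart lines have nothing left to compare against, and nothing is ever written or printed. The sketch assumes the mart export has four tab-separated columns (transcript stable ID, gene stable ID, gene name, gene description); that column order is a guess and should be adjusted to the real file. The function names and sample rows are made up.

```python
def load_mart(lines):
    """Index mart rows by transcript ID -> (field 13 name, field 14 description).

    Assumed columns: transcript ID, gene ID, gene name, gene description.
    """
    lookup = {}
    for line in lines:
        toks = line.rstrip("\r\n").split("\t")
        toks += [""] * (4 - len(toks))  # pad short rows
        transcript_id, gene_id, gene_name, description = toks[:4]
        if not transcript_id:
            continue
        # Prefer the gene symbol; fall back to the gene ID when it is missing.
        lookup[transcript_id] = (gene_name or gene_id, description or "NA")
    return lookup


def merge_bed(bed_lines, lookup):
    """Yield each BED12 line with two extra fields taken from the mart lookup."""
    for line in bed_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        toks = line.split("\t")
        # Field 4 (index 3) holds the transcript ID.
        name, description = lookup.get(toks[3], (toks[3], "NA"))
        yield "\t".join(toks[:12] + [name, description])


# Made-up rows for illustration only.
mart = [
    "Transcript stable ID\tGene stable ID\tGene name\tGene description\n",
    "TX1\tGENE1\tABC1\texample protein\n",
    "TX2\tGENE2\t\t\n",
]
bed = [
    "chr1\t1000\t2000\tTX1\t0\t+\t1100\t1900\t0\t2\t300,500,\t0,500,\n",
    "chr1\t3000\t3500\tTX2\t0\t-\t3000\t3500\t0\t1\t500,\t0,\n",
    "chr1\t5000\t5200\tTX3\t0\t+\t5000\t5200\t0\t1\t200,\t0,\n",
]
merged = list(merge_bed(bed, load_mart(mart)))
for out_line in merged:
    print(out_line)
```

With real files, `load_mart(open('C_sabeus_mart.txt'))` and `merge_bed(open('AGM_subset.ensGene.bed'), lookup)` would take the place of the in-memory lists, and each merged line would be written to an output file. Transcripts missing from the mart file still get 14 fields, so every row keeps the same shape.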
            ann.loraine Ann Loraine added a comment -

            Added tips on how to write the code - hopefully it helps!

            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Jill Jenkins [ jill ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Winter 2018 Sprint 3 [ 58 ] Winter 2018 Sprint 3, Spring 2019 Sprint 1 [ 58, 59 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-1522 [ IGBF-1522 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-1523 [ IGBF-1523 ]
            Jill Jill Jenkins (Inactive) added a comment -

            There is a corresponding Ensembl gene ID for each Ensembl transcript stable ID; however, not all Ensembl gene IDs are showing in the tooltip. When I check them against the ensGene.bed14 file, they are present. I have re-executed the script and the outcome is the same.

            Jill Jill Jenkins (Inactive) added a comment - - edited

            The inconsistent behavior is due to field 14 of the BED detail file being incomplete, caused by a mapping issue in the mart + Ensembl merge. Working on a resolution now.
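A quick check like the one below could help locate rows where fields 13 or 14 are missing or empty. It is only a sketch: it assumes a tab-delimited BED detail file with 14 fields per row, and the function name and sample rows are hypothetical.

```python
def find_incomplete_rows(bed_lines, expected_fields=14):
    """Return (line number, reason) for rows with missing or empty fields 13/14."""
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        toks = line.split("\t")
        if len(toks) < expected_fields:
            problems.append((number, f"only {len(toks)} fields"))
        elif not toks[12].strip() or not toks[13].strip():
            problems.append((number, "empty field 13 or 14"))
    return problems


# Made-up rows for illustration only.
bed12 = "chr1\t1000\t2000\tTX1\t0\t+\t1100\t1900\t0\t2\t300,500,\t0,500,"
rows = [
    bed12 + "\tABC1\texample protein\n",  # complete
    bed12 + "\tABC1\t\n",                 # field 14 empty
    bed12 + "\n",                         # fields 13 and 14 missing
]
problems = find_incomplete_rows(rows)
print(problems)
```

Running it over the merged file, for example `find_incomplete_rows(open('ensGene.bed14'))` if that is the file name in use, lists the line numbers worth inspecting.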

            Jill Jill Jenkins (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 2018 Sprint 3, Spring 2019 Sprint 1 [ 58, 59 ] Winter 2018 Sprint 3, Spring 2019 Sprint 1, Spring 2019 Sprint 2 [ 58, 59, 60 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            Jill Jill Jenkins (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            Jill Jill Jenkins (Inactive) added a comment -

            Resolved the inconsistent behavior and tested in IGB; functions pass.

            Jill Jill Jenkins (Inactive) made changes -
            Assignee Jill Jenkins [ jill ]
            Jill Jill Jenkins (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 2018 Sprint 3, Spring 2019 Sprint 1, Spring 2019 Sprint 2 [ 58, 59, 60 ] Winter 2018 Sprint 3, Spring 2019 Sprint 1, Spring 2019 Sprint 2, Spring 2019 Sprint 3 [ 58, 59, 60, 61 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] Reviewing [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Reviewing [ 10301 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Workflow Loraine Lab Workflow [ 18105 ] Fall 2019 Workflow Update [ 19941 ]
            ann.loraine Ann Loraine made changes -
            Workflow Fall 2019 Workflow Update [ 19941 ] Revised Fall 2019 Workflow Update [ 22061 ]
            nfreese Nowlan Freese made changes -
            Assignee Ann Loraine [ aloraine ]
            nfreese Nowlan Freese made changes -
            Assignee Ann Loraine [ aloraine ] Jill Jenkins [ jill ]

              People

              • Assignee: Jill Jill Jenkins (Inactive)
              • Reporter: ieclabau Ivory Blakley (Inactive)
              • Votes: 0
              • Watchers: 3
