IGB / IGBF-3503

Implement UCSC REST API logic in IGB

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 10.1.0
    • Labels:
      None

      Description

      Task:

      • Add a new module to IGB for ingesting UCSC REST API responses.
      • Implement all required APIs for communication to the new UCSC REST API and write testcases.
      • Change the existing IGB code to integrate with the new module APIs instead of DAS APIs

        Attachments

          Issue Links

            Activity

            nfreese Nowlan Freese created issue -
            nfreese Nowlan Freese made changes -
            Field Original Value New Value
            Epic Link IGBF-3129 [ 21675 ]
            jsirigin Jaya Sravani Sirigineedi made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            nfreese Nowlan Freese made changes -
            Story Points 2 10
            jsirigin Jaya Sravani Sirigineedi made changes -
            Description Task: Add new module to IGB for ingesting UCSC REST API responses Task:
            * Add a new module to IGB for ingesting UCSC REST API responses.
            * Implement all required APIs for communication to the new UCSC REST API.
            jsirigin Jaya Sravani Sirigineedi made changes -
            Description Task:
            * Add a new module to IGB for ingesting UCSC REST API responses.
            * Implement all required APIs for communication to the new UCSC REST API.
            Task:
            * Add a new module to IGB for ingesting UCSC REST API responses.
            * Implement all required APIs for communication to the new UCSC REST API and write testcases.
            * Change the existing IGB code to integrate with the new module APIs instead of DAS APIs
            jsirigin Jaya Sravani Sirigineedi added a comment - - edited

            Created a new module and implemented the logic for integrating the https://api.genome.ucsc.edu/list/ucscGenomes API to get the available genomes. Related subtask: https://jira.bioviz.org/browse/IGBF-3505
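As a rough illustration of what ingesting the list/ucscGenomes response involves, here is a minimal Python sketch. It is not the IGB module (which is Java). The function names are hypothetical, and it assumes the response carries a "ucscGenomes" object keyed by genome name.

```python
import json
from urllib.request import urlopen

UCSC_GENOMES_URL = "https://api.genome.ucsc.edu/list/ucscGenomes"


def parse_genomes(payload):
    """Return the sorted genome names from a decoded list/ucscGenomes response.

    Assumption: the genomes sit under a "ucscGenomes" key as an object
    keyed by genome name (for example "galGal6").
    """
    genomes = payload.get("ucscGenomes", {})
    return sorted(genomes)


def fetch_genomes(url=UCSC_GENOMES_URL, timeout=30):
    """Download the genome list and return the genome names (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_genomes(json.load(response))
```

Keeping the parsing separate from the download makes it possible to test the parser against a saved response without calling the server.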

            nfreese Nowlan Freese added a comment - - edited

            Some notes from discussing the new API with Sravani:

            There are differences in the data returned between the old DAS endpoints and the new REST endpoints. The new REST endpoints return more feature-rich data that we can use in IGB (this is great news!). However, the new module will need some additional parsing logic to handle it, though it shouldn't be too difficult.

            For example, this REST endpoint link provides the following response:

            {
              "downloadTime": "2023:12:07T20:45:10Z",
              "downloadTimeStamp": 1701981910,
              "genome": "galGal6",
              "dataTime": "2018-10-26T16:24:45",
              "dataTimeStamp": 1540596285,
              "trackType": "genePred refPep refMrna",
              "track": "refGene",
              "start": 43954913,
              "end": 43979746,
              "chrom": "chr1",
              "refGene": [
                {
                  "bin": 920,
                  "name": "NM_001004382",
                  "chrom": "chr1",
                  "strand": "-",
                  "txStart": 43956042,
                  "txEnd": 43978617,
                  "cdsStart": 43956726,
                  "cdsEnd": 43976566,
                  "exonCount": 7,
                  "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
                  "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
                  "score": 0,
                  "name2": "EPYC",
                  "cdsStartStat": "cmpl",
                  "cdsEndStat": "cmpl",
                  "exonFrames": "0,0,1,1,0,0,-1,"
                }
              ],
              "itemsReturned": 1
            }
            

            The above JSON maps very closely to the data we need to construct a BED file. See the UCSC page on file formats for additional information on how BED files are formatted.

            The mapping from the JSON above to what IGB probably expects the BED file to look like would be:
            JSON -> BED

            1. chrom -> chrom
            2. txStart -> chromStart
            3. txEnd -> chromEnd
            4. name -> name
            5. score -> score
            6. strand -> strand
            7. cdsStart -> thickStart
            8. cdsEnd -> thickEnd
            9. (no JSON field) -> itemRgb (BED requires a value for RGB; we generally default to black -> 0,0,0)
            10. exonCount -> blockCount
            11. exonEnds - exonStarts -> blockSizes (to get each blockSizes value, subtract the exon's exonStarts value from its exonEnds value)
            12. exonStarts - txStart -> blockStarts (to get each blockStarts value, subtract txStart from the exon's exonStarts value)
            13. name2 -> ID (this is the 13th column, most likely parsed as BED detail by IGB)
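The mapping above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the IGB code. The function names are hypothetical, the default item_rgb of "0,0,0" follows the note in item 9, and writing name2 as a 13th column follows item 13.

```python
def parse_int_list(text):
    """Parse a UCSC comma-separated list such as "1,2,3," into a list of ints."""
    return [int(value) for value in text.split(",") if value]


def gene_pred_to_bed(record, item_rgb="0,0,0"):
    """Convert one genePred-style JSON record into a tab-separated BED line."""
    tx_start = record["txStart"]
    exon_starts = parse_int_list(record["exonStarts"])
    exon_ends = parse_int_list(record["exonEnds"])
    # blockSizes: exon end minus exon start, per exon
    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    # blockStarts: exon start relative to txStart
    block_starts = [start - tx_start for start in exon_starts]
    fields = [
        record["chrom"], tx_start, record["txEnd"], record["name"],
        record["score"], record["strand"],
        record["cdsStart"], record["cdsEnd"], item_rgb,
        record["exonCount"],
        ",".join(map(str, block_sizes)),
        ",".join(map(str, block_starts)),
        record["name2"],  # 13th column, per the mapping above
    ]
    return "\t".join(str(field) for field in fields)


# The refGene record from the response above (only the fields that are used).
example = {
    "name": "NM_001004382", "chrom": "chr1", "strand": "-",
    "txStart": 43956042, "txEnd": 43978617,
    "cdsStart": 43956726, "cdsEnd": 43976566, "exonCount": 7,
    "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
    "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
    "score": 0, "name2": "EPYC",
}
print(gene_pred_to_bed(example))
```

For the example record, the block sizes come out as 855,96,203,159,154,187,21 and the block starts as 0,1836,3298,4971,8102,20356,22554.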

            Note that we are now receiving the transcription start (txStart) and end (txEnd), which, together with cdsStart and cdsEnd, will allow us to draw the UTR regions, as well as "name2", which we could potentially use in column 13 of a BED detail file. This can be compared to what we were receiving from DAS, which did not include this information - DAS response.
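To make the UTR point concrete, here is a hedged Python sketch of how the untranslated parts of each exon could be derived from txStart/txEnd, cdsStart/cdsEnd and the exon lists. The function name and the returned structure are hypothetical. Treating a record with cdsStart equal to cdsEnd as having no UTRs is an assumption of this sketch, not something stated in the ticket.

```python
def utr_segments(record):
    """Return the exonic regions outside the CDS, split into 5' and 3' UTRs.

    Coordinates are half-open (start, end) pairs, as in the JSON response.
    """
    starts = [int(v) for v in record["exonStarts"].split(",") if v]
    ends = [int(v) for v in record["exonEnds"].split(",") if v]
    cds_start, cds_end = record["cdsStart"], record["cdsEnd"]
    if cds_start == cds_end:
        # Assumption: no coding region, so nothing is reported as UTR.
        return {"five_prime": [], "three_prime": []}
    left, right = [], []
    for start, end in zip(starts, ends):
        if start < cds_start:  # exon part before the CDS
            left.append((start, min(end, cds_start)))
        if end > cds_end:      # exon part after the CDS
            right.append((max(start, cds_end), end))
    # On the minus strand the transcript reads right to left,
    # so the region left of the CDS is the 3' UTR.
    if record["strand"] == "-":
        return {"five_prime": right, "three_prime": left}
    return {"five_prime": left, "three_prime": right}


example = {
    "strand": "-", "cdsStart": 43956726, "cdsEnd": 43976566,
    "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
    "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
}
print(utr_segments(example))
```

For the EPYC record above, this yields one 3' UTR segment (43956042 to 43956726) and two 5' UTR segments (43976566 to 43976585 and 43978596 to 43978617), because the gene is on the minus strand.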

            Note that other data types will require different parsing. The example above is from refGene (see schema here), which is effectively BED. Other data types are provided by the REST endpoint, and may map to different data types such as PSL.
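One possible way to decide which parser to use is to read the first word of "trackType" and pull the records from the key named by "track", as in the response shown earlier. The Python sketch below is hypothetical: the function name is invented, and the assumption that the first word of "trackType" identifies the base data type is based only on that one example.

```python
def extract_track(payload):
    """Return (base_type, records) from a decoded track-data response.

    Assumptions: the records live under the key named by payload["track"],
    and the first word of "trackType" is the base data type.
    """
    track = payload["track"]
    words = payload.get("trackType", "").split()
    base_type = words[0] if words else ""
    return base_type, payload.get(track, [])


payload = {
    "track": "refGene",
    "trackType": "genePred refPep refMrna",
    "refGene": [{"name": "NM_001004382"}],
}
base_type, records = extract_track(payload)
print(base_type, len(records))
```

A caller could then route "genePred" records to a BED-style converter and add other converters as further data types are supported.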

            ann.loraine Ann Loraine made changes -
            Sprint Fall 7 [ 183 ] Fall 7, Fall 8 [ 183, 184 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Fall 7, Fall 8 [ 183, 184 ] Fall 7, Fall 8, Spring 1 [ 183, 184, 185 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Fall 7, Fall 8, Spring 1 [ 183, 184, 185 ] Fall 7, Fall 8, Spring 1, Spring 2 [ 183, 184, 185, 186 ]
            ann.loraine Ann Loraine made changes -
            Sprint Fall 7, Fall 8, Spring 1, Spring 2 [ 183, 184, 185, 186 ] Fall 7, Fall 8, Spring 1, Spring 2, Spring 3 [ 183, 184, 185, 186, 187 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-3603 [ IGBF-3603 ]
            nfreese Nowlan Freese made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            nfreese Nowlan Freese added a comment -

            As the majority of the logic has been implemented and tested, I am closing this ticket. There are some additional file types to add, but these can be completed as individual tickets.

            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            nfreese Nowlan Freese made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            nfreese Nowlan Freese made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            nfreese Nowlan Freese made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            nfreese Nowlan Freese made changes -
            Fix Version/s 10.1.0 [ 11000 ]

              People

              • Assignee:
                jsirigin Jaya Sravani Sirigineedi
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: