  1. IGB
  2. IGBF-3503

Implement UCSC REST API logic in IGB

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 10.1.0
    • Labels:
      None

      Description

      Task:

      • Add a new module to IGB for ingesting UCSC REST API responses.
      • Implement all required APIs for communicating with the new UCSC REST API and write test cases.
      • Change the existing IGB code to integrate with the new module APIs instead of the DAS APIs.

          There are no Sub-Tasks for this issue.

            Activity

            jsirigin Jaya Sravani Sirigineedi added a comment - - edited

            Created a new module and implemented the logic for integrating the https://api.genome.ucsc.edu/list/ucscGenomes API to get the available genomes. Related subtask: https://jira.bioviz.org/browse/IGBF-3505
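A minimal sketch of the genome listing step described above, written in Python for brevity; it is not IGB's module code. The function names are hypothetical, and the response shape (a top-level "ucscGenomes" object keyed by assembly name, with "organism" and "description" fields) is an assumption about what the endpoint returns. The sample payload is hand-written so the sketch runs without network access.

```python
import json
from urllib.request import urlopen

UCSC_GENOMES_URL = "https://api.genome.ucsc.edu/list/ucscGenomes"


def fetch_ucsc_genomes(url=UCSC_GENOMES_URL, timeout=30):
    """Download and decode the genome listing (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_genomes(payload):
    """Return sorted (assembly, organism, description) tuples.

    Assumes the listing sits under a top-level "ucscGenomes" object keyed by
    assembly name; missing fields fall back to an empty string.
    """
    genomes = payload.get("ucscGenomes", {})
    return sorted(
        (name, info.get("organism", ""), info.get("description", ""))
        for name, info in genomes.items()
    )


# Hand-written sample in the assumed shape, so the sketch runs offline.
sample = {
    "ucscGenomes": {
        "hg38": {"organism": "Human", "description": "Dec. 2013 (GRCh38/hg38)"},
        "galGal6": {"organism": "Chicken", "description": "Mar. 2018 (GRCg6a/galGal6)"},
    }
}

for assembly, organism, description in list_genomes(sample):
    print(f"{assembly}\t{organism}\t{description}")
```

Against the live service, `list_genomes(fetch_ucsc_genomes())` would be the equivalent call. Keeping the download separate from the parsing makes the parsing testable with canned responses.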

            nfreese Nowlan Freese added a comment - - edited

            Some notes from discussing the new API with Sravani:

            There are differences in the data returned between the old DAS endpoints and the new REST endpoints. The new REST endpoints return more feature-rich data that we can use in IGB (this is great news!). Using it will require some additional parsing logic in the new module, though it shouldn't be too difficult.

            For example, this REST endpoint link provides the following response:

            {
              "downloadTime": "2023:12:07T20:45:10Z",
              "downloadTimeStamp": 1701981910,
              "genome": "galGal6",
              "dataTime": "2018-10-26T16:24:45",
              "dataTimeStamp": 1540596285,
              "trackType": "genePred refPep refMrna",
              "track": "refGene",
              "start": 43954913,
              "end": 43979746,
              "chrom": "chr1",
              "refGene": [
                {
                  "bin": 920,
                  "name": "NM_001004382",
                  "chrom": "chr1",
                  "strand": "-",
                  "txStart": 43956042,
                  "txEnd": 43978617,
                  "cdsStart": 43956726,
                  "cdsEnd": 43976566,
                  "exonCount": 7,
                  "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
                  "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
                  "score": 0,
                  "name2": "EPYC",
                  "cdsStartStat": "cmpl",
                  "cdsEndStat": "cmpl",
                  "exonFrames": "0,0,1,1,0,0,-1,"
                }
              ],
              "itemsReturned": 1
            }
            

            The above JSON maps very closely to the data we need to construct a bed file. See the UCSC page on file formats for additional information on how bed files are formatted.

            The mapping from the JSON above to what IGB probably expects the bed file to look like would be:
            JSON -> BED

            1. chrom -> chrom
            2. txStart -> chromStart
            3. txEnd -> chromEnd
            4. name -> name
            5. score -> score
            6. strand -> strand
            7. cdsStart -> thickStart
            8. cdsEnd -> thickEnd
            9. ???? -> itemRgb (bed requires a value for RGB, we generally default to providing black -> 0,0,0)
            10. exonCount -> blockCount
            11. exonEnds - exonStarts -> blockSizes (to get the blockSizes value, need to subtract exonStarts from exonEnds)
            12. exonStarts - txStart -> blockStarts (to get the blockStarts value, need to subtract the txStart from exonStarts)
            13. name2 -> ID (this is the 13th column, parsed as bed detail by IGB most likely)
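The mapping above can be sketched as a small conversion function. This is an illustration in Python, not IGB's parser: the names `parse_int_list` and `genepred_to_bed` are hypothetical, the default itemRgb of 0,0,0 follows the note in item 9, and emitting name2 as a 13th tab-separated column is the assumption stated in item 13. The record is the refGene entry from the response shown earlier.

```python
def parse_int_list(text):
    """Parse a comma-terminated list such as "1,2,3," into a list of ints."""
    return [int(part) for part in text.split(",") if part]


def genepred_to_bed(record, item_rgb="0,0,0"):
    """Build one tab-separated BED line (12 columns plus name2) from a record."""
    tx_start = record["txStart"]
    exon_starts = parse_int_list(record["exonStarts"])
    exon_ends = parse_int_list(record["exonEnds"])
    if not (len(exon_starts) == len(exon_ends) == record["exonCount"]):
        raise ValueError("exonStarts/exonEnds do not match exonCount")

    # blockSizes: exon length; blockStarts: exon start relative to txStart.
    block_sizes = [end - start for start, end in zip(exon_starts, exon_ends)]
    block_starts = [start - tx_start for start in exon_starts]

    fields = [
        record["chrom"],
        tx_start,
        record["txEnd"],
        record["name"],
        record["score"],
        record["strand"],
        record["cdsStart"],
        record["cdsEnd"],
        item_rgb,
        record["exonCount"],
        ",".join(map(str, block_sizes)),
        ",".join(map(str, block_starts)),
        record["name2"],
    ]
    return "\t".join(str(field) for field in fields)


# The refGene entry from the response above (only the fields that are used).
ref_gene = {
    "name": "NM_001004382",
    "chrom": "chr1",
    "strand": "-",
    "txStart": 43956042,
    "txEnd": 43978617,
    "cdsStart": 43956726,
    "cdsEnd": 43976566,
    "exonCount": 7,
    "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
    "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
    "score": 0,
    "name2": "EPYC",
}

bed_line = genepred_to_bed(ref_gene)
print(bed_line)
```

For this record the block sizes come out as 855,96,203,159,154,187,21 and the block starts as 0,1836,3298,4971,8102,20356,22554. The last block start plus its size (22554 + 21) equals txEnd - txStart, which is the consistency BED expects.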

            Note that we are now receiving the transcription start (txStart) and end (txEnd), which, together with cdsStart and cdsEnd, will allow us to draw the UTR regions. We are also receiving "name2", which we could potentially use in column 13 of a bed detail file. For comparison, what we were receiving from DAS did not include this information (see DAS response).
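To illustrate how the UTR regions follow from these fields, here is a rough Python sketch (hypothetical function name, not IGB's drawing code). It takes the exonic stretches that fall outside the cdsStart..cdsEnd range and assigns them to the 5' or 3' side by strand. Treating cdsStart == cdsEnd as "no coding region" is an assumption of the sketch. The values are taken from the refGene entry above.

```python
def utr_segments(record):
    """Return (five_prime, three_prime) lists of (start, end) exonic UTR pieces."""
    cds_start, cds_end = record["cdsStart"], record["cdsEnd"]
    if cds_start == cds_end:
        # Assumption: equal CDS bounds mean a non-coding transcript, so no UTRs.
        return [], []

    starts = [int(s) for s in record["exonStarts"].split(",") if s]
    ends = [int(e) for e in record["exonEnds"].split(",") if e]
    exons = list(zip(starts, ends))

    # Exonic sequence before cdsStart (low coordinates) and after cdsEnd (high).
    low_side = [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    high_side = [(max(s, cds_end), e) for s, e in exons if e > cds_end]

    # On the minus strand the transcript reads from high to low coordinates,
    # so the 5' UTR is on the high side.
    if record["strand"] == "-":
        return high_side, low_side
    return low_side, high_side


record = {
    "strand": "-",
    "cdsStart": 43956726,
    "cdsEnd": 43976566,
    "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
    "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
}

five_prime, three_prime = utr_segments(record)
print("5' UTR:", five_prime)
print("3' UTR:", three_prime)
```

Because this transcript is on the minus strand, the two segments past cdsEnd form the 5' UTR and the single segment before cdsStart forms the 3' UTR.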

            Note that other data types will require different parsing. The example above is from refGene (see schema here), which is effectively bed. Other data types are provided by the REST endpoint, and may map to different data types such as psl.

            nfreese Nowlan Freese added a comment -

            As the majority of the logic has been implemented and tested, I am closing this ticket. There are some additional file types to add, but these can be completed as individual tickets.


              People

              • Assignee:
                jsirigin Jaya Sravani Sirigineedi
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                2
