[IGBF-3503] Implement UCSC REST API logic in IGB - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: 10.1.0
Labels:
None

Story Points:
10
Epic Link:
Add UCSC REST API to IGB
Sprint:
Fall 7, Fall 8, Spring 1, Spring 2, Spring 3

Description

Task:

Add a new module to IGB for ingesting UCSC REST API responses.
Implement all API calls needed to communicate with the new UCSC REST API and write test cases.
Change the existing IGB code to integrate with the new module APIs instead of the DAS APIs.

Attachments

Issue Links

relates to

IGBF-3603 Implement logic to load data for Wiggle file type

Closed

Sub-Tasks

There are no Sub-Tasks for this issue.

Activity

Jaya Sravani Sirigineedi added a comment - 07/Dec/23 11:05 AM - edited

Created a new module and implemented the logic for integrating the https://api.genome.ucsc.edu/list/ucscGenomes API to get the available genomes. Related subtask: https://jira.bioviz.org/browse/IGBF-3505
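
A minimal sketch of how the genome list call mentioned above could be issued from Java, assuming the JDK's built-in java.net.http client (Java 11+). The class and method names (UcscGenomeList, buildRequest, fetchJson) are hypothetical and are not taken from the IGB module; only the endpoint URL comes from the comment above. JSON parsing is left out because the library used by the module is not stated in this ticket.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for illustration; not the actual IGB module code.
class UcscGenomeList {
    // Endpoint quoted in the comment above.
    static final String LIST_GENOMES_URL = "https://api.genome.ucsc.edu/list/ucscGenomes";

    // Builds a GET request for the genome list; the 30 second timeout is an arbitrary assumption.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(LIST_GENOMES_URL))
                .timeout(Duration.ofSeconds(30))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body, failing on any non-200 status.
    static String fetchJson(HttpClient client) throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String json = fetchJson(HttpClient.newHttpClient());
        // Print only the beginning of the response; the full genome list is large.
        System.out.println(json.substring(0, Math.min(200, json.length())));
    }
}
```

Keeping request construction separate from the network call makes the request itself testable without contacting the UCSC server.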

Nowlan Freese added a comment - 08/Dec/23 9:30 AM - edited

Some notes from discussing the new API with Sravani:

The old DAS endpoints and the new REST endpoints return different data. The REST endpoints return more feature-rich data that we can use in IGB (this is great news!). Using it will require some additional parsing logic in the new module, though that shouldn't be too difficult.

For example, this REST endpoint link provides the following response:

{
  "downloadTime": "2023:12:07T20:45:10Z",
  "downloadTimeStamp": 1701981910,
  "genome": "galGal6",
  "dataTime": "2018-10-26T16:24:45",
  "dataTimeStamp": 1540596285,
  "trackType": "genePred refPep refMrna",
  "track": "refGene",
  "start": 43954913,
  "end": 43979746,
  "chrom": "chr1",
  "refGene": [
    {
      "bin": 920,
      "name": "NM_001004382",
      "chrom": "chr1",
      "strand": "-",
      "txStart": 43956042,
      "txEnd": 43978617,
      "cdsStart": 43956726,
      "cdsEnd": 43976566,
      "exonCount": 7,
      "exonStarts": "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
      "exonEnds": "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
      "score": 0,
      "name2": "EPYC",
      "cdsStartStat": "cmpl",
      "cdsEndStat": "cmpl",
      "exonFrames": "0,0,1,1,0,0,-1,"
    }
  ],
  "itemsReturned": 1
}
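
As a rough sketch, the request behind a response like the one above could be built as follows. The getData/track path and the semicolon-separated parameter names are an assumption based on the public UCSC REST API, since the original link is not preserved in this ticket; the class name UcscTrackUrl is hypothetical. The genome, track, chrom, start and end values are the ones echoed in the JSON.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the actual IGB module code.
class UcscTrackUrl {
    // Assumed endpoint; the original ticket link is not preserved.
    static final String BASE = "https://api.genome.ucsc.edu/getData/track";

    // Percent-encodes a parameter value so unusual names cannot break the query string.
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds the query URL for one track over one region (start and end as echoed in the response).
    static URI build(String genome, String track, String chrom, long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("Invalid region: " + start + "-" + end);
        }
        String query = "genome=" + enc(genome)
                + ";track=" + enc(track)
                + ";chrom=" + enc(chrom)
                + ";start=" + start
                + ";end=" + end;
        return URI.create(BASE + "?" + query);
    }

    public static void main(String[] args) {
        System.out.println(build("galGal6", "refGene", "chr1", 43954913L, 43979746L));
    }
}
```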

The above JSON maps very closely to the data we need to construct a bed file. See the UCSC page on file formats for additional information on how bed files are formatted.

The mapping from the JSON above to what IGB probably expects the bed file to look like would be:
JSON -> BED

chrom -> chrom
txStart -> chromStart
txEnd -> chromEnd
name -> name
score -> score
strand -> strand
cdsStart -> thickStart
cdsEnd -> thickEnd
???? -> itemRgb (bed requires a value for RGB, we generally default to providing black -> 0,0,0)
exonCount -> blockCount
exonEnds - exonStarts -> blockSizes (for each exon, subtract its exonStarts value from its exonEnds value; both fields are comma-separated lists with a trailing comma)
exonStarts - txStart -> blockStarts (for each exon, subtract txStart from its exonStarts value, so the first block starts at 0)
name2 -> ID (this is the 13th column, parsed as bed detail by IGB most likely)
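
The mapping above can be sketched in Java as follows. This is an illustration under assumptions, not the module's actual parser: the class and method names (RefGeneToBed, parseList, toBedDetailLine) are hypothetical, the fields are passed in as plain arguments rather than parsed from JSON, itemRgb is fixed to black (0,0,0) as described above, and the block lists are written without a trailing comma.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical converter for illustration; not the actual IGB module code.
class RefGeneToBed {
    // Parses a UCSC comma-separated list such as "1,2,3," (trailing comma allowed).
    static long[] parseList(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToLong(Long::parseLong)
                .toArray();
    }

    // Joins values with commas, without a trailing comma.
    static String join(long[] values) {
        StringJoiner joiner = new StringJoiner(",");
        for (long v : values) {
            joiner.add(Long.toString(v));
        }
        return joiner.toString();
    }

    // Builds one tab-separated line: the 12 BED columns plus name2 as column 13.
    static String toBedDetailLine(String chrom, long txStart, long txEnd, String name,
                                  int score, String strand, long cdsStart, long cdsEnd,
                                  int exonCount, String exonStarts, String exonEnds,
                                  String name2) {
        long[] starts = parseList(exonStarts);
        long[] ends = parseList(exonEnds);
        if (starts.length != exonCount || ends.length != exonCount) {
            throw new IllegalArgumentException("exonCount does not match exon lists");
        }
        long[] blockSizes = new long[exonCount];
        long[] blockStarts = new long[exonCount];
        for (int i = 0; i < exonCount; i++) {
            blockSizes[i] = ends[i] - starts[i];   // exonEnds - exonStarts
            blockStarts[i] = starts[i] - txStart;  // exonStarts - txStart
        }
        return String.join("\t",
                chrom, Long.toString(txStart), Long.toString(txEnd), name,
                Integer.toString(score), strand,
                Long.toString(cdsStart), Long.toString(cdsEnd),
                "0,0,0", // itemRgb: no JSON source field, default to black
                Integer.toString(exonCount), join(blockSizes), join(blockStarts), name2);
    }

    public static void main(String[] args) {
        // Values copied from the refGene record in the example response.
        System.out.println(toBedDetailLine("chr1", 43956042L, 43978617L, "NM_001004382",
                0, "-", 43956726L, 43976566L, 7,
                "43956042,43957878,43959340,43961013,43964144,43976398,43978596,",
                "43956897,43957974,43959543,43961172,43964298,43976585,43978617,",
                "EPYC"));
    }
}
```

For the example record this yields blockSizes 855,96,203,159,154,187,21 and blockStarts 0,1836,3298,4971,8102,20356,22554.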

Note that we are now receiving the transcription start (txStart) and end (txEnd) alongside the coding start (cdsStart) and end (cdsEnd), which will allow us to draw the UTR regions, as well as "name2", which we could potentially use in column 13 of a bed detail file. By comparison, the DAS response did not include this information.
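
To illustrate how the UTR regions follow from these fields, here is a small sketch that returns the exon portions lying outside the coding region. It is an assumption about one way to derive them, not IGB's drawing code; the class and method names are hypothetical. Coordinates are genomic, so for a "-" strand transcript such as the example the segments below cdsStart belong to the 3' UTR and those above cdsEnd to the 5' UTR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the actual IGB module code.
class UtrSegments {
    // Returns {start, end} pairs for the exon parts outside [cdsStart, cdsEnd).
    static List<long[]> utrSegments(long[] exonStarts, long[] exonEnds,
                                    long cdsStart, long cdsEnd) {
        if (exonStarts.length != exonEnds.length) {
            throw new IllegalArgumentException("exon lists differ in length");
        }
        List<long[]> segments = new ArrayList<>();
        for (int i = 0; i < exonStarts.length; i++) {
            long start = exonStarts[i];
            long end = exonEnds[i];
            if (start < cdsStart) {
                // Exon begins before the coding region: keep the part up to cdsStart.
                segments.add(new long[] {start, Math.min(end, cdsStart)});
            }
            if (end > cdsEnd) {
                // Exon extends past the coding region: keep the part from cdsEnd on.
                segments.add(new long[] {Math.max(start, cdsEnd), end});
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Exon coordinates and CDS bounds copied from the example refGene record.
        long[] starts = {43956042L, 43957878L, 43959340L, 43961013L, 43964144L, 43976398L, 43978596L};
        long[] ends = {43956897L, 43957974L, 43959543L, 43961172L, 43964298L, 43976585L, 43978617L};
        for (long[] seg : utrSegments(starts, ends, 43956726L, 43976566L)) {
            System.out.println(seg[0] + "-" + seg[1] + " (" + (seg[1] - seg[0]) + " bp)");
        }
    }
}
```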

Note that other data types will require different parsing. The example above is from refGene (see schema here), which is effectively bed. Other data types are provided by the REST endpoint, and may map to different data types such as psl.
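
One possible way to route different data types to different parsers is to switch on the first word of the "trackType" field in the response ("genePred" in the example above). The sketch below is only an assumption about how that could look: the enum, the class name and the particular type-to-format table are hypothetical and would need to be checked against the track types the REST endpoint actually returns.

```java
import java.util.Locale;

// Hypothetical dispatch table for illustration; not the actual IGB module code.
class TrackTypeDispatch {
    enum TargetFormat { BED, PSL, WIGGLE, UNSUPPORTED }

    // Picks a target format from the first whitespace-separated word of trackType.
    static TargetFormat forTrackType(String trackType) {
        if (trackType == null || trackType.isBlank()) {
            return TargetFormat.UNSUPPORTED;
        }
        String first = trackType.trim().split("\\s+")[0].toLowerCase(Locale.ROOT);
        switch (first) {
            case "genepred": // e.g. refGene, which is effectively bed
            case "bed":
                return TargetFormat.BED;
            case "psl":
                return TargetFormat.PSL;
            case "wig":      // assumed spelling for Wiggle tracks
                return TargetFormat.WIGGLE;
            default:
                return TargetFormat.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(forTrackType("genePred refPep refMrna"));
    }
}
```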

Nowlan Freese added a comment - 16/Feb/24 2:32 PM

As the majority of the logic has been implemented and tested, I am closing this ticket. There are some additional file types to add, but these can be completed as individual tickets.
