  1. IGB
  2. IGBF-3975

Investigate an efficient way to integrate Ensembl into IGB

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: With the addition of UCSC REST (IGBF-3129), Ensembl (IGBF-3555), and NCBI (IGBF-3693) as data sources in IGB, the list of available genomes is very large and difficult to scroll through.

      Task: Investigate and propose improvements to the genome selection UI. This could include adding search, a new tab for querying the various data providers, etc. Try to think of something new and innovative that is scalable and does not slow IGB down.

      Turn this ticket into an epic once a proposed solution is found.

          Activity

          jsirigin Jaya Sravani Sirigineedi added a comment -

          I investigated the Ensembl website for a search API similar to the one NCBI provides, but I haven't found any such functionality. The closest approaches I found are:

          1. The user searches directly for an Ensembl genome by its scientific name, such as arabidopsis_thaliana, and we look it up with the https://rest.ensembl.org/documentation/info/info_genome API to see whether that genome exists. If it does, we can call the rest of the APIs and load the genome into IGB.
          2. There is also an API that lets us check genomes by Ensembl stable ID. The process is similar to the first one, except that the user searches with the Ensembl stable ID, and the API to check is https://rest.ensembl.org/documentation/info/lookup
          3. In another approach, the user enters a non-scientific (common) name of a genome. We use the https://rest.ensembl.org/taxonomy/name/mouse?content-type=application/json API to get the matching taxonomy names, then query https://rest.ensembl.org/info/genomes/taxonomy/Mus%20musculus?content-type=application/json, and from there we get the scientific name, which can be used to get the rest of the data. However, the common name must match the Ensembl common name, so this adds no value for the user while making the developer's work harder.

          Here is the page source for the Ensembl main page, where the genomes displayed in the dropdown come from: view-source:https://useast.ensembl.org/index.html. They are just getting the data from the HTML page. There is also a button in Ensembl that searches for genes, but I don't see much use for it either.
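A minimal sketch of the first approach (look up a genome by its scientific name and treat a failed lookup as "not found"). The request path `/info/genomes/<name>` is an assumption inferred from the info_genome documentation page linked above, and the names `genome_info_url`, `http_get_json`, and `find_genome` are hypothetical helpers, not part of IGB or of any Ensembl client library. The name normalization (lower case, spaces to underscores) is also an assumption based on the arabidopsis_thaliana example.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://rest.ensembl.org"


def genome_info_url(genome_name: str) -> str:
    """Build the lookup URL. The path is an assumption, see the note above."""
    quoted = urllib.parse.quote(genome_name)
    return f"{BASE_URL}/info/genomes/{quoted}?content-type=application/json"


def http_get_json(url: str):
    """Default fetcher: parsed JSON on success, None on any HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError:
        return None


def find_genome(genome_name: str, fetch=http_get_json):
    """Return the genome record, or None if the lookup fails.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    normalized = genome_name.strip().lower().replace(" ", "_")
    return fetch(genome_info_url(normalized))
```

Calling `find_genome("Arabidopsis thaliana")` would issue one request; a non-`None` result would be the signal to go on and call the remaining APIs.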

          jsirigin Jaya Sravani Sirigineedi added a comment -

          After the discussion with Nowlan Freese, we decided to look for existing workshops, or set up a workshop with Ensembl, to find out whether they have a search API, and also to check the rate limiting of their APIs. I investigated and found an existing course (https://www.ebi.ac.uk/training/online/courses/ensembl-rest-api/) and a workshop available on YouTube (https://www.youtube.com/watch?v=S7v3lLQCFsk). I have gone through the course and am looking into the workshop to see if I can find any useful info. So far these two resources cover only the basics, but I did find out about the rate limiting: "allowed 55000 requests over an hour (3600 seconds): an average 15 requests per second". This shouldn't be a problem for us.
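The quoted limit works out to 55000 / 3600 ≈ 15.28 requests per second, or roughly one request every 65 ms. If a client ever needed to stay under that average, one simple option would be to space requests evenly. The sketch below is only an illustration of that idea: the `Throttle` class is hypothetical, and it does not model how Ensembl actually enforces its limit.

```python
import time

REQUESTS_PER_HOUR = 55_000
MIN_INTERVAL = 3600 / REQUESTS_PER_HOUR  # seconds between requests, about 0.065


class Throttle:
    """Space calls at least `min_interval` seconds apart.

    `clock` and `sleep` are injectable so the behaviour can be checked
    without real waiting.
    """

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay applied."""
        now = self.clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # The next slot opens one interval after this request goes out.
        self._next_allowed = now + delay + self.min_interval
        return delay
```

A caller would invoke `throttle.wait()` immediately before each REST request.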
          Next steps:

          • Finish going through the resources, consolidate all the questions, and send out an email to Ensembl support or the dev team depending on the questions.
          • Think about different UI designs and investigate which would be more user-friendly.
          jsirigin Jaya Sravani Sirigineedi added a comment - - edited

          This will be implemented as an IGB plugin app. One option for implementing the search is to get the data from the API and cache it in Redis. Another is to store the data in a map and search that, but this approach wouldn't be efficient considering the number of genomes we get from Ensembl. A third approach is to remove the dynamic search completely and let the user enter the scientific name of the genome to search for. The next step is to create a new plugin app for Ensembl, build a basic UI, and integrate it into IGB; here is the ticket for this (https://jira.bioviz.org/browse/IGBF-4032). Closing this ticket.
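For reference, the in-memory map option could look roughly like the sketch below. The record fields (`scientific_name`, `common_name`) and the function names are hypothetical and are not taken from the Ensembl response format. Each query scans every entry, which is the cost that grows with the number of genomes.

```python
def build_index(genomes):
    """Map lower-cased scientific name -> genome record."""
    return {g["scientific_name"].lower(): g for g in genomes}


def search_genomes(index, query, limit=10):
    """Case-insensitive substring match on scientific or common name."""
    q = query.strip().lower()
    if not q:
        return []
    matches = [
        genome
        for key, genome in index.items()
        if q in key or q in genome.get("common_name", "").lower()
    ]
    # Sort for stable output, then cap the number of results shown.
    return sorted(matches, key=lambda g: g["scientific_name"])[:limit]


# Hypothetical sample records, for illustration only.
SAMPLE_GENOMES = [
    {"scientific_name": "Arabidopsis thaliana", "common_name": "thale cress"},
    {"scientific_name": "Homo sapiens", "common_name": "human"},
    {"scientific_name": "Mus musculus", "common_name": "mouse"},
]
```

With the sample data, `search_genomes(build_index(SAMPLE_GENOMES), "mouse")` returns the Mus musculus record.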


            People

            • Assignee:
              jsirigin Jaya Sravani Sirigineedi
              Reporter:
              nfreese Nowlan Freese
            • Votes:
              0

              Dates

              • Created:
                Updated:
                Resolved: