  1. IGB
  2. IGBF-3975

Investigate an efficient way to integrate Ensembl into IGB

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: With the addition of UCSC REST (IGBF-3129), Ensembl (IGBF-3555), and NCBI (IGBF-3693) as data sources in IGB, the list of available genomes is very large and difficult to scroll through.

      Task: Investigate and propose improvements to the genome selection UI. This could include adding search, a new tab for querying the various data providers, etc. Try to think of something new and innovative that is scalable and does not slow IGB down.

      Turn this ticket into an epic once a proposed solution is found.

        Attachments

          Activity

          nfreese Nowlan Freese created issue -
          nfreese Nowlan Freese made changes -
          Field Original Value New Value
          Assignee Jaya Sravani Sirigineedi [ jsirigin ]
          nfreese Nowlan Freese made changes -
          Epic Link IGBF-1765 [ 17855 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 6 [ 207 ]
          nfreese Nowlan Freese made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine made changes -
          Sprint Fall 6 [ 207 ] Fall 6, Fall 7 [ 207, 208 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          jsirigin Jaya Sravani Sirigineedi (Inactive) added a comment -

          Investigated the Ensembl website for a search API similar to the one NCBI provides, but I haven't found any such functionality. The closest approaches I found are:

          1. The user searches directly for an Ensembl genome by its scientific name, e.g. arabidopsis_thaliana, and we query the info_genome API (https://rest.ensembl.org/documentation/info/info_genome) to check whether that genome exists. If it does, we can go ahead and call the rest of the APIs and load the genome into IGB.
          2. There is also an API for checking genomes by Ensembl stable ID. The process is the same as in the first approach, except that the user searches with the stable ID; the API to check is https://rest.ensembl.org/documentation/info/lookup
          3. Another approach is for the user to enter the common (non-scientific) name of a genome. We call https://rest.ensembl.org/taxonomy/name/mouse?content-type=application/json to get the matching taxonomy names, then https://rest.ensembl.org/info/genomes/taxonomy/Mus%20musculus?content-type=application/json to get the genomes for that taxon, which gives us the scientific name needed to fetch the rest of the data. However, the common name has to match Ensembl's common name, so this adds no value for the user while making the developer's job harder.
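
          The three lookups listed above can be sketched roughly as follows. This is only an illustration in Python, not IGB code: the helper names are hypothetical, the `info/genomes/<name>` path is inferred from the info_genome documentation page, and treating HTTP 400/404 as "genome not found" is an assumption about how the service reports unknown names.

          ```python
          import json
          from urllib.error import HTTPError
          from urllib.parse import quote
          from urllib.request import urlopen

          BASE = "https://rest.ensembl.org"
          JSON = "content-type=application/json"


          def genome_info_url(genome_name):
              """Approach 1: genome by Ensembl name, e.g. arabidopsis_thaliana (path assumed)."""
              return f"{BASE}/info/genomes/{quote(genome_name)}?{JSON}"


          def stable_id_lookup_url(stable_id):
              """Approach 2: look up a single Ensembl stable ID."""
              return f"{BASE}/lookup/id/{quote(stable_id)}?{JSON}"


          def taxonomy_name_url(common_name):
              """Approach 3, step 1: resolve a common name such as 'mouse'."""
              return f"{BASE}/taxonomy/name/{quote(common_name)}?{JSON}"


          def genomes_by_taxon_url(taxon_name):
              """Approach 3, step 2: list genomes for a taxon such as 'Mus musculus'."""
              return f"{BASE}/info/genomes/taxonomy/{quote(taxon_name)}?{JSON}"


          def fetch_json(url):
              """Fetch a URL and decode the JSON body."""
              with urlopen(url, timeout=10) as resp:
                  return json.load(resp)


          def genome_exists(genome_name, fetch=fetch_json):
              """Return True if the genome lookup succeeds (assumes 400/404 means 'unknown')."""
              try:
                  fetch(genome_info_url(genome_name))
                  return True
              except HTTPError as err:
                  if err.code in (400, 404):
                      return False
                  raise
          ```

          The `fetch` parameter is injectable so that the existence check can be exercised without touching the network.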

          Here is the page source for the Ensembl main page, where the genomes shown in the dropdown come from: view-source:https://useast.ensembl.org/index.html. They simply read the data from the HTML page. Ensembl also has a button for searching genes, but I don't see much use in it either.

          jsirigin Jaya Sravani Sirigineedi (Inactive) added a comment -

          After the discussion with Nowlan Freese, we decided to look for existing workshops, or set up a workshop with Ensembl, to find out whether they have a search API and to check the rate limits on their APIs. I found an existing course (https://www.ebi.ac.uk/training/online/courses/ensembl-rest-api/) and a workshop available on YouTube (https://www.youtube.com/watch?v=S7v3lLQCFsk). I have gone through the course and am now going through the workshop to see if it has any useful information. So far these two resources cover only the basics, but I did find the rate limit: "allowed 55000 requests over an hour (3600 seconds): an average 15 requests per second". This shouldn't be a problem for us.
          Next steps:

          • Finish going through the resources, consolidate all the questions, and send out an email to Ensembl support or the dev team depending on the questions.
          • Think about different UI designs and investigate which would be more user-friendly.
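
          For reference, the quoted rate limit works out to 55000 / 3600, or about 15.28 requests per second. A client that wants to stay under that average could space out its calls with a small throttle like the one below. This is a minimal sketch only: the `Throttle` class is hypothetical, and the injectable `clock` and `sleep` parameters exist purely to make it testable.

          ```python
          import time

          REQUESTS_PER_HOUR = 55000
          MAX_PER_SECOND = REQUESTS_PER_HOUR / 3600  # about 15.28


          class Throttle:
              """Enforce a minimum interval between calls (hypothetical helper)."""

              def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
                  self.min_interval = 1.0 / per_second
                  self._clock = clock
                  self._sleep = sleep
                  self._next_allowed = 0.0

              def wait(self):
                  """Block until the next request is allowed; return the delay applied."""
                  now = self._clock()
                  delay = max(0.0, self._next_allowed - now)
                  if delay:
                      self._sleep(delay)
                  # The next call may start one interval after this one proceeds.
                  self._next_allowed = now + delay + self.min_interval
                  return delay
          ```

          Calling `Throttle(15).wait()` before each request would keep the client just under the quoted average.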
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Summary Investigate improving IGB Genome selection UI Investigate a new way to integrate Ensembl in an efficient way
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Summary Investigate a new way to integrate Ensembl in an efficient way Investigate how to integrate Ensembl in an efficient way
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Summary Investigate how to integrate Ensembl in an efficient way Investigate an efficient way to integrate Ensembl into IGB
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Story Points 2 5
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Epic Link IGBF-1765 [ 17855 ] IGBF-3555 [ 22774 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) added a comment - - edited

          This will be implemented as an IGB plugin app. One option for implementing the search is to fetch the data from the API and cache it in Redis. Another is to hold the data in a map and search that, but this wouldn't be efficient considering the number of genomes we get from Ensembl. A third approach is to drop the dynamic search entirely and have the user enter the scientific name of the genome to search for. The next step is to create a new plugin app for Ensembl with a basic UI and integrate it into IGB; the ticket for this is https://jira.bioviz.org/browse/IGBF-4032. Closing this ticket.
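
          To make the in-memory option concrete, here is a minimal sketch of a prefix search over genome names held in memory. It is written in Python for brevity rather than as plugin code. The `GenomeIndex` class and the sample names are hypothetical, and it uses a sorted list with binary search instead of a plain map. Whether this would hold up with the full Ensembl genome list is exactly the open question raised above.

          ```python
          from bisect import bisect_left


          class GenomeIndex:
              """Case-insensitive prefix search over genome names (hypothetical helper)."""

              def __init__(self, names):
                  # Deduplicate and sort once so each lookup is a binary search.
                  self._names = sorted({name.lower() for name in names})

              def search(self, prefix, limit=10):
                  """Return up to `limit` names starting with `prefix`, in sorted order."""
                  prefix = prefix.lower()
                  start = bisect_left(self._names, prefix)
                  hits = []
                  for name in self._names[start:start + limit]:
                      if not name.startswith(prefix):
                          break
                      hits.append(name)
                  return hits
          ```

          For example, an index built from arabidopsis_thaliana, arabidopsis_lyrata, and mus_musculus would return both Arabidopsis names for the prefix "arab".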

          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          jsirigin Jaya Sravani Sirigineedi (Inactive) made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

            People

            • Assignee:
              jsirigin Jaya Sravani Sirigineedi (Inactive)
              Reporter:
              nfreese Nowlan Freese
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: