Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 5
Epic Link:
Sprint: Fall 6, Fall 7
Description
Situation: With the addition of UCSC REST (IGBF-3129), Ensembl (IGBF-3555), and NCBI (IGBF-3693) as data sources in IGB, the list of available genomes is very large and difficult to scroll through.
Task: Investigate and propose improvements to the genome selection UI. This could include adding search, a new tab for querying the various data providers, etc. Try to think of something new and innovative that is scalable and does not slow IGB down.
Turn this ticket into an epic once a proposed solution is found.
I investigated the Ensembl website for a search API similar to the one NCBI provides, but I haven’t found any such functionality. The closest approaches I found are:
1. The user searches directly for an Ensembl genome by its scientific name, such as arabidopsis_thaliana. We look it up with the https://rest.ensembl.org/documentation/info/info_genome API to see whether that genome exists. If it does, we can go ahead and call the rest of the APIs and load the genome into IGB.
2. There is also an API that lets us check genomes by Ensembl stable ID. The process is the same as in the first approach, except that the user searches with the Ensembl stable ID. The API to check is https://rest.ensembl.org/documentation/info/lookup
3. In another approach, the user enters the common (non-scientific) name of a genome. We first call https://rest.ensembl.org/taxonomy/name/mouse?content-type=application/json to get the matching taxonomy names, then call https://rest.ensembl.org/info/genomes/taxonomy/Mus%20musculus?content-type=application/json to get the scientific name, which can be used to fetch the rest of the data. However, the common name the user enters must match the Ensembl common name, so this doesn’t add any value for the user but adds difficulty for the developer.
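To make the three approaches above easier to compare, here is a rough Python sketch of the lookups. It is only an illustration, not IGB code. The helper names (genome_info_url, lookup_id_url, genome_exists, scientific_names) are made up for this note. The sketch assumes that Ensembl answers an unknown genome name with an HTTP error status and that taxonomy entries carry a "scientific_name" field; both should be checked against the live API before relying on them.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE = "https://rest.ensembl.org"
JSON_SUFFIX = "?content-type=application/json"


def genome_info_url(name):
    # Approach 1: look a genome up by its scientific name.
    return f"{BASE}/info/genomes/{urllib.parse.quote(name)}{JSON_SUFFIX}"


def lookup_id_url(stable_id):
    # Approach 2: look an entry up by its Ensembl stable ID.
    return f"{BASE}/lookup/id/{urllib.parse.quote(stable_id)}{JSON_SUFFIX}"


def taxonomy_name_url(common_name):
    # Approach 3, step 1: resolve a common name to taxonomy entries.
    return f"{BASE}/taxonomy/name/{urllib.parse.quote(common_name)}{JSON_SUFFIX}"


def genomes_by_taxon_url(scientific_name):
    # Approach 3, step 2: list genomes under a taxon.
    return f"{BASE}/info/genomes/taxonomy/{urllib.parse.quote(scientific_name)}{JSON_SUFFIX}"


def fetch_json(url):
    """Return the parsed JSON body, or None if the server replies with an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        # Assumption: an unknown name or ID comes back as an error status.
        return None


def genome_exists(name, fetch=fetch_json):
    """Approach 1: True if the genome lookup returns a body."""
    return fetch(genome_info_url(name)) is not None


def scientific_names(common_name, fetch=fetch_json):
    """Approach 3: collect scientific names for a common name.

    Assumption: each taxonomy entry is a dict with a "scientific_name" key.
    """
    entries = fetch(taxonomy_name_url(common_name)) or []
    return [e["scientific_name"] for e in entries if "scientific_name" in e]
```

The fetch parameter is there so the lookups can be exercised with canned responses instead of live requests. In IGB itself the same calls would be made from Java, so this only shows the request flow.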
Here is the page source for the Ensembl main page, which is where the genomes displayed in the dropdown come from: view-source:https://useast.ensembl.org/index.html. They are just getting the data from the HTML page. Ensembl also has a button that searches for genes, but I don't see much use for it either.