[IGBF-2452] Investigate: Support searching genome version synonyms in Genome Dashboard - JIRA UNCC

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
0.5
Epic Link:
Improve IGB for users
Sprint:
Summer 3: 6 Jul - 17 Jul, Summer 4: 14 Jul - 28 Jul

Description

Currently, the Genome Dashboard lets us search for species.

It would be great if we could allow users to search using genome version names (and synonyms) as well.

Investigate: Is this feasible or practical given the current Genome Dashboard design?

Attachments

Activity

Philip Badzuh (Inactive) added a comment - 07/Jul/20 12:10 PM - edited

Currently, the search works using features from a genome card's HTML content. Because genome versions are already present in the card container, the current search implementation could be extended to include them. The current search, however, doesn't have access to the main backend data structure, where the synonym data lives. One solution would be to pass this data to the front end by embedding it as attributes on the genome card, but this isn't very clean. I think a better solution would be to create a search API endpoint for Genome Dashboard that takes the genome id and search terms as query parameters and uses them in a backend search.

Some reasoning behind this:

Not all genome versions (e.g., A_gambiae_Oct_2006) have an associated entry in the synonyms file.
There are also other discrepancies between the three files used to generate the backend data structure (species, synonyms, allquickloads).

Prior to implementing backend search, it would be good to move all the data in the files mentioned above into a database. I think Amazon DynamoDB would work well for this.

The main task would then be determining the structure of the database, which I propose below. I also think it would be good to use it as the single source of truth on which both IGB and Genome Dashboard are based.

The backend data structure currently looks like so:
https://pastebin.com/raw/JieyQpTx

I think that in designing the database, it would be better to have a list of genome objects as outlined below:

https://pastebin.com/raw/M7GY9BCF

This groups data better and makes it easier to identify missing parts. Please let me know what you think, [~aloraine].
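
To illustrate how a list of genome objects makes gaps easier to spot, here is a minimal sketch. Only the "id" field is known from the proposal; the other field names (species, versions, synonyms) and the function findMissingParts are hypothetical and may not match the linked structure.

```typescript
// Hypothetical shape of one genome object. Only "id" comes from the proposal;
// every other field name here is an assumption made for illustration.
interface GenomeEntry {
  id: string;
  species?: string;
  versions: string[];
  // Maps a genome version name to its known synonyms.
  synonyms: Record<string, string[]>;
}

// Returns, for each genome with gaps, a list of the parts that are missing.
// Genomes with no gaps are left out of the report.
function findMissingParts(genomes: GenomeEntry[]): Record<string, string[]> {
  const report: Record<string, string[]> = {};
  for (const genome of genomes) {
    const missing: string[] = [];
    if (!genome.species) {
      missing.push("species");
    }
    for (const version of genome.versions) {
      const names = genome.synonyms[version];
      if (!names || names.length === 0) {
        missing.push("synonyms:" + version);
      }
    }
    if (missing.length > 0) {
      report[genome.id] = missing;
    }
  }
  return report;
}
```

With one object per genome, a version such as A_gambiae_Oct_2006 that has no synonyms entry would show up directly in the report instead of requiring a cross-check of three files.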

Ann Loraine added a comment - 12/Jul/20 2:41 PM

Thank you for the analysis and also for posting the data structure to Pastebin. That was very helpful!

A comment:

In the improved data structure, "id" should instead be called "genome_version_prefix".
The "id" for a genome version is actually the genome_version_prefix joined (by an underscore) to the month and year of release.
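
The naming rule above can be sketched as a pair of helpers. This is an illustration only: the function names are hypothetical, and the parser assumes a three-letter month and a four-digit year, as in A_gambiae_Oct_2006.

```typescript
// Builds a genome version id from its prefix and release date,
// joining the parts with underscores.
function genomeVersionId(genomeVersionPrefix: string, month: string, year: number): string {
  return [genomeVersionPrefix, month, String(year)].join("_");
}

// Splits a genome version id back into its parts.
// Assumption: ids end in _<three-letter month>_<four-digit year>.
// Returns null when the id does not follow that pattern.
function splitGenomeVersionId(
  id: string
): { genome_version_prefix: string; month: string; year: number } | null {
  const match = /^(.+)_([A-Za-z]{3})_(\d{4})$/.exec(id);
  if (!match) {
    return null;
  }
  return { genome_version_prefix: match[1], month: match[2], year: Number(match[3]) };
}
```

For example, genomeVersionId("A_gambiae", "Oct", 2006) gives "A_gambiae_Oct_2006", and splitting that id recovers the prefix "A_gambiae".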

Moving to Done.

Philip Badzuh (Inactive) added a comment - 20/Jul/20 11:48 AM

To document and expand on my recent conversation with Dr. Loraine:

I had in mind creating a key-value database using Amazon DynamoDB, so that both IGB and its Genome Dashboard could interact with it rather than with the three files linked to above, as they do now. This could also allow for a data structure that is easier to manage and implement. However, it could involve substantial changes to the IGB code base and would introduce an additional external dependency to IGB, although IGB already depends on internet connectivity for multiple functions.
For the time being, it may be easier to simply implement backend search for Genome Dashboard, which would give full access to the current data structure and make it possible to search genome versions and synonyms. This could be implemented as an Express API endpoint called from the front end.
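
A minimal sketch of the search logic such an endpoint might call is shown below. The record shape (species, versions, synonyms), the names GenomeIndex and searchGenomes, and the choice to make the genome id optional are all assumptions for illustration; the actual backend data structure is the one linked earlier in this ticket.

```typescript
// Hypothetical record shape; the real backend structure may differ.
interface GenomeRecord {
  species: string;
  versions: string[];
  synonyms: string[];
}

// Hypothetical index keyed by genome id.
type GenomeIndex = Record<string, GenomeRecord>;

// Returns the ids of genomes whose species name, genome versions, or synonyms
// contain every whitespace-separated search term (case-insensitive).
// If genomeId is given, only that genome is checked.
function searchGenomes(index: GenomeIndex, terms: string, genomeId?: string): string[] {
  const needles = terms
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);
  if (needles.length === 0) {
    return [];
  }
  const ids = genomeId !== undefined ? [genomeId] : Object.keys(index);
  return ids.filter((id) => {
    const record = index[id];
    if (!record) {
      return false;
    }
    const haystack = [record.species]
      .concat(record.versions, record.synonyms)
      .join(" ")
      .toLowerCase();
    return needles.every((needle) => haystack.indexOf(needle) !== -1);
  });
}
```

An Express route handler could read the genome id and search terms from req.query and return the result of searchGenomes as JSON, which keeps the matching logic testable without a running server.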
