  IGB / IGBF-2452

Investigate: Support searching genome version synonyms in Genome Dashboard

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Currently, the Genome Dashboard lets us search for species.

      It would be great if we could allow users to search using genome version names (and synonyms) as well.

      Investigate: Is this feasible or practical given the current Genome Dashboard design?


          Activity

          pbadzuh Philip Badzuh (Inactive) added a comment (edited)

          Currently, the search works by matching against the HTML content of each genome card. Because genome version names are already present in the card container, the current search implementation could be extended to include them. The current search, however, does not have access to the main backend data structure, which is where the synonym data lives. One solution would be to pass this data to the front end by embedding it as attributes on the genome card, but this isn't very clean. I think a better solution would be to create a search API endpoint for the Genome Dashboard and pass it the genome id and search terms as query parameters, which the backend could then use to perform the search.
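          As a rough illustration of the backend search proposed above, the TypeScript sketch below assumes a hypothetical in-memory record per genome that lists its genome versions and their synonyms (the real backend data structure may be shaped differently), and hypothetical query parameter names `genome_id` and `q`. It parses the query string with the standard `URLSearchParams` API and keeps the versions for which every search term appears in the version name or in one of its synonyms.

```typescript
// Hypothetical in-memory shape: one record per genome, listing its genome
// versions and their synonyms. The real backend data structure may differ.
interface GenomeRecord {
  genomeId: string;
  versions: { name: string; synonyms: string[] }[];
}

// Parsed form of a hypothetical query string such as "?genome_id=...&q=...".
interface SearchQuery {
  genomeId: string;
  terms: string[];
}

function parseSearchQuery(queryString: string): SearchQuery {
  // URLSearchParams ignores a leading "?" and decodes "+" as a space.
  const params = new URLSearchParams(queryString);
  const genomeId = params.get("genome_id") ?? "";
  // Lower-case the free-text query and split it on whitespace.
  const terms = (params.get("q") ?? "")
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
  return { genomeId, terms };
}

// Return the names of the requested genome's versions for which every search
// term appears (case-insensitively) in the version name or in some synonym.
function searchGenome(records: GenomeRecord[], query: SearchQuery): string[] {
  const record = records.find((r) => r.genomeId === query.genomeId);
  if (!record || query.terms.length === 0) return [];
  return record.versions
    .filter((v) => {
      const haystack = [v.name, ...v.synonyms].map((s) => s.toLowerCase());
      return query.terms.every((term) => haystack.some((s) => s.includes(term)));
    })
    .map((v) => v.name);
}
```

          Substring matching lets partial input such as a release year still find a version; an exact lookup against the synonym list would be a stricter alternative.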

          Some reasoning behind this:

          • not all genome versions (e.g. A_gambiae_Oct_2006) have an associated entry in the synonyms file.
          • there are also other discrepancies between the three files used to generate the backend data structure (species, synonyms, allquickloads).
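          Discrepancies like these can be surfaced mechanically. The sketch below assumes each of the three files has already been reduced to a set of comparable keys (for example genome version names); how the real files are parsed, and whether they share a common key at all, is not covered here. For every key found in any source, it lists the sources that lack it, so a version such as A_gambiae_Oct_2006 would be reported as missing from synonyms.

```typescript
// The three source files named in the ticket.
type SourceName = "species" | "synonyms" | "allquickloads";

// Assumes each file has already been reduced to a set of comparable keys
// (for example genome version names); parsing the real files is out of scope.
function findDiscrepancies(
  sources: Record<SourceName, Set<string>>
): Map<string, SourceName[]> {
  const names: SourceName[] = ["species", "synonyms", "allquickloads"];

  // Union of every key seen in any source.
  const allKeys = new Set<string>();
  names.forEach((n) => sources[n].forEach((k) => allKeys.add(k)));

  // For each key, record which sources lack it; keep only keys with gaps.
  const missing = new Map<string, SourceName[]>();
  Array.from(allKeys)
    .sort()
    .forEach((key) => {
      const absentFrom = names.filter((n) => !sources[n].has(key));
      if (absentFrom.length > 0) missing.set(key, absentFrom);
    });
  return missing;
}
```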

          Before implementing backend search, it would be good to move all the data in the files mentioned above into a database. I think that Amazon DynamoDB would work well for this.

          The main task would then be determining the structure of the database, which I propose below. I also think it would be good to use the database as the single source of truth on which both IGB and the Genome Dashboard are based.

          The backend data structure currently looks like so:
          https://pastebin.com/raw/JieyQpTx

          I think that in designing the database, it would be better to have a list of genome objects as outlined below:

          https://pastebin.com/raw/M7GY9BCF

          This groups data better and makes it easier to identify missing parts. Please let me know what you think, [~aloraine].
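          One way to picture this grouping is sketched below. The field names are hypothetical and are not taken from the linked proposal; the sketch also assumes the species, synonyms, and quickload data can each be looked up by genome version. Merging them into one object per version leaves absent data as undefined, which makes the missing parts easy to list.

```typescript
// Hypothetical genome object; field names are illustrative only and are not
// copied from the linked proposal. Optional fields make gaps explicit.
interface GenomeObject {
  genomeVersion: string;     // e.g. "A_gambiae_Oct_2006"
  species?: string;          // from the species file, if present
  synonyms?: string[];       // from the synonyms file, if present
  quickloadSites?: string[]; // from allquickloads, if present
}

// Merge per-file lookups (hypothetical shapes, keyed by genome version)
// into a single list with one object per genome version.
function buildGenomeObjects(
  species: Map<string, string>,
  synonyms: Map<string, string[]>,
  quickloads: Map<string, string[]>
): GenomeObject[] {
  const versions = new Set<string>([
    ...Array.from(species.keys()),
    ...Array.from(synonyms.keys()),
    ...Array.from(quickloads.keys()),
  ]);
  return Array.from(versions)
    .sort()
    .map((v) => ({
      genomeVersion: v,
      species: species.get(v),
      synonyms: synonyms.get(v),
      quickloadSites: quickloads.get(v),
    }));
}

// Name the parts that are missing from a genome object.
function missingParts(g: GenomeObject): string[] {
  const parts: (keyof GenomeObject)[] = ["species", "synonyms", "quickloadSites"];
  return parts.filter((p) => g[p] === undefined);
}
```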

          ann.loraine Ann Loraine added a comment

          Thank you for the analysis and also for importing the data structure to pastebin. That was very helpful!

          A comment:

          • In the improved data structure, "id" should instead be called "genome_version_prefix".
          • The "id" for a genome version is actually the genome_version_prefix joined with the month and year of release, separated by underscores.
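          The naming rule above can be expressed as a pair of small helpers. This is a sketch that takes A_gambiae_Oct_2006 from the earlier comment as its example, assuming its genome_version_prefix is A_gambiae and that years always have four digits; the function names are hypothetical.

```typescript
// Build a genome version id by joining the prefix, the month, and the year of
// release with underscores: ("A_gambiae", "Oct", 2006) -> "A_gambiae_Oct_2006".
function buildGenomeVersionId(prefix: string, month: string, year: number): string {
  return [prefix, month, String(year)].join("_");
}

interface GenomeVersionParts {
  prefix: string;
  month: string;
  year: number;
}

// Split an id back into its parts. The prefix may itself contain underscores,
// so the month and year are read from the last two fields. Assumes a
// four-digit year, as in "A_gambiae_Oct_2006"; returns null otherwise.
function parseGenomeVersionId(id: string): GenomeVersionParts | null {
  const parts = id.split("_");
  if (parts.length < 3) return null;
  const yearField = parts[parts.length - 1];
  if (!/^\d{4}$/.test(yearField)) return null;
  return {
    prefix: parts.slice(0, -2).join("_"),
    month: parts[parts.length - 2],
    year: Number(yearField),
  };
}
```

          Reading the month and year from the end, rather than splitting at the first underscore, is what allows prefixes like A_gambiae that contain underscores themselves.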

          Moving to Done.

          pbadzuh Philip Badzuh (Inactive) added a comment

          To document and expand on my recent conversation with Dr. Loraine:

          • I had in mind creating a key-value database using Amazon DynamoDB, so that both IGB and its Genome Dashboard could interact with it rather than with the three files linked above, as they do now. This would also allow for a data structure that is easier to manage and implement. However, it could involve substantial changes to the IGB code base and would introduce an additional external dependency to IGB, although IGB already depends on internet connectivity for multiple functions.
          • For the time being, it may be easier to simply implement backend search for the Genome Dashboard. This would give the search full access to the current data structure and make it possible to search genome versions and synonyms. It could be implemented as an Express API endpoint called from the front end.
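          On the front-end side, calling such an endpoint could look roughly like the sketch below. The path `/api/search` and the parameter names `genome_id` and `q` are hypothetical, since the ticket does not specify them, and the Express route itself is not shown. The URL is built with the standard `URL` and `URLSearchParams` APIs so that search terms are encoded correctly, and the response is assumed to be a JSON array of matching genome version names.

```typescript
// Hypothetical endpoint path; the ticket does not specify one.
const SEARCH_PATH = "/api/search";

// Build the request URL, passing the genome id and search terms as query
// parameters (hypothetical names "genome_id" and "q").
function buildSearchUrl(baseUrl: string, genomeId: string, terms: string): string {
  const url = new URL(SEARCH_PATH, baseUrl);
  url.searchParams.set("genome_id", genomeId);
  url.searchParams.set("q", terms);
  return url.toString();
}

// Call the endpoint from the front end. Assumes the backend responds with a
// JSON array of matching genome version names.
async function searchGenomeDashboard(
  baseUrl: string,
  genomeId: string,
  terms: string
): Promise<string[]> {
  const response = await fetch(buildSearchUrl(baseUrl, genomeId, terms));
  if (!response.ok) throw new Error(`Search failed: HTTP ${response.status}`);
  return (await response.json()) as string[];
}
```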

            People

            • Assignee:
              pbadzuh Philip Badzuh (Inactive)
              Reporter:
              ann.loraine Ann Loraine
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: