  1. IGB
  2. IGBF-2452

Investigate: Support searching genome version synonyms in Genome Dashboard

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Currently, the Genome Dashboard lets us search for species.

      It would be great if we could allow users to search using genome version names (and synonyms) as well.

      Investigate: Is this feasible or practical given the current Genome Dashboard design?

        Attachments

          Activity

          ann.loraine Ann Loraine created issue -
          ann.loraine Ann Loraine made changes -
          Field Original Value New Value
          Epic Link IGBF-1765 [ 17855 ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 2: 22 Jun - 3 Jul [ 97 ] Summer 3: 3 Jul - 14 Jul [ 98 ]
          ann.loraine Ann Loraine made changes -
          Summary Support searching genome version synonyms in Genome Dashboard Investigate: Support searching genome version synonyms in Genome Dashboard
          ann.loraine Ann Loraine made changes -
          Description Currently, the Genome Dashboard lets us search for species.

          Let's allow users to also search using genome version names as well.

          Currently, the Genome Dashboard lets us search for species.

          It would be great if we could allow users to search using genome version names (and synonyms) as well.

          Investigate: Is this feasible or practical given the current Genome Dashboard design?
          ann.loraine Ann Loraine made changes -
          Story Points 1.5 0.5
          pbadzuh Philip Badzuh (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pbadzuh Philip Badzuh (Inactive) made changes -
          Assignee Philip Badzuh [ pbadzuh ]
          pbadzuh Philip Badzuh (Inactive) added a comment - - edited

          Currently, the search works on a genome card's HTML content. Since genome versions are present in the card container, the current search implementation could be extended to include them. The current search, however, doesn't have access to the main backend data structure, which is where the synonym data lives. One solution would be to pass this data to the front end by embedding it as attributes on the genome card, but this isn't very clean. I think a better solution would be to create a search API endpoint for Genome Dashboard and pass it the genome id and search terms as query parameters, which could then be used in a backend search.

          Some reasoning behind this:

          • Not all genome versions (e.g. A_gambiae_Oct_2006) have an associated entry in the synonyms file.
          • There are also other discrepancies between the three files used to generate the backend data structure (species, synonyms, allquickloads).
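
The first discrepancy above could be checked mechanically. What follows is a minimal sketch, assuming the synonyms data has already been parsed into rows of equivalent names and the genome versions into a flat list. The function name `findVersionsWithoutSynonyms`, the `SynonymRow` type, and the sample rows are hypothetical and are not taken from the actual files.

```typescript
// Hypothetical shape: each row lists names that refer to the same genome version.
type SynonymRow = string[];

// Return the genome versions that do not appear in any synonyms row.
function findVersionsWithoutSynonyms(
  genomeVersions: string[],
  synonymRows: SynonymRow[]
): string[] {
  // Collect every name mentioned anywhere in the synonyms data.
  const known: { [name: string]: boolean } = {};
  for (const row of synonymRows) {
    for (const name of row) {
      known[name] = true;
    }
  }
  return genomeVersions.filter((version) => known[version] !== true);
}

// Illustrative data only; real rows would come from the synonyms file.
const sampleRows: SynonymRow[] = [["H_sapiens_Dec_2013", "hg38"]];
const sampleVersions: string[] = ["H_sapiens_Dec_2013", "A_gambiae_Oct_2006"];
const missing: string[] = findVersionsWithoutSynonyms(sampleVersions, sampleRows);
console.log(missing);
```

A report like this would make it easier to see which genome versions a synonym-aware search could never match.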

          Prior to implementing backend search, it would be good to move all the data in the files mentioned above into a database. I think that Amazon DynamoDB would work well for this.

          The main task would then be determining the structure for the database, which I propose below. Also, I think it would be good to use it as the single source of truth for both IGB and Genome Dashboard.

          The backend data structure currently looks like so:
          https://pastebin.com/raw/JieyQpTx

          I think that in designing the database, it would be better to have a list of genome objects as outlined below:

          https://pastebin.com/raw/M7GY9BCF

          This groups data better and makes it easier to identify missing parts. Please let me know what you think, [~aloraine].

          pbadzuh Philip Badzuh (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pbadzuh Philip Badzuh (Inactive) made changes -
          Assignee Philip Badzuh [ pbadzuh ]
          ann.loraine Ann Loraine added a comment -

          Thank you for the analysis and also for importing the data structure into Pastebin. That was very helpful!

          A comment:

          • In the improved data structure, "id" should instead be called "genome_version_prefix".
          • The "id" for a genome version is actually the genome_version_prefix joined (by an underscore) to the month and year of release.

          Moving to Done.
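
To illustrate the naming rule described in this comment, here is a minimal sketch that builds and splits a genome version id. The function names are hypothetical, and the assumption that the month is a three-letter abbreviation and the year has four digits (as in A_gambiae_Oct_2006) is made for illustration only; it is not part of the proposed data structure.

```typescript
// Parts of a genome version id, using the field name suggested in the comment.
interface GenomeVersionParts {
  genome_version_prefix: string;
  month: string; // assumed to be a three-letter abbreviation, e.g. "Oct"
  year: string;  // assumed to be four digits, e.g. "2006"
}

// Join prefix, month, and year with underscores to form the id.
function buildGenomeVersionId(parts: GenomeVersionParts): string {
  return [parts.genome_version_prefix, parts.month, parts.year].join("_");
}

// Split an id back into its parts; returns null if it does not fit the assumed pattern.
function parseGenomeVersionId(id: string): GenomeVersionParts | null {
  const match = /^(.+)_([A-Za-z]{3})_(\d{4})$/.exec(id);
  if (match === null) {
    return null;
  }
  return { genome_version_prefix: match[1], month: match[2], year: match[3] };
}

const builtId: string = buildGenomeVersionId({
  genome_version_prefix: "A_gambiae",
  month: "Oct",
  year: "2006",
});
const parsedParts: GenomeVersionParts | null = parseGenomeVersionId(builtId);
console.log(builtId, parsedParts);
```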

          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          ann.loraine Ann Loraine made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          ann.loraine Ann Loraine made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          ann.loraine Ann Loraine made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          ann.loraine Ann Loraine made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          ann.loraine Ann Loraine made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
          ann.loraine Ann Loraine made changes -
          Assignee Philip Badzuh [ pbadzuh ]
          ann.loraine Ann Loraine made changes -
          Resolution Done [ 10000 ]
          Status Closed [ 6 ] To-Do [ 10305 ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 3: 6 Jul - 17 Jul [ 98 ] Summer 3: 6 Jul - 17 Jul, Summer 4: 14 Jul - 28 Jul [ 98, 99 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          pbadzuh Philip Badzuh (Inactive) added a comment -

          To document and expand on my recent conversation with Dr. Loraine:

          • I had in mind creating a key-value database using Amazon DynamoDB, so that both IGB and its Genome Dashboard could interact with it rather than with the three files linked to above, as they do now. This could also allow for a data structure that would be easier to manage and implement. However, it could involve substantial changes to the IGB code base and would introduce an additional external dependency to IGB, although IGB already depends on internet connectivity for multiple functions.
          • For the time being, it may be easier to simply implement backend search for Genome Dashboard. This would give full access to the current data structure and make it possible to implement a search function that can search genome versions and synonyms. It could be implemented through an Express API endpoint called from the front end.
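
A minimal sketch of the search logic such an endpoint might run on the backend follows. The `GenomeRecord` shape, the `searchGenomes` function, and the sample records are assumptions made for illustration; the real backend data structure is the one linked in the earlier comment. The Express route wiring is left out so that the example has no dependencies.

```typescript
// Hypothetical record shape; the real backend structure may differ.
interface GenomeRecord {
  species: string;
  genomeVersion: string;
  synonyms: string[];
}

// Case-insensitive search: a record matches when every search term occurs
// somewhere in its species name, genome version name, or synonyms.
function searchGenomes(records: GenomeRecord[], query: string): GenomeRecord[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);
  if (terms.length === 0) {
    return [];
  }
  return records.filter((record) => {
    const haystack = [record.species, record.genomeVersion]
      .concat(record.synonyms)
      .join(" ")
      .toLowerCase();
    return terms.every((term) => haystack.indexOf(term) !== -1);
  });
}

// Illustrative data only.
const sampleRecords: GenomeRecord[] = [
  { species: "Homo sapiens", genomeVersion: "H_sapiens_Dec_2013", synonyms: ["hg38"] },
  { species: "Anopheles gambiae", genomeVersion: "A_gambiae_Oct_2006", synonyms: [] },
];
const bySynonym: GenomeRecord[] = searchGenomes(sampleRecords, "hg38");
const byVersion: GenomeRecord[] = searchGenomes(sampleRecords, "gambiae oct");
console.log(bySynonym, byVersion);
```

In an Express app, a route handler could read the search terms from the request's query string, pass them to a function like `searchGenomes`, and return the matches as JSON for the front end to render.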
          pbadzuh Philip Badzuh (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pbadzuh Philip Badzuh (Inactive) made changes -
          Assignee Philip Badzuh [ pbadzuh ]
          ann.loraine Ann Loraine made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          ann.loraine Ann Loraine made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          ann.loraine Ann Loraine made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          ann.loraine Ann Loraine made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          ann.loraine Ann Loraine made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          ann.loraine Ann Loraine made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          ann.loraine Ann Loraine made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          ann.loraine Ann Loraine made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
          ann.loraine Ann Loraine made changes -
          Assignee Philip Badzuh [ pbadzuh ]

            People

            • Assignee:
              pbadzuh Philip Badzuh (Inactive)
              Reporter:
              ann.loraine Ann Loraine
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: