IGBF-2903

Investigate why genome and assembly trackhub data sources are not loaded in IGB and fix the problem.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Some public UCSC hubs can be added as data sources; however, they do not appear in the 'Available Data' pane when a genome they claim to support is opened in IGB. These include, but may not be limited to, hubs containing genome and assembly data. For example, see two problematic hubs below:

      Investigate why this is occurring by inspecting how the backend API handles these data sources and/or how IGB deals with them.

        Attachments

          Issue Links

            Activity

            pbadzuh Philip Badzuh (Inactive) created issue -
            pbadzuh Philip Badzuh (Inactive) made changes -
            Field Original Value New Value
            Epic Link IGBF-2831 [ 19524 ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Link This issue relates to IGBF-2898 [ IGBF-2898 ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Assignee Philip Badzuh [ pbadzuh ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Sprint Summer 3 2021 Jul 12 - Jul 23 [ 125 ] Summer 3 2021 Jul 12 - Jul 23, Summer 4 2021 Aug 2 - Aug 13 [ 125, 126 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            pbadzuh Philip Badzuh (Inactive) added a comment -

            Some notes on my investigation:

            • The problematic hub urls can be added as data sources in IGB without any errors.
            • When opening a genome that should be supported by the problematic track hub, IGB does not display the newly added data sources in the 'Available Data' section. No network requests for information regarding these data sources are made by IGB.
            • I replicated IGB's requests and confirmed that contents.txt is successfully retrieved.
            • genomes.txt and annots.xml return empty and error responses, respectively.
            • Inspecting contents.txt and the trackhub API code shows that the returned genome names are NCBI accession numbers, which do not have corresponding entries in our synonyms.txt. IGB-specific genome name information is required to generate genomes.txt and annots.xml, which is why requests for these resources fail.
            • Using the UCSC API, organism genus and species information may be obtained for each genome listed in contents.txt. In cases where the UCSC genome names from contents.txt have no match in synonyms.txt, we could perhaps use the organism information to retrieve the latest available genome from synonyms.txt and default to that.
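The fallback proposed in the last note can be sketched as follows. This is a minimal illustration, not the hub facade's actual code: the class `GenomeNameResolver`, its methods, and the sample synonym lines are hypothetical, and it assumes that each synonyms.txt line is tab-separated with the IGB genome name first and that IGB names end in a `Mon_YYYY` release date.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: maps a hub genome name to an IGB genome name.
class GenomeNameResolver {
    private static final List<String> MONTHS = List.of(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    private final Map<String, String> synonymToIgbName = new HashMap<>();
    private final Set<String> igbNames = new LinkedHashSet<>();

    // Assumption: one genome per line, tab-separated, IGB name in the first column.
    GenomeNameResolver(List<String> synonymLines) {
        for (String line : synonymLines) {
            String[] names = line.trim().split("\t");
            if (names[0].isEmpty()) {
                continue; // skip blank lines
            }
            igbNames.add(names[0]);
            for (String name : names) {
                synonymToIgbName.put(name, names[0]);
            }
        }
    }

    // Tries a direct synonym match first, then falls back to the newest
    // IGB genome whose name starts with the organism prefix, e.g. "H_sapiens_".
    Optional<String> resolve(String hubGenomeName, String genus, String species) {
        String direct = synonymToIgbName.get(hubGenomeName);
        if (direct != null) {
            return Optional.of(direct);
        }
        if (genus.isEmpty() || species.isEmpty()) {
            return Optional.empty();
        }
        String prefix = genus.substring(0, 1).toUpperCase(Locale.ROOT)
                + "_" + species.toLowerCase(Locale.ROOT) + "_";
        return igbNames.stream()
                .filter(name -> name.startsWith(prefix) && releaseKey(name) >= 0)
                .max(Comparator.comparingInt(GenomeNameResolver::releaseKey));
    }

    // Turns a trailing "Mon_YYYY" into a sortable number; -1 if the name has no such suffix.
    static int releaseKey(String igbName) {
        String[] parts = igbName.split("_");
        if (parts.length < 2) {
            return -1;
        }
        int month = MONTHS.indexOf(parts[parts.length - 2]);
        try {
            int year = Integer.parseInt(parts[parts.length - 1]);
            return month < 0 ? -1 : year * 12 + month;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not copied from the real synonyms.txt.
        GenomeNameResolver resolver = new GenomeNameResolver(List.of(
                "H_sapiens_Dec_2013\thg38",
                "H_sapiens_Feb_2009\thg19"));
        System.out.println(resolver.resolve("GCF_000001405.39", "Homo", "sapiens"));
    }
}
```

A fallback like this trades accuracy for availability: the newest IGB genome for an organism is not necessarily the assembly the hub describes, so coordinates may not line up.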
            pbadzuh Philip Badzuh (Inactive) made changes -
            Summary Investigate why genome and assembly trackhub data sources are not loaded in IGB Investigate why genome and assembly trackhub data sources are not loaded in IGB and fix the problem.
            pbadzuh Philip Badzuh (Inactive) added a comment - - edited

            Please see my changes for the below repos:

            • hub facade
            pbadzuh Philip Badzuh (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Assignee Philip Badzuh [ pbadzuh ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine added a comment -

            I'm not 100% sure about this, but I believe the "non-standard" regular expression was meant to handle genome version prefixes with a cultivar name, e.g.

            O_sativa_japonica
            

            within genome names:

            O_sativa_japonica_Oct_2011
            O_sativa_japonica_Jun_2009
            

            Another example is banana:

            M_acuminata_DH_Pahang_Jan_2016/
            

            I'm concerned that the following change:

            -    private static final Pattern NON_STANDARD_REGEX = Pattern.compile("^([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+).*$");
            +    private static final Pattern NON_STANDARD_REGEX = Pattern.compile("(.)+");
            

            will cause unintended side effects affecting how these example genome versions get displayed in the species and genome version menus.

            Request for Philip Badzuh: do you mind looking into this a bit more?
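To make the concern concrete, the sketch below runs both patterns against the example genome names and prints what the first capture group holds. How IGB actually consumes the match is not shown in this issue, so the use of `matches()` and `group(1)` is an assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two pattern strings come from the diff above.
class NonStandardRegexDemo {
    static final Pattern ORIGINAL =
            Pattern.compile("^([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+).*$");
    static final Pattern PROPOSED = Pattern.compile("(.)+");

    // Returns capture group 1 when the whole name matches, otherwise null.
    static String firstGroup(Pattern pattern, String genomeName) {
        Matcher matcher = pattern.matcher(genomeName);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String[] names = {
            "O_sativa_japonica_Oct_2011",
            "O_sativa_japonica_Jun_2009",
            "M_acuminata_DH_Pahang_Jan_2016"
        };
        for (String name : names) {
            System.out.println(name
                    + " -> original: " + firstGroup(ORIGINAL, name)
                    + ", proposed: " + firstGroup(PROPOSED, name));
        }
    }
}
```

Under these assumptions the original pattern captures a three-part prefix (`O_sativa_japonica`, `M_acuminata_DH`), while `(.)+` still matches every name but leaves only the final character in group 1, because a quantified capture group keeps its last iteration.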

            ann.loraine Ann Loraine added a comment - - edited

            Deploying the new track hub facade code on the testing site using playbooks with inventory file:

            [hub_facade_hosts]
            TestHub1 ansible_host=54.225.244.14 domain=TestHub1.bioviz.org secret_key=lkdjkaejsdfkjfd repo=https://bitbucket.org/pbadzuh/track-hub-converter-webapp.git branch=IGBF-2903
            
            [hub_facade_hosts:vars]
            ansible_ssh_common_args="-o StrictHostKeyChecking=no"
            ansible_python_interpreter=/usr/bin/python3
            ansible_ssh_user=ubuntu
            
            ann.loraine Ann Loraine added a comment -

            I attempted to test the new track hub back end (branch IGBF-2903, see above comment) by entering the following URL into IGB as a Quickload source. This is the example shown on "trackhubs.html" (see https://bioviztest.bioviz.org/trackhubs.html):

            testhub1.bioviz.org/rest_api/?hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt&fileName=/
            

            However, the contents.txt endpoint was broken and would not send the required data to IGB.

            The same error is seen when I deploy the latest "main" branch track hub facade back end.

            There appear to be several problems here, but I can't tell if they are related to the new branch code or not. The code itself looks fine, but I'm not able to deploy a working version.

            Moving back to "To-Do" in hopes Philip Badzuh can take a look tomorrow and figure out what could be going wrong.
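One way to check the endpoint outside IGB is to request contents.txt directly. The sketch below is illustrative only: it assumes the host is reachable over https and that IGB forms the request by appending the file name to the Quickload root URL; neither detail is confirmed in this issue, and `QuickloadProbe` and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical probe for a Quickload-style endpoint.
class QuickloadProbe {
    // Assumption: the resource URL is the Quickload root with the file name appended.
    static String resourceUrl(String quickloadRoot, String fileName) {
        return quickloadRoot + fileName;
    }

    // Sends a GET request and reports the status code and body length.
    static int probe(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + "  "
                + response.body().length() + " chars  " + url);
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // The https:// scheme is an assumption; the URL in the comment has none.
        String root = "https://testhub1.bioviz.org/rest_api/"
                + "?hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt"
                + "&fileName=/";
        probe(HttpClient.newHttpClient(), resourceUrl(root, "contents.txt"));
    }
}
```

Running the same probe against a deployment of each branch would show whether the failure is specific to the new code or also present on "main".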

            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Philip Badzuh [ pbadzuh ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pbadzuh Philip Badzuh (Inactive) added a comment -

            After some debugging, I realized that the change to IGB I proposed is no longer needed. I've deleted the associated branch. Kindly review only the hub facade changes.

            pbadzuh Philip Badzuh (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Assignee Philip Badzuh [ pbadzuh ]
            ann.loraine Ann Loraine added a comment -

            Changes look great. Thank you for adding documentation strings. Please submit PR when ready.

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Assignee Philip Badzuh [ pbadzuh ]
            pbadzuh Philip Badzuh (Inactive) added a comment -

            Thanks. Please see the PR here.

            pbadzuh Philip Badzuh (Inactive) made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            pbadzuh Philip Badzuh (Inactive) made changes -
            Assignee Philip Badzuh [ pbadzuh ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine added a comment -

            Merged and re-deployed to test host. Branch was already tested. Moving to done.

            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Philip Badzuh [ pbadzuh ]

              People

              • Assignee: pbadzuh Philip Badzuh (Inactive)
              • Reporter: pbadzuh Philip Badzuh (Inactive)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved: