IGBF-2903

Investigate why genome and assembly trackhub data sources are not loaded in IGB and fix the problem.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      Some public UCSC hubs can be added as data sources but do not appear in the 'Available Data' pane when a genome they claim to support is opened in IGB. These include, but may not be limited to, hubs containing genome and assembly data. For example, see the two problematic hubs below:

      Investigate why this is occurring by inspecting how the backend API handles these data sources and/or how IGB deals with them.

            Activity

            pbadzuh Philip Badzuh (Inactive) added a comment -

            Some notes on my investigation:

            • The problematic hub URLs can be added as data sources in IGB without any errors.
            • When opening a genome that should be supported by the problematic track hub, IGB does not display the newly added data sources in the 'Available Data' section. No network requests for information regarding these data sources are made by IGB.
            • I replicated IGB's requests and confirmed that contents.txt is successfully retrieved.
            • genomes.txt and annots.xml return empty and error responses, respectively.
            • Inspecting contents.txt and the trackhub API code shows that the returned genome names are NCBI accession numbers, which do not have corresponding entries in our synonyms.txt. IGB-specific genome name information is required to generate genomes.txt and annots.xml, which is why requests for these resources fail.
            • Using the UCSC API, organism genus and species information may be obtained for each genome listed in contents.txt. In cases where the UCSC genome names from contents.txt have no match in synonyms.txt, we could perhaps use the organism information to look up the latest available genome for that organism in synonyms.txt and default to that.
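The fallback proposed in the last note can be sketched roughly as follows. This is a minimal Java illustration, not the hub facade's actual code. The class `GenomeNameResolver`, its methods, the in-memory stand-in for synonyms.txt (one row per genome version, IGB name first), the assumption that IGB genome names end in a `_Mon_YYYY` release date, and the placeholder accession are all assumptions made for the example. Genus and species are passed in directly; fetching them from the UCSC API is not shown.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a hub genome name to an IGB genome name.
class GenomeNameResolver {
    private static final List<String> MONTHS = Arrays.asList(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");
    // Assumed IGB naming convention: names end in _Mon_YYYY.
    private static final Pattern RELEASE_DATE = Pattern.compile("_([A-Z][a-z]{2})_(\\d{4})$");

    // Stand-in for synonyms.txt: each row holds the names of one genome
    // version, with the IGB name assumed to come first.
    private final List<List<String>> synonymRows;

    GenomeNameResolver(List<List<String>> synonymRows) {
        this.synonymRows = synonymRows;
    }

    // genus and species are assumed to be non-empty.
    Optional<String> resolve(String hubGenomeName, String genus, String species) {
        // Step 1: direct lookup of the hub's genome name among the synonyms.
        for (List<String> row : synonymRows) {
            if (!row.isEmpty() && row.contains(hubGenomeName)) {
                return Optional.of(row.get(0));
            }
        }
        // Step 2: no synonym found, so fall back to the most recent IGB
        // genome whose name starts with the organism prefix, e.g. "H_sapiens_".
        String prefix = genus.substring(0, 1).toUpperCase(Locale.ROOT)
                + "_" + species.toLowerCase(Locale.ROOT) + "_";
        return synonymRows.stream()
                .filter(row -> !row.isEmpty())
                .map(row -> row.get(0))
                .filter(name -> name.startsWith(prefix) && releaseRank(name) >= 0)
                .max(Comparator.comparingInt(GenomeNameResolver::releaseRank));
    }

    // Orders names by release date; returns -1 when no date suffix is found.
    static int releaseRank(String igbName) {
        Matcher m = RELEASE_DATE.matcher(igbName);
        if (!m.find()) {
            return -1;
        }
        int month = MONTHS.indexOf(m.group(1));
        return month < 0 ? -1 : Integer.parseInt(m.group(2)) * 12 + month;
    }

    public static void main(String[] args) {
        // Sample rows for illustration only.
        GenomeNameResolver resolver = new GenomeNameResolver(Arrays.asList(
                Arrays.asList("H_sapiens_Dec_2013", "hg38"),
                Arrays.asList("H_sapiens_Feb_2009", "hg19")));
        // Known synonym: resolved directly.
        System.out.println(resolver.resolve("hg19", "Homo", "sapiens"));
        // Placeholder accession with no synonym: resolved via the organism fallback.
        System.out.println(resolver.resolve("GCA_000000000.1", "Homo", "sapiens"));
    }
}
```

One consequence of this design is that an accession-named assembly would be shown under a different genome version of the same organism, so coordinates may not line up. Whether that trade-off is acceptable is a decision the ticket leaves open.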
            pbadzuh Philip Badzuh (Inactive) added a comment - edited

            Please see my changes for the below repos:

            • hub facade

            ann.loraine Ann Loraine added a comment -

            I'm not 100% sure about this, but I believe the "non-standard" regular expression was meant to handle genome version prefixes with a cultivar name, e.g.

            O_sativa_japonica
            

            within genome names:

            O_sativa_japonica_Oct_2011
            O_sativa_japonica_Jun_2009
            

            Another example is banana:

            M_acuminata_DH_Pahang_Jan_2016/
            

            I'm concerned that the following change:

            -    private static final Pattern NON_STANDARD_REGEX = Pattern.compile("^([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+).*$");
            +    private static final Pattern NON_STANDARD_REGEX = Pattern.compile("(.)+");
            

            will cause unintended side effects affecting how these example genome versions get displayed in the species and genome version menus.

            Request for Philip Badzuh: do you mind looking into this a bit more?
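The concern above can be checked with a small sketch that applies both patterns to the example genome names. The ticket does not show how IGB consumes `NON_STANDARD_REGEX`, so the use of `Matcher.matches()` and capture group 1 here is an assumption, as are the class name `NonStandardRegexDemo`, the helper `firstGroup`, and the placeholder accession-style name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of IGB.
class NonStandardRegexDemo {
    // Pattern before the proposed change.
    static final Pattern OLD_REGEX =
            Pattern.compile("^([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+).*$");
    // Pattern after the proposed change.
    static final Pattern NEW_REGEX = Pattern.compile("(.)+");

    // Returns capture group 1 when the pattern matches the whole name, else null.
    static String firstGroup(Pattern pattern, String genomeName) {
        Matcher m = pattern.matcher(genomeName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] names = {
            "O_sativa_japonica_Oct_2011",
            "O_sativa_japonica_Jun_2009",
            "M_acuminata_DH_Pahang_Jan_2016",
            "GCA_000000000.1" // placeholder accession-style name
        };
        for (String name : names) {
            System.out.println(name
                    + " old=" + firstGroup(OLD_REGEX, name)
                    + " new=" + firstGroup(NEW_REGEX, name));
        }
    }
}
```

Under these assumptions the old pattern captures the first three underscore-separated alphabetic tokens (for example `O_sativa_japonica`) and does not match the accession-style name at all. The new pattern matches every non-empty name, but because the capturing group sits inside the `+` repetition, group 1 holds only the last character matched. If IGB reads group 1 as the species prefix, the new pattern would change what is displayed for the cultivar examples.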

            ann.loraine Ann Loraine added a comment - edited

            Deploying the new track hub facade code on the testing site using playbooks with inventory file:

            [hub_facade_hosts]
            TestHub1 ansible_host=54.225.244.14 domain=TestHub1.bioviz.org secret_key=lkdjkaejsdfkjfd repo=https://bitbucket.org/pbadzuh/track-hub-converter-webapp.git branch=IGBF-2903
            
            [hub_facade_hosts:vars]
            ansible_ssh_common_args="-o StrictHostKeyChecking=no"
            ansible_python_interpreter=/usr/bin/python3
            ansible_ssh_user=ubuntu
            
            ann.loraine Ann Loraine added a comment -

            I attempted to test the new track hub back end (branch IGBF-2903, see above comment) by entering the following URL into IGB as a Quickload source. This is the example shown on "trackhubs.html" (see https://bioviztest.bioviz.org/trackhubs.html):

            testhub1.bioviz.org/rest_api/?hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt&fileName=/
            

            However, the contents.txt endpoint was broken and would not send the required data to IGB.

            The same error is seen when I deploy the latest "main" branch track hub facade back end.

            There appear to be several problems here, but I can't tell if they are related to the new branch code or not. The code itself looks fine, but I'm not able to deploy a working version.

            Moving back to "To-Do" in hopes Philip Badzuh can take a look tomorrow and figure out what could be going wrong.

            pbadzuh Philip Badzuh (Inactive) added a comment -

            After some debugging, I realized that the change to IGB I proposed is no longer needed. I've deleted the associated branch. Kindly review only the hub facade changes.

            ann.loraine Ann Loraine added a comment -

            Changes look great. Thank you for adding documentation strings. Please submit PR when ready.

            pbadzuh Philip Badzuh (Inactive) added a comment -

            Thanks. Please see the PR here.

            ann.loraine Ann Loraine added a comment -

            Merged and re-deployed to test host. Branch was already tested. Moving to done.


              People

              • Assignee: pbadzuh Philip Badzuh (Inactive)
              • Reporter: pbadzuh Philip Badzuh (Inactive)