Details

    • Story Points:
      3
    • Sprint:
      Spring 4 2021 May 3 - May 14, Spring 5 2021 May 17 - May 28

      Description

      Situation: When DAS support was restored in IGB, additional genomes provided by UCSC became available. However, many of these genomes display oddly in the species or genome dropdown menus (for example, as abbreviated UCSC names), which suggests they are missing from the synonyms.txt file.

      Task: Update the synonyms.txt file. Also make sure that species listed on the main IGB QL site are represented in the species.txt and synonyms.txt files that are packaged with IGB.

            Activity

            nfreese Nowlan Freese added a comment -

            Current UCSC DAS genomes: https://genome.ucsc.edu/cgi-bin/das/dsn
            Current UCSC API genomes: https://api.genome.ucsc.edu/list/ucscGenomes
            nfreese Nowlan Freese added a comment -

            I compared the two endpoints and they do return the same genomes (as of May 14, 2021).
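
A comparison like this can be repeated with a small set difference once the genome names from each endpoint have been collected. The sketch below is only an illustration: the class and method names are hypothetical, the sample names are placeholders, and fetching and parsing the two endpoints is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of IGB: compares two collections of genome names.
final class GenomeListDiff {

    /** Returns the names present in left but absent from right, in sorted order. */
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }

    /** True when both sources list exactly the same genome names. */
    static boolean sameGenomes(Set<String> das, Set<String> api) {
        return onlyIn(das, api).isEmpty() && onlyIn(api, das).isEmpty();
    }

    public static void main(String[] args) {
        // Placeholder names; real input would be parsed from the DAS and API responses.
        Set<String> das = Set.of("hg38", "mm10", "enhLutNer1");
        Set<String> api = Set.of("hg38", "mm10");
        System.out.println("Same genomes: " + sameGenomes(das, api));
        System.out.println("Only in DAS: " + onlyIn(das, api));
        System.out.println("Only in API: " + onlyIn(api, das));
    }
}
```

Reporting the two one-sided differences separately makes it clear which source is missing a genome if the lists ever diverge.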

            nfreese Nowlan Freese added a comment - - edited

            The files that deal with creating and comparing synonyms can be found here in Bitbucket.

            These files are located in Core - Synonym Lookup.

            nfreese Nowlan Freese added a comment -

            There is a test file for synonym lookup, SynonymLookupTest.java, which can be found in Core - Synonym Lookup > Test Packages.

            We may need to double-check that this file fully tests the example synonyms file. I found an issue where a line in synonyms.txt that had no synonym (it contained only the IGB friendly name, T_parvula_May_2012) was preventing the genomes/synonyms listed below it in the file from being added to the thesaurus of synonyms in IGB.
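
To make the failure mode concrete, here is a minimal sketch of a line-by-line reader that tolerates a name-only line and keeps reading. It is not the IGB implementation: the class and method names are hypothetical, and it assumes each line is tab-delimited with the IGB friendly name in the first field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not IGB code. Assumes tab-delimited lines: name, then synonyms.
final class SynonymFileSketch {

    /** Reads every line; a line with only a name gets an empty synonym set. */
    static Map<String, Set<String>> parse(BufferedReader reader) throws IOException {
        Map<String, Set<String>> thesaurus = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines instead of stopping
            }
            String[] fields = line.split("\t");
            String preferred = fields[0].trim();
            if (preferred.isEmpty()) {
                continue; // no usable name on this line
            }
            Set<String> synonyms =
                    thesaurus.computeIfAbsent(preferred, k -> new LinkedHashSet<>());
            // A name-only line has no further fields, so this loop simply does nothing.
            for (int i = 1; i < fields.length; i++) {
                String synonym = fields[i].trim();
                if (!synonym.isEmpty()) {
                    synonyms.add(synonym);
                }
            }
        }
        return thesaurus;
    }

    /** Convenience wrapper for in-memory text, e.g. in a unit test. */
    static Map<String, Set<String>> parseText(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample data for illustration only; the middle line has no synonym.
        String sample = "A_thaliana_Jun_2009\tTAIR9\n"
                + "T_parvula_May_2012\n"
                + "H_sapiens_Dec_2013\thg38\n";
        System.out.println(parseText(sample));
    }
}
```

A regression test along these lines could assert that entries listed after the name-only line are still present in the parsed result.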

            nfreese Nowlan Freese added a comment -

            Here are the links to the synonyms.txt and species.txt documentation I could find in the IGB User's Guide:
            https://wiki.transvar.org/display/igbman/Use+synonyms.txt+to+link+genome+version+names+to+each+other
            https://wiki.transvar.org/display/igbman/Personal+Synonyms
            https://wiki.transvar.org/display/lorainelab/How+to+add+a+new+synonym+to+IGB
            nfreese Nowlan Freese added a comment -

            Notes on SARS-CoV-2:
            https://pubmed.ncbi.nlm.nih.gov/32123347/
            https://pubmed.ncbi.nlm.nih.gov/32344679/

            nfreese Nowlan Freese added a comment -

            Notes on ebola virus:
            https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074192/
            nfreese Nowlan Freese added a comment -

            I have pushed changes to my branch for review: https://bitbucket.org/nfreese/nowlanfork-igb/branch/IGBF-781

            To test:
            1. Download the branch installer and run IGB.
            2. In IGB, navigate to File > Preferences... and then select the Data Sources tab.
            3. Check that the IGB Quickload and UCSC data sources are active.
            4. In IGB, scroll through the Species dropdown menu in the Current Genome tab.
            5. Check that none of the species names are abbreviated (for example, Enhydra lutris nereis instead of enhLutNer).

            Two commits:
            I manually added or updated new UCSC genomes. If a species was not found in IGB it was added to species.txt. If a species was already in IGB but had a new genome version that was not present, the new genome version was added to the synonyms.txt.

            There were two edge cases where synonyms.txt was not behaving as expected. This was due to the UCSC_REGEX pattern in SpeciesSynonymsLookupImpl.java assuming that UCSC keys contain at most 6 letters and 1 number. However, the UCSC keys for Cricetulus griseus (criGriChoV) and Enhydra lutris nereis (enhLutNer) are longer. I updated the regex to accept up to 10 characters and a number, though this assumption may eventually break as well.
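
The effect of the length limit can be shown with two stand-in patterns. This is a sketch only: the actual UCSC_REGEX is not reproduced here, the two patterns and the class name are assumptions made for illustration, and the sample keys are the prefixes named above with an assumed version digit appended.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; these are NOT the patterns used in SpeciesSynonymsLookupImpl.java.
final class UcscKeyPatternSketch {

    // Assumed stand-in for the old rule: at most 6 letters followed by digits.
    static final Pattern NARROW = Pattern.compile("^[a-zA-Z]{1,6}[0-9]+$");

    // Assumed stand-in for the updated rule: at most 10 letters followed by digits.
    static final Pattern WIDE = Pattern.compile("^[a-zA-Z]{1,10}[0-9]+$");

    /** True when the whole key matches the given pattern. */
    static boolean matches(Pattern pattern, String key) {
        return pattern.matcher(key).matches();
    }

    public static void main(String[] args) {
        // The last key is made up, to show that a fixed upper bound can still be exceeded.
        String[] keys = {"hg38", "enhLutNer1", "criGriChoV1", "abcdefghijk1"};
        for (String key : keys) {
            System.out.println(key
                    + " narrow=" + matches(NARROW, key)
                    + " wide=" + matches(WIDE, key));
        }
    }
}
```

Under these assumed patterns, enhLutNer1 and criGriChoV1 fail the narrow rule and pass the wider one, while any key with more than 10 letters would fail both, which is the sense in which the new limit may eventually break as well.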

            nfreese Nowlan Freese added a comment -

            I have created IGBF-2861 to investigate another edge case that was identified. In order to add the opossum genome (M_domestica) I needed to add a variety to the end (M_domestica_metatherian) to keep it distinct from M_domestica_Borkh.

            nfreese Nowlan Freese added a comment -

            Testing of the IGB Genomes (genome dashboard)
            The genome dashboard accesses the species.txt and synonyms.txt files from Bitbucket (see here), as well as the quickloads that are specified in the igbDefaultPrefs.json file on Bitbucket.

            We will need to conduct post-merge testing to determine how the genome dashboard behaves with genomes that are only provided by UCSC DAS, of which there are now many. There could be complications if the UCSC DAS data source is disabled by default: a genome provided only by UCSC DAS would then fail to load in IGB when clicked in the genome dashboard.

            omarne Omkar Marne (Inactive) added a comment - - edited

            Tested the branch installer mentioned by Dr. Nowlan Freese. The species names aren't abbreviated; they are displayed in full.

            Dr. Nowlan Freese, you can go ahead and submit the pull request.

            nfreese Nowlan Freese added a comment - Pull request: https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/879/igbf-781
            ann.loraine Ann Loraine added a comment -

            Modifying bioviz-playbooks "replace" role to edit genome dashboard util.js file during deployment to also use user-specified IGB repository and branch.

            ann.loraine Ann Loraine added a comment -

            Deployed genome dashboard to bioviztest.bioviz.org using the proposed fork and branch. Confirmed that utils.js contains the following variables:

            const SPECIES = 'https://bitbucket.org/nfreese/nowlanfork-igb/IGBF-781/core/synonym-lookup/src/main/resources/species.txt';
            const SYNONYMS = 'https://bitbucket.org/nfreese/nowlanfork-igb/raw/IGBF-781/core/synonym-lookup/src/main/resources/synonyms.txt';
            const ALLQUICKLOADS = 'https://bitbucket.org/nfreese/nowlanfork-igb/raw/IGBF-781/core/igb-preferences/src/main/resources/igbDefaultPrefs.json';
            

            I see no difference between the genome dashboard deployed above and the version deployed on bioviz.org (main site). The only thing that looks a bit odd is that neither site has a picture for genome F_albicollis_Jun_2013 (collared flycatcher). This genome was added recently to the IGB Quickload svn repository, and we (mainly me) forgot to add a photo to the genome dashboard repository. This has nothing to do with the current issue, however.

            Based on the above review, I think it is fine to now merge the pull request.

            ann.loraine Ann Loraine added a comment -

            Merged and built master branch installers. Ready for testing.

            nfreese Nowlan Freese added a comment -

            Tested on Mac master installer.

            Working correctly.

            omarne Omkar Marne (Inactive) added a comment -

            Tested on Windows and Linux. The species names aren't abbreviated; they are displayed in full.

            Closing the issue.


              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                ann.loraine Ann Loraine