Details
- Type: New Plugin
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: 9.1.8 Major Release
- Labels:
- Story Points: 3
- Epic Link:
- Sprint: Fall 6 Nov 30 - Dec 11, Fall 7 Dec 14 - Dec 23, Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19, Spring 1 2021 Mar 22 - Apr 2
Description
IGB makes it possible for users to visualize "custom genomes," which are genomes where they have used their own reference genome and visualized their own data files.
Please see this User's Guide page for guidance on how this feature works from the user's perspective:
https://wiki.transvar.org/pages/viewpage.action?pageId=23200542
Unfortunately, each time a user needs to use their "custom genome," they have to repeat the entire process described in the wiki.
Instead of making them do this every time, let's implement a way for them to save their custom genome between sessions.
How this could work:
- Users will do the same thing as now when opening their custom genome.
- We will create a new submenu option under the "File" menu that says "Save Custom Genome to Local Quickload Site"
- When they choose that option, a popup will appear that asks the user to select a folder in the file system
- Once they do that, IGB creates a quickload site consisting of meta-data files only - contents.txt, genome.txt, annots.xml
- The annots.xml file tag "file" attribute will point to the local (or remote) files the user has loaded into IGB
Note that thanks to the new "reference" file tag implemented in IGB 9.1.6, the reference sequence no longer has to reside in the same folder as the Quickload site, as was previously the case.
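As a rough illustration of the meta-data-only site described above, here is a minimal Java sketch. It assumes the usual Quickload layout of a contents.txt at the site root plus a per-genome folder holding genome.txt and annots.xml. The class name, method name, and parameters are hypothetical and are not taken from the app's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of IGB or the Quickload Saver app.
final class QuickloadLayoutSketch {

    /**
     * Writes the three meta-data files and returns the genome folder.
     * Assumed layout: siteRoot/contents.txt and
     * siteRoot/genomeVersion/{genome.txt, annots.xml}.
     */
    static Path writeSite(Path siteRoot, String genomeVersion, String description,
                          List<String> genomeTxtLines, String annotsXml) throws IOException {
        Path genomeDir = siteRoot.resolve(genomeVersion);
        Files.createDirectories(genomeDir);

        // contents.txt: genome version, a TAB, then a description.
        Files.write(siteRoot.resolve("contents.txt"),
                Arrays.asList(genomeVersion + "\t" + description),
                StandardCharsets.UTF_8);

        // genome.txt: one "sequenceName<TAB>length" line per sequence (supplied by the caller).
        Files.write(genomeDir.resolve("genome.txt"), genomeTxtLines, StandardCharsets.UTF_8);

        // annots.xml: already-serialized XML; its "file" tags point at the user's data files.
        Files.write(genomeDir.resolve("annots.xml"), annotsXml.getBytes(StandardCharsets.UTF_8));
        return genomeDir;
    }
}
```

No data files are copied in this sketch; only the three small text files are created, which matches the "meta-data files only" idea in the description.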
Attachments
- annots.xml (0.4 kB)
Issue Links
- blocks
  - IGBF-2805 Describe how to test the feature of adding and using a local quickload on Windows, Mac, Linux to Release Testing. (Closed)
- relates to
  - IGBF-2737 Test Quickload Saver App on Windows, Mac and Linux (Closed)
  - IGBF-2738 Review workflow for installing Quickload Saver App using App Store (Closed)
  - IGBF-2785 Update POM to include micro version (Closed)
  - IGBF-3108 Release Create Quickload App on App Store (Closed)
  - IGBF-3206 Investigate: In Quickload Saver, saving custom quickload does not use native file chooser (To-Do)
  - IGBF-2783 Improve Open Genome from File UI (Closed)
  - IGBF-2804 Output species.txt in the Quickload Saver app (Closed)
  - IGBF-2822 Edit don't over-write contents.txt in Quickload Saver (To-Do)
Activity
Yes, that's correct.
[~aloraine] - Should the meta-data files be created, or are they already being created somewhere when the custom genome file is loaded?
I don't see them getting created in the code.
Some suggestions:
- Check the IgbServices class. It may provide such a method.
- Also look at the "genome version" object. It may contain pointers to Quickload sites that support it.
Yes, the metadata files are not currently getting created by IGB. Part of the new feature is that you would have to add some new code to create these files. It would require gathering information from in-memory data objects within IGB. Probably the hard part would be figuring out how to identify and then interrogate those objects.
Also, I think it would be super-cool, and a cleaner design, to implement this as an entirely new plugin (within IGB) or, even better, as an external App, as you did with the soft clips depth graph App.
I'm noticing some additional issues with loading custom genomes that we may also want to address.
Adding a custom genome and then saving an IGB session and attempting to load that same session leads to a null pointer exception. A similar exception occurs with bookmarks. I would think that the session could easily save the location of the data (the file location is already included in the session xml). I think this error would be very confusing to users as there is no feedback when they try to load the custom genome session xml file.
16:52:03.668 ERROR c.a.i.b.BookmarkUnibrowControlServlet - Error while loading bookmark.
java.lang.NullPointerException: null
at com.affymetrix.igb.bookmarks.BookmarkUnibrowControlServlet.lambda$loadServers$11(BookmarkUnibrowControlServlet.java:456) ~[na:na]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_241]
at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_241]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_241]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[na:1.8.0_241]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[na:1.8.0_241]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_241]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_241]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_241]
at com.affymetrix.igb.bookmarks.BookmarkUnibrowControlServlet.loadServers(BookmarkUnibrowControlServlet.java:461) ~[na:na]
at com.affymetrix.igb.bookmarks.BookmarkUnibrowControlServlet.access$500(BookmarkUnibrowControlServlet.java:78) ~[na:na]
at com.affymetrix.igb.bookmarks.BookmarkUnibrowControlServlet$1.runInBackground(BookmarkUnibrowControlServlet.java:220) ~[na:na]
at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73) [genometry-9.1.6.jar:na]
at javax.swing.SwingWorker$1.call(SwingWorker.java:295) [na:1.8.0_241]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_241]
at javax.swing.SwingWorker.run(SwingWorker.java:334) [na:1.8.0_241]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_241]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_241]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]
[~aloraine] - What tags should the annots.xml file contain?
I have a question regarding data containers object. There can be multiple data container objects within genome version object and each data container object contains a list of data set objects.
In the annots.xml file, will there be as many file tags as the number of dataset objects?
Regarding the tags:
The XML root element is "files" and the only child elements are "file" tags. A "file" tag can have many different attributes. What would be ideal is: When a user tries to save their custom genome as a quickload site, the annots.xml files would capture all the configurations that the user has made. For the initial implementation, we can focus on just the required ones: "name" and "title"
Regarding the data containers object:
I think you are correct. Each data set object corresponds to a track in the display. These would correspond to data files that the user has opened into a track.
A quick comment on the session object mentioned by Dr. Freese:
The goal of saving a custom genome for future use would also be met by fixing the session saving and reloading feature. So I think we should also fix this problem, as well. I'm not sure which approach would be faster & easier: fixing the session saving feature or implementing the proposed new saving-of-custom-genome-as-quickload feature.
Regarding the bug with sessions: we can fix that under a different ticket.
Code Repository - https://bitbucket.org/noorzahara/save_customegenome/src/master/
To install the app, follow https://wiki.transvar.org/display/ITD/App+Manager. Once the app is installed, you should see a new File menu option named "Save Custom Genome to Local Quickload Site".
To test -
1. First create custom genome by following https://wiki.transvar.org/pages/viewpage.action?pageId=23200542
2. Load the sequence
3. Try saving the genome to local quickload site by clicking on the new file menu option
A quick request:
When saving the files to a location on the local file system, avoid hard-coding the file separator character. For example, on MacOS, we use "/" (forward slash) as the file separator. On Windows, however, we use "\" (backward slash) as the file separator. Probably java has some kind of object (System?) that returns the file separator character for a user's system. This change may be needed before testing on Windows.
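A minimal Java sketch of the request above: java.nio.file.Paths.get joins path segments with the platform's separator, and java.io.File.separator exposes that separator directly. The class and method names are hypothetical and only illustrate the idea; this is not the app's actual code.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Quickload Saver app.
final class PortablePathSketch {

    /** Builds siteRoot/genomeVersion/annots.xml without hard-coding "/" or "\". */
    static Path annotsFile(String siteRoot, String genomeVersion) {
        return Paths.get(siteRoot, genomeVersion, "annots.xml");
    }

    public static void main(String[] args) {
        // The printed path uses "\" on Windows and "/" on macOS and Linux.
        System.out.println(annotsFile("ExampleQL", "A_thaliana_Jun_2009"));
        System.out.println("Platform separator: " + File.separator);
    }
}
```

Working with Path objects rather than concatenated strings also avoids doubled or missing separators when a folder name already ends with one.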
A not-so-quick request:
I think it would be better to use a proper XML writing library. This ensures that if a value written as an attribute happens to contain a character used in XML markup (e.g., greater-than, less-than, or ampersand), it gets properly encoded as a so-called XML "character entity" (e.g., &gt; or &lt; or &amp;).
One way to do this would be to create an XML "dom" object and then use an xml writer to write out the data. Making this improvement could be done as a separate ticket or under this one, as you see fit.
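The DOM-plus-writer approach suggested above could look roughly like the following sketch, which uses only the JDK's built-in javax.xml APIs. The root element "files" and the "file" attributes "name" and "title" come from this ticket. The class name, method name, and input shape are hypothetical, and this is not necessarily how the app ended up doing it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of IGB or the Quickload Saver app.
final class AnnotsXmlSketch {

    /** Each entry is {name, title}; returns the serialized annots.xml text. */
    static String buildAnnotsXml(String[][] nameAndTitle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element files = doc.createElement("files");
            doc.appendChild(files);
            for (String[] entry : nameAndTitle) {
                Element file = doc.createElement("file");
                // Raw values go in as-is; the serializer escapes markup characters.
                file.setAttribute("name", entry[0]);
                file.setAttribute("title", entry[1]);
                files.appendChild(file);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize annots.xml", e);
        }
    }
}
```

With this approach a title such as `Genes & <models>` is written with the ampersand and less-than sign encoded as entities, so the resulting file stays well-formed without any hand-written escaping.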
Nowlan Freese - would you give it a try when convenient?
Couple of issues:
1) The plugin is saving the Genome Version but not the Species in the contents.txt.
2) The annots.xml does not appear to be formatted correctly.
3) The annots.xml is setting all files with reference=true (such as a bed annotation file).
4) When selecting Save Custom Genome in the menu it does not use the native file chooser.
I think we should change the app's name to "Create Quickload". The app makes it much easier for users to create their own quickloads, or even just to generate the various files (such as genome.txt).
Thanks for the quick review! We do need to address items 1 through 3 since those seem like blockers. Once those are done, we can work on 4. Changing the name of the menu item is fine I think.
Fixing the XML formatting will likely be super easy once the code uses an XML library of some type. I think there might already be something like this in the IGB code base.
Changes as per review comments - https://bitbucket.org/noorzahara/save_customegenome/commits/f9723c999b5cbfb8cd9758fe173d3ec7b8119311
Except for 1, I have tried to resolve the rest of the issues.
Quick request for Noor Zahara:
Please check the link above. It did not open the intended thing.
Sorry about that. Corrected the link.
I saved QL and then tried to open it, but I did not see the new QL site available.
What I did:
- Started IGB 9.1.6
- Installed the App
- Opened Arabidopsis genome (version A_thaliana_Jun_2009)
- Selected File > Save Custom Genome to Local Quickload Site
- Using the file dialog, selected a local folder to save the quickload site
- Used the "Configure" (next to "Available Data") to add the folder as a new local quickload site
The following message appeared in the "log" file:
16:20:34.672 INFO o.l.i.q.QuickloadDataProvider - Initializing Quickload Server file:/Users/aloraine/Desktop/ExampleQL/
16:20:34.672 WARN o.l.i.quickload.util.QuickloadUtils - Optional quickload synonyms.txt file could not be loaded from file:/Users/aloraine/Desktop/ExampleQL/synonyms.txt
16:20:34.672 WARN o.l.i.quickload.util.QuickloadUtils - Optional species.txt could not be loaded from: file:/Users/aloraine/Desktop/ExampleQL/species.txt
This looks fine!
However, I did not see a new folder appear in the Available Data section, which it should have done. Instead, a new genome version appeared in the "Genome Version" menu for the species Arabidopsis thaliana. Looks like maybe the contents.txt file is incorrectly using a space character instead of a tab character to separate the two columns?
A request for Noor Zahara:
Can you please add the remaining attributes to the file tags from https://wiki.transvar.org/display/igbman/About+annots.xml?
The top priority ones are attributes related to track appearance, e.g., foreground and background color. Also, if the user has changed the values to something new within IGB, the quickload site created should use the new, user-selected values. I think this will likely happen "for free" based on how you have accessed the data within IGB. Please give it a try.
For your reference, please see "annots.xml" that got created. Note how this App has done something pretty cool, which is form a list of all the data files available for a genome.
cc: Nowlan Freese
Please note: genome.txt and contents.txt must both be tab-delimited. If they are not, the genome will not be shown. It looks like neither contents.txt nor genome.txt is tab-delimited.
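One way to catch this kind of problem early is a small check run on the generated files. The sketch below flags lines that lack two non-empty, tab-separated columns, a rule that is an assumption based on the comments in this ticket. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IGB or the Quickload Saver app.
final class TabDelimitedCheckSketch {

    /**
     * Returns the 1-based numbers of lines that do not have at least two
     * non-empty columns separated by a TAB. Blank lines are ignored.
     */
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty columns so "chr1<TAB>" is detected.
            String[] cols = line.split("\t", -1);
            if (cols.length < 2 || cols[0].trim().isEmpty() || cols[1].trim().isEmpty()) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```

A line such as `A_thaliana_Jun_2009 Arabidopsis thaliana`, separated by a space, would be reported, while the same line separated by a tab would pass.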
I have added the code to include the other attributes. All the attributes are specified in PropertyConstants file.
Just a quick followup: I'm looking for the other attributes in the annots.xml file that was created, but I am not seeing them. Are you referring to a new build of the app?
Looks like the code I have added has some minor issue. Will fix and push the change along with the other change related to quickload site not appearing under Available Data section.
I have pushed the fix. https://bitbucket.org/noorzahara/save_customegenome/commits/4848b6c6d0ce4600e8ea63661e5f5ea8076fc32c
Error when trying to re-load a custom genome from a newly created quickload site:
"The feature file:/Users/aloraine/Desktop/MyQuickload/A_madeup_genome_2020/file:/Users/aloraine/src/genomes/quickload/A_thaliana_Jun_2009/Araport11.bed.gz is not reachable. More information about what went wrong may be available in the Console. To get help, visit the IGB Help Page."
Please see attached annots.xml file, which gives the full absolute path to the file on my file system.
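The error message above looks like an absolute file: URI appended to the genome folder's URI as plain text. A minimal sketch of one way to avoid that, using java.net.URI.resolve, which returns an absolute argument unchanged and resolves a relative one against the base. The paths, class name, and method name are hypothetical, and this is not the fix that was made in IGB.

```java
import java.net.URI;

// Hypothetical helper, not part of IGB.
final class FileUriSketch {

    /**
     * Resolves the "file" attribute of an annots.xml entry against the genome folder.
     * Note: the attribute must already be a valid URI reference (e.g. no raw spaces),
     * otherwise URI.resolve throws IllegalArgumentException.
     */
    static URI resolveFile(URI genomeDirectory, String fileAttribute) {
        return genomeDirectory.resolve(fileAttribute);
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/Users/me/MyQuickload/A_madeup_genome_2020/");
        // Absolute value: returned as-is rather than glued onto the base.
        System.out.println(resolveFile(base, "file:/Users/me/genomes/Araport11.bed.gz"));
        // Relative value: resolved inside the genome folder.
        System.out.println(resolveFile(base, "Araport11.bed.gz"));
    }
}
```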
Error when I try to load sequence data:
A genome sequence has not been selected. Loading sequence data is not possible. More information about what went wrong may be available in the Console. To get help, visit the IGB Help Page.
Contents.txt seems to have a problem also. There needs to be a value in the second column.
Also, when I created an entirely novel custom genome, I specified the species in the dialog that is shown. I think that for the quickload to work in the best way possible, it should have a "species.txt" file to specify the binomial name of the organism.
[~aloraine] - I have fixed the URI issue in IGB
Code diff - https://bitbucket.org/noorzahara/integrated-genome-browser-local1/branch/IGBF-510#diff
Please submit PR for branch IGBF-510
Above PR is merged and master branch installers are being made. Master branch installers are made and have been deployed to the "early access" IGB page.
Moved back to "to-do" and then forward to "in progress" since more work is being done on the app side.
[~aloraine] - I might have to export com.affymetrix.igb.tiers in core igb pom.xml in order to access style object. Shall I add it in IGBF-510 branch?
Please make it a new branch as I've just now merged the previous PR.
Another possible solution:
You could add a method to the IGBServices class that returns a style object when passed a reference to a track or unique identifier for a track, such as a dataset URI.
See: core/igb-services-api/src/main/java/org/lorainelab/igb/services/IgbService.java
I have added a method in IgbService.java to access style object
Code diff - https://bitbucket.org/noorzahara/integrated-genome-browser-local1/branch/IGBF-510-1#diff
Currently the dependency used to access IgbService.java object is
<dependency>
    <groupId>org.lorainelab.igb</groupId>
    <artifactId>igb-services</artifactId>
    <version>9.1.4</version>
    <scope>provided</scope>
</dependency>
The above changes will be present in 9.1.8 right? Should I change the version of the above dependency to 9.1.8?
Good question. Yes, the app will need to declare a dependency on igb-services version 9.1.8 or higher. I do not think you will have any trouble getting it to build locally, since 9.1.8 artifacts will likely be installed in your local ".m2" directory. However, when you try to build in bitbucket, there might be a compile-time failure because the 9.1.8 artifacts have not yet been released to our maven repository, if these are required for compilation. But once the above change to the IGBServices interface and implementation is merged, I can make a release to the loraine lab nexus repository.
I had another question related to dataset properties - What should I be doing when both dataset properties and track style object are set?
Note - Ideally I have to rely on track style object but there is one property - 'index' that is not getting set in track style object. I have to also depend on dataset properties.
If the same property is set in both dataset and track style, I think the track style setting should be used, as this will ensure that the user's manual configuration gets saved into the "annots.xml" file.
Also, before a dataset is actually loaded and a new track created, there will probably not be a "track style" object, so then the dataset would be essential.
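The precedence rule described above can be sketched as a simple map merge: start from the dataset properties, then let any track style values overwrite them. The attribute names "index" and "foreground" are used only as examples, and the class name, method name, and map-based representation are hypothetical simplifications of IGB's real objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of IGB or the Quickload Saver app.
final class AttributeMergeSketch {

    /**
     * Combines attributes for one annots.xml "file" element.
     * Track style wins when both sources define the same key;
     * trackStyleProps may be null if no track has been created yet.
     */
    static Map<String, String> merge(Map<String, String> datasetProps,
                                     Map<String, String> trackStyleProps) {
        Map<String, String> merged = new LinkedHashMap<>(datasetProps);
        if (trackStyleProps != null) {
            merged.putAll(trackStyleProps); // user's manual configuration overrides
        }
        return merged;
    }
}
```

Properties that exist only on the dataset, such as "index" in the comment above, survive the merge because the track style never supplies a competing value.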
Merged to master branch. The new 9.1.8 artifacts are now deployed to nexus.bioviz.org. Moving issue back to "In Progress".
We need to expose a function from the track style class.
Code diff - https://bitbucket.org/noorzahara/integrated-genome-browser-local1/branch/IGBF-510-2#diff
Note- These changes should also be deployed to nexus.
Merged and deployed to Nexus.
Master branch installers are also built and available for use with the new App.
I have a question related to load_hint attribute. Should the values of this attribute be one of - Don't Load, Auto, Manual, Genome ?
Also when should it be set to Whole Sequence ?
Excellent question. "load_hint" when saved to annots.xml should only take the value "Whole Sequence."
So if a dataset / data file (in IGB) has "load_hint" equal to any other value, then when you create the annots.xml "file" element for it, the element's "load_hint" attribute should not be specified at all. In other words, it should be absent for that particular file.
The load_hint if present in an annots.xml file should always be "Whole Sequence." But it should only be included if the load hint in the data set / file within IGB is set in this way. This "load hint" is what ensures that a reference gene model track loads as soon as the user selects a genome version.
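A minimal sketch of that rule: emit the attribute only for whole-sequence loading and omit it otherwise. The enum below is hypothetical and simply mirrors the four options listed in the question. Mapping its GENOME value to "Whole Sequence" is an assumption about how IGB represents that setting internally.

```java
import java.util.Optional;

// Hypothetical helper, not part of IGB or the Quickload Saver app.
final class LoadHintSketch {

    /** Hypothetical stand-in for the load options shown in IGB's Data Access panel. */
    enum LoadMode { DONT_LOAD, MANUAL, AUTO, GENOME }

    /**
     * Returns the value for the "load_hint" attribute, or empty when the
     * attribute should be left out of the "file" element entirely.
     */
    static Optional<String> loadHintAttribute(LoadMode mode) {
        return mode == LoadMode.GENOME ? Optional.of("Whole Sequence") : Optional.empty();
    }
}
```

A caller would then write something like `loadHintAttribute(mode).ifPresent(v -> file.setAttribute("load_hint", v))`, so that no empty or placeholder attribute is ever written.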
Nowlan Freese - if you have time today, please try out the new version of the app. It requires latest 9.1.8 to be installed.
Testing on Mac on master branch Jar (my laptop would not allow master dmg to be opened).
1) Created a quickload while in the 2013 human genome. I then removed all other quickloads. Loading the human genome appears correctly, various data sources are working.
2) I created a custom genome with 2bit sequence and bed annotation. I then created a quickload. Loading the custom quickload mostly appears correctly. The color of the annotation track appeared correctly, but the load_hint (Whole Sequence) was not set. When I created the custom genome I entered:
Genome Version: E_unicornis_Apr_2017
Species: Equus unicornis
When I loaded the quickload the Species appeared as E_unicornis (not Equus unicornis) and the Genome Version was E_unicornis_Apr_2017.
Overall appears to be working correctly.
[~aloraine] - I'm unclear if the load_hint should be set and if the custom genome Species can be saved.
If user switches the "load hint" to "Whole Genome" in the Data Access panel, then that setting should be preserved when the user saves the quickload site to disk.
I fixed the issue with the load_hint not working correctly - the load hint is "Whole Sequence" but the value passed from IGB is GENOME.
https://bitbucket.org/nfreese/save_customgenome/commits/420cf8f82557bd9df48b8b0b507b1307d9c65555
[~aloraine] - I'm not sure where we are pushing these code changes (Noor's fork?).
I am unsure how to proceed on the second issue - the species name not appearing correctly in the contents.txt. From what I can tell, when a user generates a custom genome and provides a species name, that value is used within the dropdown menu, but then is not used again. The speciesName value is never applied to the genomeVersion dataContainer so when the create quickload plugin tries to find the value, it appears as null. I can correct this behavior by adding a call to setSpeciesName within the GeneralLoadUtils.java (see commit). I find it odd that the setSpeciesName() method is never called within the GeneralLoadUtils, and from searching the code is only called from LoadFileAction.java and WebLink.java. There is a lot of logic regarding comparing the speciesName/genomeVersion to the known synonyms list to attempt to determine a name, but this would not apply to a custom genome.
My commit does fix the issue, and the species name appears correctly within the contents.txt when creating the custom quickload. My concern is that there is some underlying reason that the speciesName user input is never applied, though it may have been an oversight.
After talking with Dr. Loraine, for the time being we will not move forward with the IGB code regarding the setSpeciesName() method. This code could be updated in the future, but would require additional testing as it may affect other logic.
Current task: Waiting for the Lorainelab bitbucket account to fork Noor's repo so that I can issue a pull request.
Fork is now available:
I recommend applying the new commit as we will be able to test it quite thoroughly.
Recommending PR.
Pull request for App: https://bitbucket.org/lorainelab/quickload-saver/pull-requests/1/fix-load_hint-whole-sequence
Note: I added the call to setSpecies() in the retrieveDataContainer method as this is where the genomeVersion is initially created. Note that this occurs for every genome available upon startup of IGB, based on the data sources available. As far as I can tell, the genomeVersion species had previously not been set.
I also found a potential issue with the app itself. When saving the custom quickload if the user selects to save the quickload on top of a previously created quickload it will confirm the overwrite (good?) and then it will add the new genome file to the other genome files (good?). This could allow a user to create a quickload with multiple genomes, however, it is overwriting the contents.txt each time. If the contents.txt were appended to it would allow a user to create a custom quickload with multiple genomes.
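The append-instead-of-overwrite idea could be sketched as an "upsert" on the lines of contents.txt: keep existing genomes, replace the line for a genome version that is saved again, and add a line for a new one. This assumes the first tab-separated column is the genome version. The class and method names are hypothetical, and the follow-up work tracked elsewhere may take a different approach.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Quickload Saver app.
final class ContentsTxtSketch {

    /** Returns the new contents.txt lines with one entry per genome version. */
    static List<String> upsert(List<String> existingLines, String genomeVersion, String description) {
        String newLine = genomeVersion + "\t" + description;
        List<String> out = new ArrayList<>();
        boolean replaced = false;
        for (String line : existingLines) {
            if (line.trim().isEmpty()) {
                continue; // drop blank lines
            }
            String key = line.split("\t", 2)[0];
            if (key.equals(genomeVersion)) {
                out.add(newLine); // same genome saved again: update in place
                replaced = true;
            } else {
                out.add(line);    // other genomes are preserved
            }
        }
        if (!replaced) {
            out.add(newLine);
        }
        return out;
    }

    /** Reads contents.txt if it exists, applies upsert, and writes it back. */
    static void update(Path contentsTxt, String genomeVersion, String description) throws IOException {
        List<String> existing = Files.exists(contentsTxt)
                ? Files.readAllLines(contentsTxt, StandardCharsets.UTF_8)
                : new ArrayList<String>();
        Files.write(contentsTxt, upsert(existing, genomeVersion, description), StandardCharsets.UTF_8);
    }
}
```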
App PR is merged.
For the IGB PR, do please revise commit to include the Jira issue number IGBF-510 so that we can more easily understand why the code was changed here in future.
[~aloraine] - Fixed pull request for IGB: https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/863/igbf-510-add-setspecies-when-load-genome
Merged. Master branch installers are built and ready for testing.
Rachel and I have found an issue with loading local quickloads on Windows. This requires further investigation.
UPDATE
I have created IGBF-2820 to address the issue described below with spaces in the path to the file.
When testing IGBF-510 Dr. Freese and I found that my Windows 10 machine was not loading local quickloads normally.
Today I created new copies of the E. unicornis files tested in IGBF-510, expanded the annots.xml file based on the wiki, and added a species.txt file.
It was not loading any genomic data in IGB 9.1.6 but was showing data from the contents.txt and genome.txt files.
Changing the direction of the / characters, checking that tabs were used to separate text content, moving the files within the local quickload, or adding/removing content from the quickload files did not fix the problem.
I moved the local quickload folder from my two-word user name file path down to the C drive and it began working normally.
I then moved it up one level to "Program Files", a folder containing a space in the name, and the local quickload stopped working.
I moved it to another folder with a path not containing any spaces and the local quickload began working again.
Without spaces anywhere in the file path, local quickloads work on Windows 10.
Tested on both IGB 9.1.6 and 9.1.8.
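The root cause is not identified in this comment. One common source of this symptom in Java code is building a file: URI from a raw path string, since a literal space is not a legal URI character. The sketch below shows the JDK's own conversion, which percent-encodes the space. It is offered as a hedged illustration only; the class and method names are hypothetical and this is not a description of the eventual fix.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of IGB.
final class SpaceInPathSketch {

    /** Converts a folder to a file: URI; spaces become %20. */
    static URI toFileUri(Path folder) {
        return folder.toAbsolutePath().toUri();
    }

    /** Converts a file: URI back to a path; %20 becomes a space again. */
    static Path fromFileUri(URI uri) {
        return Paths.get(uri);
    }

    public static void main(String[] args) {
        URI uri = toFileUri(Paths.get("Program Files", "MyQuickload"));
        System.out.println(uri);              // contains Program%20Files
        System.out.println(fromFileUri(uri)); // contains "Program Files"
        // By contrast, URI.create("file:/C:/Program Files/MyQuickload") throws
        // IllegalArgumentException because of the raw space.
    }
}
```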
Some updates:
IGBF-2537 code was released as part of 9.1.6 with the purpose of allowing 2bit files to be specified outside of the quickload itself. Testing for IGBF-2537 used a remotely hosted quickload file; local files were not tested. When I tested the 2bit file locally, it worked with a relative file path but not with an absolute file path. This is due to logic in the QuickloadDataProvider.java getSequenceFileUri method, as part of the IGBF-2537 commit. The problem is that the original logic made many assumptions about where the 2bit file was. The IGBF-2537 commit addressed some of that logic, but was not tested with full-path local 2bit files. Unfortunately, this issue (IGBF-510) relies on the full path to the 2bit file. I am working to address this logic.
Ready for testing
Branch: https://bitbucket.org/nfreese/nowlanfork-igb/branch/IGBF-510
To test:
Follow instructions here: https://wiki.bioviz.org/confluence/display/ITD/Quickload+Saver
Merged
Installers are built.
Kindly test by downloading and installing the master branch installers for Windows and MacOS from https://bitbucket.org/lorainelab/integrated-genome-browser/downloads/.
I tested the Quickload Saver app on Windows using the latest bitbucket installer for IGB as part of my testing for IGBF-2737. Everything worked as described in the testing documentation.
Chester Dias - kindly test on Linux platform.
To test:
Follow instructions here: https://wiki.bioviz.org/confluence/display/ITD/Quickload+Saver
Testing completed as per the wiki on Ubuntu 20.04.1 LTS version. The overall functionalities seem to work as expected except for a minor issue mentioned below.
When saving the custom genome locally, if we navigate to the destination folder and simply press the dialog box's Save button without giving a name, the dialog box closes but still creates a folder with a seemingly random name; sometimes the genome version is used as the name.
[~aloraine] - By creating a quickload, do you mean to just put all the 3 meta-data files in the folder the user chooses?