I've cloned Deepthi's branch for testing on Mac. To test her changes (as well as the new htsjdk-based VCF parser in general), I loaded various VCF files into IGB 10.1.0 and into IGB 10.2.0 with Deepthi's changes, running both side by side. Here's what I found:
Relating to this ticket:
Commit issue: The two newest commits on Deepthi's branch don't look quite right. I can't see the specific changes that were made because full files seem to have been deleted and then added back again. Please take a look at these commits and resubmit them so that we can look at code changes.
Malformed VCF Header issue: This new error is popping up as expected when loading space-separated VCF files via URL and locally! However, adding a space-separated VCF file via QL throws this error many, many times, which should not be happening.
- To recreate this issue, follow the instructions at this link to add the smoke testing QL to IGB: https://wiki.bioviz.org/confluence/display/ITD/File+Formats
- Open the H_Sapien_Dec_2013 genome
- Try adding one of the VCF files from the Available Data section, then observe all of the repetitive errors being thrown.
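I haven't traced why the error repeats, but one guess is that the header check runs once per load request for Quickload files rather than once per file. A minimal sketch of checking the #CHROM header line and reporting a malformed file only once is below. The class and method names are hypothetical and are not IGB or htsjdk code; the only thing taken from the VCF format itself is that the eight mandatory header columns are tab-separated.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of IGB or htsjdk.
final class VcfHeaderCheck {
    // Files already reported as malformed, so each one is reported only once.
    private static final Set<String> REPORTED = new HashSet<>();

    /** Returns true if the #CHROM line separates the mandatory columns with tabs. */
    static boolean isTabDelimitedHeader(String headerLine) {
        return headerLine.startsWith("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO");
    }

    /** Returns true only the first time a malformed header is seen for this file. */
    static boolean shouldReport(String fileUri, String headerLine) {
        if (isTabDelimitedHeader(headerLine)) {
            return false; // well-formed header, nothing to report
        }
        return REPORTED.add(fileUri); // add() returns false if already recorded
    }

    public static void main(String[] args) {
        String spaceSeparated = "#CHROM POS ID REF ALT QUAL FILTER INFO";
        System.out.println(shouldReport("example.vcf", spaceSeparated)); // true
        System.out.println(shouldReport("example.vcf", spaceSeparated)); // false
    }
}
```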
Not relating to this ticket, specifically:
Data loading issue: Loading data from VCF files, especially larger ones, takes longer in IGB 10.2.0 with Deepthi's changes than it does for the same files in 10.1.0. For example:
- Download the 1KG.chr22.anno.infocol.vcf.gz file from this website: https://bioinformaticstools.mayo.edu/research/vcf-miner-sample-vcfs/.
- Then, with both of the aforementioned versions of IGB open at once, compare the time it takes to add this file to IGB, as well as the time it takes to load data at a gene.
- Notice that adding the file takes roughly the same amount of time in both versions, but only IGB 10.2.0 runs out of memory while trying to load data. This suggests to me that the new VCF parsing logic attempts to download the entire file's worth of data when the Load Data button is clicked, whereas the previous IGB logic only downloads whatever data is in frame.
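To illustrate the difference I mean, here is a rough sketch of keeping only the records in the visible region instead of holding every record in memory. This is not the IGB or htsjdk logic; the class name and method are made up for illustration, and it filters on POS only (it ignores variant length). Without an index it still has to scan the file, but memory use stays proportional to what is in frame.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of IGB or htsjdk.
final class VisibleRegionFilter {
    /** Keeps only data lines on chrom whose POS lies in [start, end] (1-based, inclusive). */
    static List<String> loadRegion(Reader vcf, String chrom, int start, int end) throws IOException {
        List<String> kept = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(vcf)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and header lines
                }
                String[] cols = line.split("\t", 3); // only CHROM and POS are needed
                if (cols.length < 2) {
                    continue; // not a usable data line
                }
                int pos = Integer.parseInt(cols[1]);
                if (cols[0].equals(chrom) && pos >= start && pos <= end) {
                    kept.add(line); // record is in frame, so keep it
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        String vcf = "##fileformat=VCFv4.2\n"
                + "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
                + "22\t100\t.\tA\tG\t.\t.\t.\n"
                + "22\t5000\t.\tC\tT\t.\t.\t.\n";
        System.out.println(loadRegion(new StringReader(vcf), "22", 1, 1000).size());
    }
}
```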
Index issue: An error is being thrown whenever I attempt to load a VCF file with a tabix index into IGB 10.2.0 with Deepthi's changes. These same files work fine in 10.1.0. To recreate this, simply add the Genome in a Bottle VCF file to IGB via URL (or download it and add it locally), then observe the error below:
11:50:34.037 ERROR c.a.genometry.thread.CThreadWorker - class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
java.lang.ClassCastException: class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1030)
at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1009)
at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73)
at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:305)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:342)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
11:50:34.720 ERROR c.a.igb.view.load.GeneralLoadUtils - null
java.util.concurrent.ExecutionException: java.lang.ClassCastException: class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at java.desktop/javax.swing.SwingWorker.get(SwingWorker.java:612)
at com.affymetrix.igb.view.load.GeneralLoadUtils$3.finished(GeneralLoadUtils.java:1058)
at com.affymetrix.genometry.thread.CThreadWorker.done(CThreadWorker.java:51)
at java.desktop/javax.swing.SwingWorker$4.run(SwingWorker.java:749)
at java.desktop/javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:847)
at java.desktop/sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112)
at java.desktop/javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:857)
at java.desktop/javax.swing.Timer.fireActionPerformed(Timer.java:311)
at java.desktop/javax.swing.Timer$DoPostEvent.run(Timer.java:243)
at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318)
at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773)
at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720)
at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:87)
at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
Caused by: java.lang.ClassCastException: class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1030)
at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1009)
at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73)
at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:305)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:342)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
This may be contributing to the data loading issue above, because the index file can help IGB find the data it needs much more quickly instead of having to load and search through the whole file.
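For reference, the ClassCastException above is what an unchecked downcast produces when the loader turns out to be a different subtype. I haven't looked at the code at GeneralLoadUtils.java:1030, so the following is only a sketch with stand-in classes (the real IGB class hierarchy may differ) showing an instanceof check before the cast:

```java
// Stand-in classes for illustration only; the real IGB types may be related differently.
final class CastGuardSketch {
    static class SymLoader {}
    static class QuickLoadSymLoader extends SymLoader {
        String source() { return "quickload"; }
    }
    static class VCFSymLoaderTabix extends SymLoader {}

    /** Casts only after checking the runtime type, so other loaders do not throw. */
    static String describe(SymLoader loader) {
        if (loader instanceof QuickLoadSymLoader) {
            QuickLoadSymLoader quickLoad = (QuickLoadSymLoader) loader; // safe after the check
            return quickLoad.source();
        }
        return "direct"; // any other loader, e.g. a tabix-backed one
    }

    public static void main(String[] args) {
        System.out.println(describe(new QuickLoadSymLoader())); // quickload
        System.out.println(describe(new VCFSymLoaderTabix()));  // direct
    }
}
```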
Suggestion:
Since there are some issues with speed, I recommend rebasing the branch onto main and pushing it to the team repository so that we do not lose track of this work. But I think we might want to table this for a deeper look in the future?