Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 4
    • Sprint: Fall 1, Spring 3, Spring 5, Spring 6, Spring 8, Summer 2, Summer 3, Summer 4, Summer 6, Fall 2

      Description

      Situation: The current implementation fails when loading a VCF file from a Quickload.

      Issue elaboration:
      The new implementation uses VCFFileReader instead of lineReader. VCFFileReader can process only local files, not URLs (for example, http).

      UPDATE: The issue was with loading VCF file via URL. I do not think the issue was due to the file being part of a Quickload.
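A reader that accepts only local files can often be fed remote data by first copying the URL's content to a temporary local file. The sketch below illustrates that general workaround using only the JDK. It is not taken from the IGB code base, and the class and method names (`RemoteVcfFetcher`, `copyToTempFile`) are hypothetical. Note the trade-off: this approach downloads the whole file before any parsing starts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; the class and method names are illustrative only. */
final class RemoteVcfFetcher {
    private RemoteVcfFetcher() {}

    /**
     * Copies the resource at the given absolute URI (http, https, or file)
     * to a temporary local file and returns its path.
     * The caller is responsible for deleting the temporary file.
     */
    static Path copyToTempFile(URI source) throws IOException {
        Path tmp = Files.createTempFile("remote-", ".vcf");
        try (InputStream in = source.toURL().openStream()) {
            // The temp file already exists, so it must be replaced explicitly.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }
}
```

The returned path could then be handed to any file-based reader.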

        Attachments

        1. Screenshot 2025-06-26 at 1.50.51 PM.png
          389 kB
          saideepthi jagarapu
        2. vcf_fileerror_popup.png
          77 kB
          saideepthi jagarapu

          Issue Links

            Activity

            ann.loraine Ann Loraine added a comment -

            Note: see genome in a bottle data set (link is in the linked ticket).

            sjagarap saideepthi jagarapu (Inactive) added a comment - - edited

            Main branch (old logic, not using htsjdk/VCFFileReader):

            • You can load the VCF from a URL, even if the header is not perfectly tab-delimited. (Tab-delimited means the header fields are separated by \t instead of spaces.)

            Current branch (using htsjdk/VCFCodec/VCFFileReader):

            • You get errors about "malformed header" or "not enough columns" when loading the same file.
              This is because htsjdk is strict: it requires VCF headers and data lines to be tab-delimited, as per the VCF specification.
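As a rough illustration of the strictness described above, the following sketch checks whether a `#CHROM` header line splits on tabs into the eight fixed columns that the VCF specification requires. It uses only the JDK and does not call htsjdk. The class name `VcfHeaderCheck` and its method are hypothetical, and the check is a simplified stand-in for what a strict parser does, not a copy of htsjdk's logic.

```java
import java.util.List;

/** Hypothetical helper; not part of IGB or htsjdk. */
final class VcfHeaderCheck {
    /** The eight fixed column names required on the #CHROM line by the VCF specification. */
    static final List<String> REQUIRED_COLUMNS =
            List.of("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO");

    private VcfHeaderCheck() {}

    /** Returns true if the line splits on tabs into at least the eight fixed columns, in order. */
    static boolean isTabDelimitedHeader(String headerLine) {
        String[] fields = headerLine.split("\t", -1);
        if (fields.length < REQUIRED_COLUMNS.size()) {
            return false; // a space-separated header yields a single field here
        }
        for (int i = 0; i < REQUIRED_COLUMNS.size(); i++) {
            if (!REQUIRED_COLUMNS.get(i).equals(fields[i])) {
                return false;
            }
        }
        return true;
    }
}
```

A space-separated header collapses into one field when split on tabs, which matches the "not enough columns" wording of the error.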

            attn Nowlan Freese
            IGV also fails to load https://quickload-testing.s3.amazonaws.com/smokeTestingQuickload/H_sapiens_Dec_2013/VCF/VCF_HomoSapien.vcf, giving an error pop-up (screenshot attached)

            sjagarap saideepthi jagarapu (Inactive) added a comment - - edited

            Testing update with the htsjdk version of the VCF implementation:
            Dragging and dropping a VCF file with spaces gave the expected error: "Unable to parse header with error: Your input file has a malformed header: there are not enough columns present in the header line"
            Dragging and dropping a VCF file with tabs worked fine.

            Reference files:

            VCF File with spaces: https://quickload-testing.s3.amazonaws.com/smokeTestingQuickload/H_sapiens_Dec_2013/VCF/VCF_HomoSapien.vcf
            VCF File with tabs: https://data.cyverse.org/dav-anon/iplant/home/nowlanf/testFiles/VCF_HomoSapien.vcf - chr1:1-1,128,778

            To test local files, download the files from these URLs and load them from disk.

            sjagarap saideepthi jagarapu (Inactive) added a comment -

            To do: Create a pop-up that reports the error when a user tries to load a VCF file in an unsupported format (for example, a space-separated file instead of a tab-separated file).

            sjagarap saideepthi jagarapu (Inactive) added a comment - - edited

            Implemented the pop-up for incorrectly formatted VCF files. Added the relevant screenshot to the Attachments section.

            Code changes branch: https://bitbucket.org/lorainelab-deepthi/integrated-genome-browser/branch/IGBF-4106

            Testing guidelines:

            • The implemented logic works only for tab-separated VCF files. Make sure you are using correctly formatted VCF files; otherwise, an error pop-up is shown.
            • To test the failure case, use a space-separated VCF file.
            pkulzer Paige Kulzer (Inactive) added a comment -

            I've cloned Deepthi's branch for testing on Mac. To test her changes (as well as the functionality of the new VCF parser with htsjdk in general), I loaded various VCF files into IGB 10.1.0 and IGB 10.2.0 with Deepthi's changes simultaneously. Here's what I found:

            Relating to this ticket:
            Commit issue: The two newest commits on Deepthi's branch don't look quite right. I can't see the specific changes that were made because full files seem to have been deleted and then added back again. Please take a look at these commits and resubmit them so that we can look at code changes.
            Malformed VCF Header issue: This new error is popping up as expected when loading space-separated VCF files via URL and locally! However, adding a space-separated VCF file via QL throws this error many, many times, which should not be happening.

            1. To recreate this issue, follow the instructions at this link to add the smoke testing QL to IGB: https://wiki.bioviz.org/confluence/display/ITD/File+Formats
            2. Open the H_Sapien_Dec_2013 genome
            3. Try adding one of the VCF files from the Available Data section, then observe all of the repetitive errors being thrown.
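One general way to stop the same error from being reported repeatedly is to remember which sources have already triggered a notification. The sketch below shows that idea with a thread-safe set. The ticket does not identify why the error repeats in IGB, so this is only an illustration, and the class name `OncePerSourceNotifier` and its methods are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper; not taken from the IGB code base. */
final class OncePerSourceNotifier {
    // Thread-safe set of source identifiers (for example, file URLs) already reported.
    private final Set<String> reported = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time it is called for a given source. */
    boolean shouldNotify(String sourceId) {
        return reported.add(sourceId);
    }

    /** Forgets a source, for example when the user removes the file and adds it again. */
    void reset(String sourceId) {
        reported.remove(sourceId);
    }
}
```

A caller would show the pop-up only when `shouldNotify` returns true for the file's URL.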

            Not relating to this ticket, specifically:
            Data loading issue: Loading VCF files, especially those of increasing size, takes longer in IGB 10.2.0 with Deepthi's change when compared to loading those same files in 10.1.0. For example:

            1. Download the 1KG.chr22.anno.infocol.vcf.gz file from this website: https://bioinformaticstools.mayo.edu/research/vcf-miner-sample-vcfs/.
            2. Then, with both of the aforementioned versions of IGB open at once, compare the time it takes to load these files into IGB, as well as the time it takes to load data at a gene.
            3. Notice that loading in the file takes roughly the same amount of time between versions, but only IGB 10.2.0 runs out of memory while trying to load data. This indicates to me that the new VCF parsing logic attempts to download the entire file's worth of data when the Load Data button is clicked, which differs from previous IGB logic that only downloads whatever data is in frame.
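To make the difference between "load everything" and "load what is in frame" concrete, the sketch below streams VCF lines and keeps only the records whose position falls inside a requested region, so memory use depends on the region rather than on the file size. It is not the IGB or htsjdk logic. The class name `RegionFilter` is hypothetical, and the overlap test is simplified to the POS column only (it ignores the length of REF).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; a simplified illustration, not IGB's loading logic. */
final class RegionFilter {
    private RegionFilter() {}

    /**
     * Returns the data lines whose CHROM equals chrom and whose 1-based POS
     * lies in [start, end]. Header lines and unparsable lines are skipped.
     */
    static List<String> recordsInRegion(Reader vcf, String chrom, int start, int end)
            throws IOException {
        List<String> hits = new ArrayList<>();
        BufferedReader reader = new BufferedReader(vcf);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // meta-information and header lines
            }
            String[] fields = line.split("\t", 3); // only CHROM and POS are needed
            if (fields.length < 2) {
                continue;
            }
            int pos;
            try {
                pos = Integer.parseInt(fields[1]);
            } catch (NumberFormatException e) {
                continue; // not a valid data line
            }
            if (fields[0].equals(chrom) && pos >= start && pos <= end) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

This still scans the file sequentially; skipping directly to a region requires an index.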

            Index issue: An error is being thrown whenever I attempt to load a VCF file with a tabix index into IGB 10.2.0 with Deepthi's changes. These same files work fine in 10.1.0. To recreate this, simply add the Genome in a Bottle VCF file to IGB via URL (or download it and add it locally), then observe the error below:

            11:50:34.037 ERROR c.a.genometry.thread.CThreadWorker - class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
            java.lang.ClassCastException: class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
            	at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1030)
            	at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1009)
            	at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73)
            	at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:305)
            	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
            	at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:342)
            	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
            	at java.base/java.lang.Thread.run(Thread.java:1583)
            11:50:34.720 ERROR c.a.igb.view.load.GeneralLoadUtils - null
            java.util.concurrent.ExecutionException: java.lang.ClassCastException: class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
            	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
            	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
            	at java.desktop/javax.swing.SwingWorker.get(SwingWorker.java:612)
            	at com.affymetrix.igb.view.load.GeneralLoadUtils$3.finished(GeneralLoadUtils.java:1058)
            	at com.affymetrix.genometry.thread.CThreadWorker.done(CThreadWorker.java:51)
            	at java.desktop/javax.swing.SwingWorker$4.run(SwingWorker.java:749)
            	at java.desktop/javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.run(SwingWorker.java:847)
            	at java.desktop/sun.swing.AccumulativeRunnable.run(AccumulativeRunnable.java:112)
            	at java.desktop/javax.swing.SwingWorker$DoSubmitAccumulativeRunnable.actionPerformed(SwingWorker.java:857)
            	at java.desktop/javax.swing.Timer.fireActionPerformed(Timer.java:311)
            	at java.desktop/javax.swing.Timer$DoPostEvent.run(Timer.java:243)
            	at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318)
            	at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773)
            	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720)
            	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714)
            	at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
            	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:87)
            	at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
            	at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
            	at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
            	at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
            	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
            	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
            	at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
            Caused by: java.lang.ClassCastException: class com.affymetrix.genometry.symloader.VCFSymLoaderTabix cannot be cast to class com.affymetrix.genometry.quickload.QuickLoadSymLoader (com.affymetrix.genometry.symloader.VCFSymLoaderTabix and com.affymetrix.genometry.quickload.QuickLoadSymLoader are in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @6e2d940f)
            	at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1030)
            	at com.affymetrix.igb.view.load.GeneralLoadUtils$3.runInBackground(GeneralLoadUtils.java:1009)
            	at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73)
            	at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:305)
            	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
            	at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:342)
            	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
            	at java.base/java.lang.Thread.run(Thread.java:1583)
            

            This may be contributing to the data loading issue above, because the index file can help IGB find the data it needs to load much quicker instead of having to load and search through the whole file.
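The ClassCastException in the trace above is the kind of failure that an `instanceof` check before a downcast avoids. The sketch below shows the pattern with stand-in classes. It does not reproduce the real IGB class hierarchy or the code at GeneralLoadUtils.java:1030, and every name in it is hypothetical.

```java
import java.util.Optional;

/** Stand-in types; hypothetical and unrelated to the real IGB classes. */
class BaseLoader {}

class QuickLoadLikeLoader extends BaseLoader {}

class TabixLikeLoader extends BaseLoader {}

final class SafeCast {
    private SafeCast() {}

    /** Returns the loader as a QuickLoadLikeLoader if it is one, otherwise an empty Optional. */
    static Optional<QuickLoadLikeLoader> asQuickLoad(BaseLoader loader) {
        if (loader instanceof QuickLoadLikeLoader) {
            return Optional.of((QuickLoadLikeLoader) loader);
        }
        return Optional.empty(); // caller decides how to handle other loader types
    }
}
```

With this pattern, a loader of an unexpected type produces an empty result that the caller can handle, instead of an exception on the worker thread.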

            ann.loraine Ann Loraine added a comment -

            Suggestion:

            Since there are some issues with speed, I recommend rebasing the branch onto main and pushing the branch to the team repository so that we do not lose track of this work. But I think we might want to table this for a deeper look in the future?


              People

              • Assignee:
                sjagarap saideepthi jagarapu (Inactive)
                Reporter:
                sjagarap saideepthi jagarapu (Inactive)
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: