  1. IGB
  2. IGBF-2954

Investigate: why is loading bb (bigbed) and bigwig (bw) files so slow?

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      1.5
    • Sprint:
      Fall 4 2021 Sep 27 - Oct 8, Fall 5 2021 Oct 11 - Oct 22

      Description

      See the linked issue for the track hub facade data source URL of the "JASPAR" Track Hub, which contains multiple bigBed files.

      Loading these data into IGB is extremely slow for some reason.

      Bigbed format files contain indexes that map chromosome positions onto file byte positions, which let client software programs (like IGB) look up exactly which part of the larger file they need and then use HTTP byte range requests to retrieve just the needed portion of the file. Therefore it is surprising that loading this file takes a long time with IGB. Retrieving and reading the index part of the file should be very fast, and requesting the bytes ought to be fast, as well.
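The byte range mechanism described above can be sketched as follows. This is a minimal illustration using only the JDK, not IGB's or htsjdk's actual code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper; not part of IGB or the bigwig library. */
final class RangeRequestSketch {

    /** Builds an HTTP Range header value for `length` bytes starting at `offset`. */
    static String rangeHeader(long offset, int length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        // Range end positions are inclusive, hence the -1.
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Fetches `length` bytes at `offset`, failing if the server ignores the range. */
    static byte[] fetch(URL url, long offset, int length) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        try {
            // 206 Partial Content means the range was honored;
            // a plain 200 would mean the whole file is being sent.
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Range not honored, status " + conn.getResponseCode());
            }
            byte[] buf = new byte[length];
            try (InputStream in = conn.getInputStream()) {
                int read = 0;
                while (read < length) {
                    int n = in.read(buf, read, length - read);
                    if (n < 0) {
                        throw new IOException("Stream ended after " + read + " bytes");
                    }
                    read += n;
                }
            }
            return buf;
        } finally {
            conn.disconnect();
        }
    }
}
```

A server that answers with 200 instead of 206 is one possible reason a remote file would load far more slowly than a local copy, so the status check is worth keeping while investigating.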

      For this task, investigate why it is so slow and suggest some ideas for how to speed it up.

      The bb (bigbed) format is commonly used in bioinformatics for distributing and visualizing data, so it is important that we do a better job of supporting it.

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine added a comment -

            Downloaded the slow-loading bb file:

            wget http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/JASPAR2022_araTha1.bb
            

            The file is more than 2 GB in size.

            ann.loraine Ann Loraine added a comment - - edited

            Nowlan Freese notes that bigwig loading is also very slow while testing IGBF-2911.

            nfreese Nowlan Freese added a comment - - edited

            Initial notes on loading bigwig files:

            Testing using branch release-9.1.8 on Netbeans 8.2.

            Attempting to load the file SRR10060893.rnaseq.bw locally works fine and is relatively fast. When I attempt to load the file from BioViz Connect it locks up IGB for several minutes.

            The problem appears to be occurring within the following line of code from BBFileReader.java:

            zoomLevels = new BBZoomLevels(fis, zoomLevelOffset, zoomLevelCount,
                                isLowToHigh, uncompressBufSize);
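For context, here is a sketch of where the arguments to that constructor could come from. It assumes the standard bigWig/bigBed layout (a 64-byte fixed header followed by one 24-byte header per zoom level) and assumes that isLowToHigh denotes little-endian byte order. The class is hypothetical and is not the BBFileReader implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of parsing the fixed bigWig/bigBed header. */
final class BigHeaderSketch {
    static final int BIGWIG_MAGIC = 0x888FFC26;
    static final int BIGBED_MAGIC = 0x8789F2EB;
    static final int HEADER_SIZE = 64;      // fixed header length in bytes
    static final int ZOOM_HEADER_SIZE = 24; // length of one zoom-level header

    final boolean isLowToHigh;   // assumed to mean little-endian
    final int zoomLevelCount;
    final long zoomLevelOffset;  // zoom headers directly follow the fixed header
    final int uncompressBufSize;

    private BigHeaderSketch(boolean isLowToHigh, int zoomLevelCount, int uncompressBufSize) {
        this.isLowToHigh = isLowToHigh;
        this.zoomLevelCount = zoomLevelCount;
        this.zoomLevelOffset = HEADER_SIZE;
        this.uncompressBufSize = uncompressBufSize;
    }

    private static boolean isKnownMagic(int magic) {
        return magic == BIGWIG_MAGIC || magic == BIGBED_MAGIC;
    }

    /** Parses the first 64 bytes of a file, detecting byte order from the magic number. */
    static BigHeaderSketch parse(byte[] header) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Need at least " + HEADER_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        boolean lowToHigh = isKnownMagic(buf.getInt(0));
        if (!lowToHigh) {
            buf.order(ByteOrder.BIG_ENDIAN);
            if (!isKnownMagic(buf.getInt(0))) {
                throw new IllegalArgumentException("Not a bigWig or bigBed header");
            }
        }
        int zoomLevels = buf.getShort(6) & 0xFFFF; // unsigned 16-bit count at bytes 6-7
        int uncompress = buf.getInt(52);           // bytes 52-55
        return new BigHeaderSketch(lowToHigh, zoomLevels, uncompress);
    }

    /** Bytes needed to read every zoom-level header in a single request. */
    int zoomHeaderBytes() {
        return zoomLevelCount * ZOOM_HEADER_SIZE;
    }
}
```

Since all zoom-level headers are contiguous, they could in principle be fetched with one range request. Whether BBZoomLevels does so, or issues many small remote reads, is something to verify in the bigwig repository.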
            
            ann.loraine Ann Loraine added a comment -

            Check for caching. If a file has to be downloaded in its entirety in order to be displayed in IGB, the application may be saving it in the local cache.

            nfreese Nowlan Freese added a comment -

            Repository for bigwig and bigbed parsing: https://bitbucket.org/lorainelab/bigwig/src/master/

            nfreese Nowlan Freese added a comment -

            14:39:52.180 ERROR o.broad.igv.bbfile.BBZoomLevelFormat - Error reading zoom level data records (Table O)
            java.io.EOFException: null
            at htsjdk.samtools.seekablestream.SeekableStream.readFully(SeekableStream.java:86) ~[htsjdk-igb-2.16.3.jar:na]
            at org.broad.igv.bbfile.BBZoomLevelFormat.<init>(BBZoomLevelFormat.java:113) ~[bigwig-2.0.0.jar:na]
            at org.broad.igv.bbfile.BBZoomLevels.<init>(BBZoomLevels.java:115) [bigwig-2.0.0.jar:na]
            at org.broad.igv.bbfile.BBFileReader.<init>(BBFileReader.java:166) [bigwig-2.0.0.jar:na]
            at com.gene.bigwighandler.BigWigSymLoader.initbbReader(BigWigSymLoader.java:89) [bigWigHandler-9.1.10.jar:na]
            at com.gene.bigwighandler.BigWigSymLoader.init(BigWigSymLoader.java:62) [bigWigHandler-9.1.10.jar:na]
            at com.gene.bigwighandler.BigWigSymLoader.getChromosomeList(BigWigSymLoader.java:106) [bigWigHandler-9.1.10.jar:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.loadAndAddSymmetries(QuickLoadSymLoader.java:153) [genometry-9.1.10.jar:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.loadSymmetriesThread(QuickLoadSymLoader.java:139) [genometry-9.1.10.jar:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.loadFeatures(QuickLoadSymLoader.java:119) [genometry-9.1.10.jar:na]
            at com.affymetrix.igb.view.load.GeneralLoadUtils.loadFeaturesForSym(GeneralLoadUtils.java:752) [igb-9.1.10.jar:na]
            at com.affymetrix.igb.view.load.GeneralLoadUtils$2.runInBackground(GeneralLoadUtils.java:694) [igb-9.1.10.jar:na]
            at com.affymetrix.igb.view.load.GeneralLoadUtils$2.runInBackground(GeneralLoadUtils.java:689) [igb-9.1.10.jar:na]
            at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73) [genometry-9.1.10.jar:na]
            at javax.swing.SwingWorker$1.call(SwingWorker.java:295) [na:1.8.0_241]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_241]
            at javax.swing.SwingWorker.run(SwingWorker.java:334) [na:1.8.0_241]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_241]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_241]
            at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]

            https://data.cyverse.org/dav-anon/iplant/home/shared/BioViz/rnaseq/A_thaliana_Jun_2009/SRP220157/graph_scaled/SRR10060904.rnaseq.bw
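The EOFException in the trace above comes from a read-fully call that hit the end of the stream before the requested number of bytes arrived. A minimal sketch of such a loop is shown below; it is a hypothetical stand-in written against java.io.InputStream, not the htsjdk SeekableStream code, and it reports how many bytes were actually received to make short remote responses easier to diagnose.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration of a read-fully loop. */
final class ReadFullySketch {

    /** Fills `buf` completely or throws EOFException stating how many bytes arrived. */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int read = 0;
        while (read < buf.length) {
            int n = in.read(buf, read, buf.length - read);
            if (n < 0) {
                // The stream ended early, for example because the server
                // returned fewer bytes than the requested range.
                throw new EOFException(
                        "Expected " + buf.length + " bytes but stream ended after " + read);
            }
            read += n;
        }
    }
}
```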

            nfreese Nowlan Freese added a comment -

            Link to public Quickload hosted on CyVerse:

            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload

            Contains a bigBed, bed and tabix indexed bed file created from Araport11 on IGB Quickload. Also contains bedgraph, bigwig, wig, and tabix indexed bedgraph of the SRR10060893 bedgraph file from the BioViz CyVerse community folder.

            nfreese Nowlan Freese added a comment - - edited

            Testing in 9.1.8 release

            For files, use the above quickload hosted on CyVerse.
            In IGB
            Open the Arabidopsis thaliana genome A_thaliana_Jun_2009
            Navigate to Chr1:2,257,247-2,260,234
            Select bigBed, bed, bed tabix in the Available Data box

            I see no difference in loading times, with or without the cache enabled. I also don't see anything appearing in the cache, according to IGB.

            Select bedgraph, bedgraph tabix, bigwig, wig

            On the initial loading of data, bedgraph tabix loads the fastest, followed by bigwig. bedgraph and wig take the longest. Unclear why bigwig takes longer than bedgraph tabix, as they are both indexed binary files. On subsequent loading of the files, all four files load in approximately the same amount of time, even with the cache disabled and the removal of the .igb folder.

            Could this indicate that the initial retrieval of chromosomes for the files is what takes the most time?
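One way to test that hypothesis is to time the chromosome retrieval and the region load separately. The helper below is a hypothetical sketch using only the JDK; it is not existing IGB code.

```java
import java.util.function.Supplier;

/** Hypothetical timing helper for separating load phases. */
final class LoadTimerSketch {

    /** A result paired with the wall-clock time it took to produce. */
    static final class Timed<T> {
        final T value;
        final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    /** Runs one step and records its elapsed time in nanoseconds. */
    static <T> Timed<T> time(Supplier<T> step) {
        long start = System.nanoTime();
        T value = step.get();
        return new Timed<>(value, System.nanoTime() - start);
    }
}
```

Wrapping the calls that appear in the stack traces on this ticket (for example getChromosomeList and getRegion) in separate `time(...)` calls, on both the first and second load, would show which phase accounts for the difference.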

            nfreese Nowlan Freese added a comment -

            Regarding the JASPAR2022_araTha1.bb file referenced in a previous comment, I am seeing the following exception when attempting to load the file.

            java.lang.NullPointerException: null
            at com.gene.bigbedhandler.BigBedSymLoader.getRegion(BigBedSymLoader.java:138) ~[na:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.getRegion(QuickLoadSymLoader.java:287) ~[genometry-9.1.8.jar:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.loadAndAddSymmetries(QuickLoadSymLoader.java:164) ~[genometry-9.1.8.jar:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.loadSymmetriesThread(QuickLoadSymLoader.java:139) ~[genometry-9.1.8.jar:na]
            at com.affymetrix.genometry.quickload.QuickLoadSymLoader.loadFeatures(QuickLoadSymLoader.java:119) ~[genometry-9.1.8.jar:na]
            at com.affymetrix.igb.view.load.GeneralLoadUtils.loadFeaturesForSym(GeneralLoadUtils.java:752) ~[igb-9.1.8.jar:na]
            at com.affymetrix.igb.view.load.GeneralLoadUtils$2.runInBackground(GeneralLoadUtils.java:694) [igb-9.1.8.jar:na]
            at com.affymetrix.igb.view.load.GeneralLoadUtils$2.runInBackground(GeneralLoadUtils.java:689) [igb-9.1.8.jar:na]
            at com.affymetrix.genometry.thread.CThreadWorker.doInBackground(CThreadWorker.java:73) [genometry-9.1.8.jar:na]
            at javax.swing.SwingWorker$1.call(SwingWorker.java:295) [na:1.8.0_241]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_241]
            at javax.swing.SwingWorker.run(SwingWorker.java:334) [na:1.8.0_241]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_241]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_241]
            at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]

            nfreese Nowlan Freese added a comment -

            IGV 2.11.1 is able to load the data from http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/JASPAR2022_araTha1.bb

            I also downloaded the file and attempted to load in IGB - it failed with the same exception.

            Using the UCSC bigBedToBed converter, I converted the JASPAR2022_araTha1.bb file to a bed file, gzipped, and tabix indexed it. This bed file was able to load correctly in IGB without issue. I also ran the UCSC bigBedInfo on the JASPAR2022_araTha1.bb file, and did not receive any errors.

            This would indicate an issue with the IGB parsing of the file, though it is unclear why this bigBed fails to load while others load correctly.

            nfreese Nowlan Freese added a comment -

            The JASPAR2022_araTha1.bb file has different chromosome names than the A_thaliana_Jun_2009 genome in IGB. My best guess is that the bigBed parser in IGB is failing to account for the different chromosome names.

            chr1 0 30427671
            chr2 1 19698289
            chr3 2 23459830
            chr4 3 18585056
            chr5 4 26975502
            chrCp 5 154478
            chrMt 6 366924
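The mismatch can be checked mechanically by comparing the file's chromosome names with the genome's sequence names. The sketch below is hypothetical; the genome names used in the usage note (Chr1 to Chr5, ChrC, ChrM) are an assumption about A_thaliana_Jun_2009, of which only ChrM is confirmed on this ticket.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical check for chromosome names that a genome does not recognize. */
final class ChromNameCheckSketch {

    /** Returns the file's chromosome names that have no match among the genome's names. */
    static List<String> unmatched(List<String> fileNames,
                                  Collection<String> genomeNames,
                                  boolean ignoreCase) {
        Set<String> known = new HashSet<>();
        for (String g : genomeNames) {
            known.add(ignoreCase ? g.toLowerCase(Locale.ROOT) : g);
        }
        List<String> missing = new ArrayList<>();
        for (String f : fileNames) {
            String key = ignoreCase ? f.toLowerCase(Locale.ROOT) : f;
            if (!known.contains(key)) {
                missing.add(f);
            }
        }
        return missing;
    }
}
```

Under the assumed genome names, an exact comparison reports all seven file names as unmatched, while a case-insensitive comparison still leaves chrCp and chrMt unmatched, so case folding alone would not be enough.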

            nfreese Nowlan Freese added a comment -

            IGV does not show any data on the mitochondrial chromosome (the sequence name should be ChrM, but the JASPAR2022_araTha1.bb file lists it as chrMt). IGB does show data for ChrM for the bed file version.

            nfreese Nowlan Freese added a comment -

            I'm not seeing a significant difference in the time to load the same test files between IGB and IGV.

            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bedgraph
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bedgraph.gz
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bedgraph.gz.tbi
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bigwig
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.wig

            ann.loraine Ann Loraine added a comment -

            Question: Does IGB's on-board (included in the jar) chromosome synonym file contain the JASPAR synonym(s)?

            nfreese Nowlan Freese added a comment - - edited

            Some of the JASPAR synonyms are part of the chromosomes.txt file. So, for example, when I load the bed file converted from the JASPAR file some of the data will load for some of the chromosomes (not chrCp as that is not in chromosomes.txt). Because none of the bigBed JASPAR data are able to load, this indicates to me that the bigBed logic in IGB is not using the chromosome synonyms logic.
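A synonym-aware lookup of the kind the bigBed path appears to be missing could look like the sketch below. It assumes each line of the synonym file holds one tab-separated group of equivalent names; that format, the class, and the example rows in the usage note are assumptions for illustration, not the actual IGB synonym code or the real contents of chromosomes.txt.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical synonym lookup; not IGB's implementation. */
final class SynonymLookupSketch {
    private final Map<String, Set<String>> groups = new HashMap<>();

    /** Each non-empty line is assumed to be one tab-separated group of equivalent names. */
    SynonymLookupSketch(List<String> lines) {
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>(Arrays.asList(line.trim().split("\t")));
            for (String name : group) {
                groups.put(name, group);
            }
        }
    }

    /**
     * Finds the name the file uses for a genome sequence.
     * Returns empty rather than null when neither the name nor a synonym is in the file.
     */
    Optional<String> nameInFile(String genomeName, Set<String> fileNames) {
        if (fileNames.contains(genomeName)) {
            return Optional.of(genomeName);
        }
        Set<String> group = groups.getOrDefault(genomeName, Collections.<String>emptySet());
        for (String candidate : group) {
            if (fileNames.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With an illustrative row such as `Chr1<TAB>chr1`, a request for Chr1 resolves to chr1 in the file, while a sequence with no listed synonym in the file yields an empty result that the caller can treat as "no data" instead of dereferencing null.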

            nfreese Nowlan Freese added a comment -

            I have created two new tickets (IGBF-2978 and IGBF-2979) to fix issues identified in this ticket.

            Closing this ticket.


              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: