  1. IGB
  2. IGBF-2954

Investigate: why is loading bb (bigbed) and bigwig (bw) files so slow?

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 1.5
    • Sprint: Fall 4 2021 Sep 27 - Oct 8, Fall 5 2021 Oct 11 - Oct 22

      Description

      See the linked issue for the track hub facade data source URL for the "JASPAR" Track Hub, which contains multiple bigBed files.

      Loading these data into IGB is extremely slow for some reason.

      Bigbed format files contain indexes that map chromosome positions onto file byte positions, which let client software programs (like IGB) look up exactly which part of the larger file they need and then use HTTP byte range requests to retrieve just the needed portion of the file. Therefore it is surprising that loading this file takes a long time with IGB. Retrieving and reading the index part of the file should be very fast, and requesting the bytes ought to be fast, as well.
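
      The access pattern described above can be sketched in Python. This is a minimal illustration, not IGB's implementation: the helper names (`range_header`, `fetch_range`, `parse_bbi_header`) are hypothetical, and the layout assumed here is the 64-byte common header of the published bigBed/bigWig format, which begins with the magic number 0x8789F2EB (bigBed) or 0x888FFC26 (bigWig).

```python
import struct
import urllib.request

BIGBED_MAGIC = 0x8789F2EB
BIGWIG_MAGIC = 0x888FFC26
HEADER_SIZE = 64
# magic, version, zoomLevels, chromTreeOffset, fullDataOffset, fullIndexOffset,
# fieldCount, definedFieldCount, autoSqlOffset, totalSummaryOffset,
# uncompressBufSize, reserved
HEADER_FORMAT = "IHHQQQHHQQIQ"


def range_header(start, length):
    """Build an HTTP Range header value for `length` bytes starting at `start`."""
    return f"bytes={start}-{start + length - 1}"


def fetch_range(url, start, length):
    """Fetch only the requested bytes; fail if the server ignores the Range header."""
    request = urllib.request.Request(url, headers={"Range": range_header(start, length)})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:  # 206 Partial Content means the range was honored
            raise IOError(f"server did not honor the range request: {response.status}")
        return response.read()


def parse_bbi_header(data):
    """Parse the fixed-size header; the file may be little- or big-endian."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 64 bytes")
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + HEADER_FORMAT, data[:HEADER_SIZE])
        if fields[0] in (BIGBED_MAGIC, BIGWIG_MAGIC):
            return {
                "kind": "bigBed" if fields[0] == BIGBED_MAGIC else "bigWig",
                "version": fields[1],
                "zoom_levels": fields[2],
                "chrom_tree_offset": fields[3],
                "full_data_offset": fields[4],
                "full_index_offset": fields[5],
            }
    raise ValueError("not a bigBed or bigWig file")
```

      A client would call `parse_bbi_header(fetch_range(url, 0, 64))`, then use the returned offsets to request the chromosome tree and the index, and finally only the data blocks that overlap the visible region. If each of those steps costs one small request, the load should be fast, which is why the slowness reported here is unexpected.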

      For this task, investigate why it is so slow and suggest some ideas for how to speed it up.

      The bb (bigbed) format is commonly used in bioinformatics for distributing and visualizing data, so it is important that we do a better job of supporting it.

            Activity

            nfreese Nowlan Freese added a comment -

            IGV does not show any data on the mitochondrial chromosome (the sequence name should be ChrM, but the JASPAR2022_araTha1.bb file lists it as chrMt). IGB does show data for ChrM for the bed file version.

            nfreese Nowlan Freese added a comment -

            I'm not seeing a significant difference in the time to load the same test files between IGB and IGV.

            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bedgraph
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bedgraph.gz
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bedgraph.gz.tbi
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.bigwig
            https://data.cyverse.org/dav-anon/iplant/home/nfreese/2954_testing/quickload/A_thaliana_Jun_2009/SRR10060893.wig

            ann.loraine Ann Loraine added a comment -

            Question: Does IGB's on-board (included in the jar) chromosome synonym file contain the JASPAR synonym(s)?

            nfreese Nowlan Freese added a comment - - edited

            Some of the JASPAR synonyms are in the chromosomes.txt file. For example, when I load the bed file converted from the JASPAR file, data load for some of the chromosomes (but not chrCp, which is not in chromosomes.txt). Because none of the bigBed JASPAR data load, this indicates to me that the bigBed logic in IGB is not using the chromosome synonym logic.
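
            The kind of synonym lookup a bigBed reader would need can be sketched as follows. This is only an illustration under stated assumptions, not IGB's code: the function names are hypothetical, the synonym file is assumed to hold one tab-separated group of equivalent names per line, and the two synonym rows are made up for the example (only the names ChrM, chrMt, and chrCp come from this ticket).

```python
def load_synonyms(lines):
    """Map every name on a tab-separated line to all names on that line."""
    lookup = {}
    for line in lines:
        names = [name.strip() for name in line.split("\t") if name.strip()]
        for name in names:
            lookup.setdefault(name, set()).update(names)
    return lookup


def resolve_file_seq_name(genome_name, file_seq_names, lookup):
    """Return the name the file uses for `genome_name`, or None if there is none."""
    if genome_name in file_seq_names:
        return genome_name
    # Try each known synonym in a stable order.
    for candidate in sorted(lookup.get(genome_name, ())):
        if candidate in file_seq_names:
            return candidate
    return None


# Hypothetical synonym rows and the sequence names found in a bigBed file.
synonyms = load_synonyms(["ChrM\tchrMt", "Chr1\tchr1"])
names_in_file = {"chr1", "chrMt", "chrCp"}

print(resolve_file_seq_name("ChrM", names_in_file, synonyms))  # chrMt
print(resolve_file_seq_name("ChrC", names_in_file, synonyms))  # None: no synonym row
```

            With a step like this, a request for ChrM would be translated to chrMt before the file index is queried, while a name with no synonym row (as with chrCp here) would still return nothing.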

            nfreese Nowlan Freese added a comment -

            I have created two new tickets (IGBF-2978 and IGBF-2979) to fix issues identified in this ticket.

            Closing this ticket.


              People

              • Assignee: nfreese Nowlan Freese
              • Reporter: ann.loraine Ann Loraine
              • Votes: 0
              • Watchers: 2
