Details

    • Story Points:
      4
    • Sprint:
      Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10, Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14

      Description

      Situation: bigBed is an indexed binary version of the bed format. However, bigBed files do not display correctly in IGB. The bigBed gene annotations (see attached file) do not show the introns/exons; instead, IGB draws a single glyph for the entire gene.

      Task: Fix IGB so that the bigBed files appear correctly.

      See the UCSC guide on bigBed for more information on the bigBed file format.

        Attachments

        1. Araport11.bb
          3.14 MB
        2. bbVSbed.png
          149 kB
        3. bed.png
          195 kB
        4. bed and bb.png
          80 kB
        5. bedDetail.as
          0.9 kB
        6. chrom.sizes
          0.1 kB

            Activity

            nfreese Nowlan Freese added a comment -

            To reproduce the issue:

            1. Open IGB
            2. Open the Arabidopsis thaliana A_thaliana_Jun_2009 genome
            3. Navigate to Chr1:2,257,253-2,260,326
            4. Load the attached Araport11.bb file
            5. Click Load Data and compare the Araport11.bb file and the Araport11 bed file found in IGB Quickload (should load by default)

            nfreese Nowlan Freese added a comment - - edited

            As a sanity check, I viewed the Araport11.bb file in IGV (2.11.1) and it appeared correctly.

            I also converted the Araport11.bb file back into a bed file, and that bed file appeared correctly when viewed in IGB 9.1.8.

            I also checked previous versions of IGB going back to 8.2.0 and bigBed did not appear correctly in any of them.

            nfreese Nowlan Freese added a comment - - edited

            Here is the command I used to convert the Araport11.bed file found in the IGB Quickload to a bigBed file using the UCSC bedToBigBed converter:

            bedToBigBed -as=bedDetail.as -type=bed12+2 -tab Araport11.bed chrom.sizes Araport11.bb

            Note that the Araport11.bed file from IGB Quickload is a bed detail file and contains 14 columns. With the -type=bed12 option, bedToBigBed handles the 12 standard bed columns. For a 14-column bed detail file, -type=bed12+2 indicates two additional columns, which must then be defined in a .as file (see attached).
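            The relationship between column count and the -type value can be sketched as follows. This is a minimal illustration, not part of IGB or the UCSC tools: the class name BedTypeOption and the method typeFor are hypothetical, and the sketch assumes that the first 12 columns of a wider file are the standard bed12 columns (as is the case for a bed detail file).

```java
import java.util.Objects;

/** Hypothetical helper, not part of IGB or the UCSC tools. */
final class BedTypeOption {
    /** Number of columns in a full standard BED record (bed12). */
    static final int STANDARD_BED_COLUMNS = 12;

    /**
     * Derives a value for the bedToBigBed -type option from one
     * tab-separated BED line. Assumes any columns past the 12th are
     * extra fields that must be described in a .as file.
     */
    static String typeFor(String bedLine) {
        Objects.requireNonNull(bedLine, "bedLine");
        // A limit of -1 keeps trailing empty columns in the count.
        int columns = bedLine.split("\t", -1).length;
        if (columns < 3) {
            throw new IllegalArgumentException(
                    "A BED line needs at least 3 columns, found " + columns);
        }
        if (columns <= STANDARD_BED_COLUMNS) {
            return "bed" + columns;
        }
        return "bed" + STANDARD_BED_COLUMNS + "+" + (columns - STANDARD_BED_COLUMNS);
    }

    public static void main(String[] args) {
        // A made-up 14-column bed detail line (12 standard columns + id + description).
        String line = String.join("\t",
                "Chr1", "100", "500", "geneA", "0", "+", "100", "500", "0",
                "2", "100,100,", "0,300,", "id1", "desc");
        System.out.println(typeFor(line));
    }
}
```

            For the 14-column example line in main, the helper yields bed12+2, matching the option used in the command above.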

            nfreese Nowlan Freese added a comment -

            Much of the logic for identifying chromosomes is from BBFileReader.java provided by Broad. It appears we may be using an older version of this file.

            nfreese Nowlan Freese added a comment -

            While investigating BBFileReader.java, I double-checked whether IGB's failure to load bigBed files properly was due to the older BBFileReader.java code. Going back to an older version of IGV (version 2.1.30 from 12/11/2012), I found that IGV was able to load the bigBed file properly. Inspecting the version of BBFileReader.java currently used in IGB showed that the returned values do include the full bed information (all fields of the bed file, including introns/exons). So this may be a case of the data returned from BBFileReader.java not being parsed correctly by the IGB code. I have shifted my focus to the IGB codebase.

            nfreese Nowlan Freese added a comment - - edited

            The problem is in BigBedSymLoader.java in the parse method. The parse method is creating a SimpleSymWithProps where only the chromosome, start, stop, and strand are being used. The remaining bed fields are labeled as restOfFields and effectively ignored by IGB. I will need to determine a better way to parse this.
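            To illustrate what is lost when the remaining fields are ignored: in a bed12 record, the exon structure lives in the blockSizes and blockStarts columns (fields 11 and 12), with block starts given relative to chromStart. The sketch below shows how those two fields translate into exon coordinates. It is not the IGB implementation; the class BedBlocks and its method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not the IGB parser. */
final class BedBlocks {

    /**
     * Converts the BED blockSizes and blockStarts columns into absolute,
     * half-open [start, end) intervals, one per block (exon).
     * Block starts are relative to chromStart, as in the BED format.
     */
    static List<int[]> exons(int chromStart, String blockSizes, String blockStarts) {
        String[] sizes = splitList(blockSizes);
        String[] starts = splitList(blockStarts);
        if (sizes.length != starts.length) {
            throw new IllegalArgumentException(
                    "blockSizes and blockStarts differ in length: "
                            + sizes.length + " vs " + starts.length);
        }
        List<int[]> exons = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            int start = chromStart + Integer.parseInt(starts[i].trim());
            int end = start + Integer.parseInt(sizes[i].trim());
            exons.add(new int[] {start, end});
        }
        return exons;
    }

    /** Splits a comma-separated list, tolerating the usual trailing comma. */
    private static String[] splitList(String list) {
        String trimmed = list.trim();
        if (trimmed.endsWith(",")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return trimmed.isEmpty() ? new String[0] : trimmed.split(",");
    }
}
```

            Without these two fields, only the outer chromStart/chromEnd span is available, which is consistent with a single glyph being drawn for the entire gene.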

            nfreese Nowlan Freese added a comment -

            The restOfFields array contains all of the remaining fields past the first three of the bed file, including "optional" fields (will need to test).
            So the logic should be to build a bed record using the defined field count; any additional fields (fieldCount - definedFieldCount) would then need to be added on the end.
            I'm not sure if IGB allows for something like a bed 6 file plus optional fields; I will need to look into this.
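            The splitting logic described above could look roughly like this. It is a sketch under assumptions, not the committed fix: RestOfFieldsSplitter and its methods are hypothetical, restOfFields is assumed to be a String[] holding every field after chrom, start, and end, and definedFieldCount is assumed to count the standard bed fields including those first three.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the defined/optional split. */
final class RestOfFieldsSplitter {
    /** chrom, start, and end are delivered separately from restOfFields. */
    static final int LEADING_FIELDS = 3;

    /** Standard BED fields 4..definedFieldCount, taken from the front of restOfFields. */
    static String[] definedFields(String[] restOfFields, int definedFieldCount) {
        int defined = countDefinedInRest(restOfFields, definedFieldCount);
        return Arrays.copyOfRange(restOfFields, 0, defined);
    }

    /** The trailing (fieldCount - definedFieldCount) optional fields. */
    static String[] extraFields(String[] restOfFields, int definedFieldCount) {
        int defined = countDefinedInRest(restOfFields, definedFieldCount);
        return Arrays.copyOfRange(restOfFields, defined, restOfFields.length);
    }

    private static int countDefinedInRest(String[] restOfFields, int definedFieldCount) {
        int defined = definedFieldCount - LEADING_FIELDS;
        if (defined < 0 || defined > restOfFields.length) {
            throw new IllegalArgumentException(
                    "definedFieldCount " + definedFieldCount
                            + " does not fit " + restOfFields.length + " remaining fields");
        }
        return defined;
    }
}
```

            For a bed12+2 record, restOfFields would hold 11 values: the first 9 complete the standard bed12 record and the last 2 are the optional fields.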

            ann.loraine Ann Loraine added a comment -

            Scrum discussion:
            Also look at bedgraph format and how we use it in IGB.
            Suggestion: Confirm that "bigbed" is always used to represent region-based data, i.e., annotations, and never graphs.
            Q: Are bedgraph files also bed? A: No.
            NF: Needs to rewrite our bed parsing code a bit to adapt it to the different requirements of bigbed, which allows optional fields following the required fields.

            nfreese Nowlan Freese added a comment -

            I have finished the initial parsing work. The Araport11.bb file will now show correctly in IGB.

            Next steps are to test different bigBed files and to clean up the code.

            nfreese Nowlan Freese added a comment - - edited

            While testing I noticed that special characters are not handled in bigBed file names.

            For example, this will not work:
            myBigBed2_hg18_bed9+2.bb

            but this will:
            myBigBed2_hg18_bed9_2.bb

            Bed files do seem to handle special characters, for example this file works:
            myBigBed2_hg18_bed9+2.bed

            nfreese Nowlan Freese added a comment - - edited

            The newest version of the BBFileReader.java from the IGV github repo has the following method:

            public String getAutoSql() {
                return autoSql;
            }

            This should get the AutoSQL custom BigBed fields. These would allow us to set the field names for IGB appropriately.

            Unfortunately, the version of BBFileReader.java that IGB currently uses is missing the getAutoSql() method, and there does not appear to be a way to get the BigBed field names. We can still access the optional fields; we just don't know their names. For example, we cannot determine whether a bigBed file with 12 fields plus 2 optional fields is a bed detail file, as the 2 optional fields could be ID and Description (i.e. bed detail) or something completely different (bigBed allows optional fields that could contain anything, but they must be defined by an autoSQL file during creation).

            For now I will parse the 12 defined bed fields and all other fields will be added as optional.
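            If a reader version that exposes getAutoSql() were adopted later, the field names could be pulled out of the autoSql text along these lines. This is only a sketch: it assumes getAutoSql() returns the raw autoSql declaration as a string, AutoSqlFieldNames is a hypothetical class, and the pattern covers only simple declarations of the form "type name;" (including array types such as int[blockCount]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; handles only simple "type name;" declarations. */
final class AutoSqlFieldNames {
    // A type token (letters, digits, brackets), whitespace, the field name, then ';'.
    private static final Pattern FIELD =
            Pattern.compile("^\\s*[\\w\\[\\]]+\\s+(\\w+)\\s*;", Pattern.MULTILINE);

    /** Returns the declared field names in order of appearance. */
    static List<String> fieldNames(String autoSql) {
        List<String> names = new ArrayList<>();
        Matcher matcher = FIELD.matcher(autoSql);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Abbreviated, made-up autoSql text in the style of a bed detail definition.
        String autoSql = String.join("\n",
                "table bedDetail",
                "\"Browser extensible data, with extended fields for detail page\"",
                "    (",
                "    string chrom;       \"Reference sequence chromosome or scaffold\"",
                "    uint chromStart;    \"Start position in chromosome\"",
                "    uint chromEnd;      \"End position in chromosome\"",
                "    string id;          \"ID of the item\"",
                "    lstring description; \"Long description of the item\"",
                "    )");
        System.out.println(fieldNames(autoSql));
    }
}
```

            With names available, a 12+2 file whose last two fields are an ID and a description could be recognized as bed detail rather than treated as anonymous optional fields.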

            nfreese Nowlan Freese added a comment - - edited

            Commit: https://bitbucket.org/nfreese/nowlanfork-igb/commits/6f00b4b39c0b72892601603ca82f7b5f18a2bde9
            Downloads folder: https://bitbucket.org/nfreese/nowlanfork-igb/downloads/

            Test files: https://data.cyverse.org/dav-anon/iplant/home/nfreese/2978_testing

            To test:

            1. Download 2978 branch installer and install IGB.
            2. Open the A_thaliana_Jun_2009 genome.
            3. Add the A_thaliana_Jun_2009_Araport11_bed12_2.bb test file to IGB and click Load Data.
            4. Add the A_thaliana_Jun_2009_Araport11_bed12_2.bed test file to IGB and click Load Data.
            5. Navigate to Chr1:2,257,253-2,260,326
            6. Compare the two files. The exons and introns should be the same (note that there may be differences in color/arrows).

            Compare the two additional bigBed (bb) files in the test files folder. The file names include the correct genome they should be loaded in and the chromosome where data can be found. Select a gene model and visually compare between the .bb and .bed files.

            Check that there are no errors/warnings appearing in the IGB log.

            omarne Omkar Marne (Inactive) added a comment - - edited

            I installed the 2978 branch installer and uploaded both of the files mentioned above. The results are as expected.

            There are no errors or warnings in the log.

            Ticket is ready for the pull request.

            nfreese Nowlan Freese added a comment -

            Pull request: https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/892

            ann.loraine Ann Loraine added a comment - - edited

            Merged. Building master branch installers. Master branch installers built and also deployed on bioviz main in "early access igb" section.

            Ready for testing.

            omarne Omkar Marne (Inactive) added a comment - - edited

            I installed the 2978 branch installer and uploaded both of the files mentioned above. The results are as expected. The exons and introns are the same.

            There are no errors or warnings in the log.

            Closing the ticket.

            nfreese Nowlan Freese added a comment -

            If you are looking for example .as (autosql) files, check out the following links:
            https://github.com/ucscGenomeBrowser/kent/tree/e01be94b2df0b6b467170df7e304ed87493317bd/src/hg/lib
            https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/hg/lib/

              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                nfreese Nowlan Freese