Details

    • Story Points:
      4
    • Sprint:
      Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10, Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14

      Description

      Situation: bigBed is an indexed binary format of bed files. However, in IGB, bigBed files are not appearing correctly. The bigBed gene annotations (see attached file) are not showing the introns/exons, but instead are drawing a single glyph for the entire gene.

      Task: Fix IGB so that the bigBed files appear correctly.

      See the UCSC guide on bigBed (https://genome.ucsc.edu/goldenPath/help/bigBed.html) for more information on the bigBed file format.

        Attachments

        1. Araport11.bb (3.14 MB)
        2. bbVSbed.png (149 kB)
        3. bed.png (195 kB)
        4. bed and bb.png (80 kB)
        5. bedDetail.as (0.9 kB)
        6. chrom.sizes (0.1 kB)

          Issue Links

            Activity

            nfreese Nowlan Freese created issue -
            nfreese Nowlan Freese made changes -
            Field Original Value New Value
            Epic Link IGBF-1765 [ 17855 ]
            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-2954 [ IGBF-2954 ]
            nfreese Nowlan Freese made changes -
            Attachment bbVSbed.png [ 16793 ]
            nfreese Nowlan Freese made changes -
            Attachment Araport11.bb [ 16794 ]
            nfreese Nowlan Freese added a comment -

            To reproduce the issue:

            1. Open IGB.
            2. Open the Arabidopsis thaliana A_thaliana_Jun_2009 genome.
            3. Navigate to Chr1:2,257,253-2,260,326.
            4. Load the attached Araport11.bb file.
            5. Click Load Data and compare the Araport11.bb file with the Araport11 bed file found in IGB Quickload (which should load by default).

            nfreese Nowlan Freese added a comment - - edited

            As a sanity check, I viewed the Araport11.bb file in IGV (2.11.1) and it appeared correctly.

            I also converted the Araport11.bb file back into a bed file, and that bed file appeared correctly when viewed in IGB 9.1.8.

            I also checked previous versions of IGB going back to 8.2.0 and bigBed did not appear correctly in any of them.

            nfreese Nowlan Freese added a comment - - edited

            Here is the command I used to convert the Araport11.bed file found in the IGB Quickload to a bigBed file using the UCSC bedToBigBed converter:

            bedToBigBed -as=bedDetail.as -type=bed12+2 -tab Araport11.bed chrom.sizes Araport11.bb

            Note that the Araport11.bed file from IGB Quickload is a bed detail file and contains 14 columns. The 12 standard bed columns are handled by specifying the -type=bed12 option. For a 14-column bed detail file, -type=bed12+2 indicates two additional columns, which must then be defined in a .as (autoSql) file (see the attached bedDetail.as).
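
            The column arithmetic behind the -type value can be sketched in Java as follows. This is only an illustration: the BedTypeSpec class and its method names are hypothetical, it is not part of bedToBigBed or IGB, and it handles only the bedN and bedN+P forms used in this ticket.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a -type value such as "bed12+2" into its two counts. */
final class BedTypeSpec {
    // Accepts "bedN" or "bedN+P"; other spellings are rejected in this sketch.
    private static final Pattern TYPE = Pattern.compile("bed(\\d+)(?:\\+(\\d+))?");

    final int standardFields; // standard bed columns
    final int extraFields;    // custom columns that the .as file must describe

    private BedTypeSpec(int standardFields, int extraFields) {
        this.standardFields = standardFields;
        this.extraFields = extraFields;
    }

    static BedTypeSpec parse(String type) {
        Matcher m = TYPE.matcher(type);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a bedN or bedN+P type: " + type);
        }
        int standard = Integer.parseInt(m.group(1));
        int extra = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return new BedTypeSpec(standard, extra);
    }

    int totalFields() {
        return standardFields + extraFields;
    }

    public static void main(String[] args) {
        BedTypeSpec spec = parse("bed12+2");
        System.out.println(spec.standardFields + " standard + " + spec.extraFields
                + " extra = " + spec.totalFields() + " columns");
    }
}
```

            For bed12+2 this gives 12 standard columns plus 2 extra columns, matching the 14 columns of the bed detail file.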

            nfreese Nowlan Freese made changes -
            Attachment bedDetail.as [ 16795 ]
            nfreese Nowlan Freese made changes -
            Description
            Original Value:
            Situation: bigBed is an indexed binary format of bed files. However, in IGB, bigBed files are not appearing correctly. The bigBed gene annotations (see attached file) are not showing the introns/exons, but instead are drawing a single glyph for the entire gene.

            Task: Fix IGB so that the bigBed files appear correctly.

            See the [UCSC guide on bigBed|] for more information on the bigBed file format.
            New Value:
            Situation: bigBed is an indexed binary format of bed files. However, in IGB, bigBed files are not appearing correctly. The bigBed gene annotations (see attached file) are not showing the introns/exons, but instead are drawing a single glyph for the entire gene.

            Task: Fix IGB so that the bigBed files appear correctly.

            See the [UCSC guide on bigBed|https://genome.ucsc.edu/goldenPath/help/bigBed.html] for more information on the bigBed file format.
            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-2979 [ IGBF-2979 ]
            nfreese Nowlan Freese added a comment -

            Much of the logic for identifying chromosomes is from BBFileReader.java provided by Broad. It appears we may be using an older version of this file.

            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine made changes -
            Sprint Fall 6 2021 Oct 25 - Nov 5 [ 132 ] Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 19 [ 132, 133 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Rank Ranked lower
            ann.loraine Ann Loraine made changes -
            Sprint Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24 [ 132, 133 ] Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10 [ 132, 133, 134 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10 [ 132, 133, 134 ] Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10, Fall 9 2021 Dec 13 - Dec 24 [ 132, 133, 134, 135 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            nfreese Nowlan Freese added a comment -

            While investigating BBFileReader.java, I double-checked whether IGB's failure to load bigBed files properly was due to the older BBFileReader.java code. Going back to an older version of IGV (version 2.1.30 from 12/11/2012), I found that IGV was able to load the bigBed file properly. Inspecting the version of BBFileReader.java currently used in IGB showed that the returned values do include the full bed information (all fields of the bed file, including introns/exons). So this may be a case of the data returned from BBFileReader.java not being parsed correctly by the IGB code. I have shifted my focus to identifying the problem in the IGB codebase.

            nfreese Nowlan Freese added a comment - - edited

            The problem is in BigBedSymLoader.java in the parse method. The parse method is creating a SimpleSymWithProps where only the chromosome, start, stop, and strand are being used. The remaining bed fields are labeled as restOfFields and effectively ignored by IGB. I will need to determine a better way to parse this.
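
            To illustrate what is lost when those fields are ignored, here is a minimal Java sketch of how the bed12 block columns (blockCount, blockSizes, blockStarts) describe exons. The BedBlocks class is hypothetical and is not the IGB implementation. It assumes restOfFields begins at bed column 4 (name), so the block columns sit at positions 6, 7 and 8, and the record in main is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not IGB code: turns bed12 block columns into exon intervals. */
final class BedBlocks {
    // Positions inside restOfFields, assuming it starts at bed column 4 (name).
    static final int BLOCK_COUNT = 6;  // bed column 10
    static final int BLOCK_SIZES = 7;  // bed column 11
    static final int BLOCK_STARTS = 8; // bed column 12

    /** Returns {start, end} pairs in chromosome coordinates (0-based, half-open, as in bed). */
    static List<int[]> exons(int chromStart, String[] restOfFields) {
        int blockCount = Integer.parseInt(restOfFields[BLOCK_COUNT].trim());
        // split(",") drops the trailing empty entry left by a final comma.
        String[] sizes = restOfFields[BLOCK_SIZES].split(",");
        String[] starts = restOfFields[BLOCK_STARTS].split(",");
        List<int[]> exons = new ArrayList<>();
        for (int i = 0; i < blockCount; i++) {
            // blockStarts are offsets relative to chromStart.
            int start = chromStart + Integer.parseInt(starts[i].trim());
            int end = start + Integer.parseInt(sizes[i].trim());
            exons.add(new int[] {start, end});
        }
        return exons;
    }

    public static void main(String[] args) {
        // Made-up record: name, score, strand, thickStart, thickEnd, itemRgb,
        // blockCount, blockSizes, blockStarts.
        String[] rest = {"geneA", "0", "+", "1000", "2000", "0", "3", "100,50,200,", "0,300,800,"};
        for (int[] exon : exons(1000, rest)) {
            System.out.println(exon[0] + "-" + exon[1]);
        }
    }
}
```

            With the made-up record above, the three blocks become the exons 1000-1100, 1300-1350 and 1800-2000. Using only start and stop would draw a single glyph spanning 1000-2000.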

            nfreese Nowlan Freese added a comment -

            The restOfFields array contains all of the remaining fields past the first three of the bed file, including "optional" fields (will need to test).
            So the logic should be to build a bed record using the defined field count; then, if there are additional fields (fieldCount - definedFieldCount), these would need to be added on the end.
            I'm not sure if IGB allows for something like a bed 6 file plus optional fields; I will need to look into this.
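
            The split described above can be sketched in Java as follows. This is a sketch under stated assumptions, not the committed IGB code: the BigBedFieldSplit class and its method names are hypothetical, restOfFields is assumed to exclude the first three bed columns, and the sample values are made up.

```java
import java.util.Arrays;

/** Hypothetical helper: separates standard bed fields from trailing optional fields. */
final class BigBedFieldSplit {
    // chrom, chromStart and chromEnd are assumed not to be part of restOfFields.
    static final int LEADING_FIELDS = 3;

    // Number of entries in restOfFields that belong to the defined bed columns.
    private static int definedInRest(String[] restOfFields, int definedFieldCount) {
        int wanted = Math.max(0, definedFieldCount - LEADING_FIELDS);
        return Math.min(wanted, restOfFields.length);
    }

    static String[] definedFields(String[] restOfFields, int definedFieldCount) {
        return Arrays.copyOfRange(restOfFields, 0, definedInRest(restOfFields, definedFieldCount));
    }

    static String[] optionalFields(String[] restOfFields, int definedFieldCount) {
        int from = definedInRest(restOfFields, definedFieldCount);
        return Arrays.copyOfRange(restOfFields, from, restOfFields.length);
    }

    public static void main(String[] args) {
        // Made-up bed12+2 record: 9 defined fields after the first three, then 2 optional fields.
        String[] rest = {"geneA", "0", "+", "1000", "2000", "0", "2", "100,200,", "0,800,",
                "exampleId", "example description"};
        System.out.println(Arrays.toString(definedFields(rest, 12)));
        System.out.println(Arrays.toString(optionalFields(rest, 12)));
    }
}
```

            The same split works for a bed 6 file plus optional fields by passing 6 as the defined field count; whether IGB can display such a record is the open question noted above.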

            ann.loraine Ann Loraine added a comment -

            Scrum discussion:
            Also look at bedgraph format and how we use it in IGB.
            Suggestion: Confirm that "bigbed" is always used to represent region-based data, i.e., annotations, and never graphs.
            Q: Are bedgraph files also bed? A: No.
            NF: Needs to rewrite our bed parsing code a bit to adapt it to the different requirements of bigBed, which allows optional fields following the required fields.

            nfreese Nowlan Freese made changes -
            Story Points 3 4
            nfreese Nowlan Freese added a comment -

            I have finished the initial parsing work. The Araport11.bb file will now show correctly in IGB.

            Next steps are to test different bigBed files and to clean up the code.

            nfreese Nowlan Freese added a comment - - edited

            While testing I noticed that special characters are not handled in bigBed file names.

            For example, this will not work:
            myBigBed2_hg18_bed9+2.bb

            but this will:
            myBigBed2_hg18_bed9_2.bb

            Bed files do seem to handle special characters, for example this file works:
            myBigBed2_hg18_bed9+2.bed

            nfreese Nowlan Freese added a comment - - edited

            The newest version of the BBFileReader.java from the IGV github repo has the following method:

            public String getAutoSql() {
                return autoSql;
            }

            This should get the AutoSQL custom BigBed fields. These would allow us to set the field names for IGB appropriately.

            Unfortunately the version of BBFileReader.java that IGB currently uses is missing the getAutoSql() method and there does not appear to be a way to get the BigBed field names. We can still access the optional fields, we just don't know what the field name is. For example, we cannot determine if a bigBed file with 12 fields plus 2 optional fields is a bed detail file, as the 2 optional fields could be for ID and Description (i.e. bed detail) or they could be for something completely different (bigBed allows for optional fields that could contain anything, but must be defined by an autoSQL file during creation).

            For now I will parse the 12 defined bed fields and all other fields will be added as optional.
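
            If a reader version with getAutoSql() were available, the field names could in principle be pulled out of the returned text. The following Java sketch assumes the method returns the autoSql declaration as plain text, with one field per line in the form "type name; "comment"". The AutoSqlFieldNames class is hypothetical, and the SAMPLE declaration is made up and abbreviated; it is not the content of the attached bedDetail.as.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts field names from an autoSql declaration. */
final class AutoSqlFieldNames {
    // One field declaration per line: <type> <name>; "<comment>"
    // The type may carry a size, for example int[blockCount].
    private static final Pattern FIELD =
            Pattern.compile("^[ \\t]*(\\S+)[ \\t]+(\\w+)[ \\t]*;", Pattern.MULTILINE);

    // Made-up, abbreviated declaration used only for this example.
    static final String SAMPLE =
            "table bedDetailExample\n"
            + "\"Made-up example with two custom fields\"\n"
            + "    (\n"
            + "    string chrom;        \"Reference sequence chromosome or scaffold\"\n"
            + "    uint   chromStart;   \"Start position in chromosome\"\n"
            + "    uint   chromEnd;     \"End position in chromosome\"\n"
            + "    int[blockCount] blockSizes; \"Comma separated list of block sizes\"\n"
            + "    string id;           \"ID of item\"\n"
            + "    lstring description; \"Description of item\"\n"
            + "    )\n";

    static List<String> fieldNames(String autoSql) {
        List<String> names = new ArrayList<>();
        Matcher m = FIELD.matcher(autoSql);
        while (m.find()) {
            names.add(m.group(2)); // group 1 is the type, group 2 is the field name
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames(SAMPLE));
    }
}
```

            With names such as id and description in hand, a bed detail file could be told apart from a file whose two optional fields mean something else.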

            nfreese Nowlan Freese made changes -
            Summary Fix bigBed parsing Improve bigBed parsing
            nfreese Nowlan Freese added a comment - - edited

            Commit: https://bitbucket.org/nfreese/nowlanfork-igb/commits/6f00b4b39c0b72892601603ca82f7b5f18a2bde9
            Downloads folder: https://bitbucket.org/nfreese/nowlanfork-igb/downloads/

            Test files: https://data.cyverse.org/dav-anon/iplant/home/nfreese/2978_testing

            To test:
            1. Download the 2978 branch installer and install IGB.
            2. Open the A_thaliana_Jun_2009 genome.
            3. Add the A_thaliana_Jun_2009_Araport11_bed12_2.bb test file to IGB and click Load Data.
            4. Add the A_thaliana_Jun_2009_Araport11_bed12_2.bed test file to IGB and click Load Data.
            5. Navigate to Chr1:2,257,253-2,260,326.
            6. Compare the two files. The exons and introns should be the same (note that there may be differences in color/arrows).

            Compare the two additional bigBed (bb) files in the test files folder. The file names include the correct genome they should be loaded in and the chromosome where data can be found. Select a gene model and visually compare between the .bb and .bed files.

            Check that there are no errors/warnings appearing in the IGB log.

            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Omkar Marne [ omarne ]
            nfreese Nowlan Freese made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            omarne Omkar Marne (Inactive) made changes -
            Attachment bed.png [ 17045 ]
            omarne Omkar Marne (Inactive) made changes -
            Attachment bed and bb.png [ 17046 ]
            omarne Omkar Marne (Inactive) added a comment - - edited

            I installed the 2978 branch installer and loaded both of the files mentioned above. The results are as expected.

            There are no errors or warnings in the log.

            Ticket is ready for the pull request.

            omarne Omkar Marne (Inactive) made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            omarne Omkar Marne (Inactive) made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            nfreese Nowlan Freese made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            nfreese Nowlan Freese made changes -
            Assignee Omkar Marne [ omarne ]
            nfreese Nowlan Freese added a comment - Pull request: https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/892
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment - - edited

            Merged. Building master branch installers. Master branch installers built and also deployed on bioviz main in "early access igb" section.

            Ready for testing.

            ann.loraine Ann Loraine made changes -
            Sprint Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10, Fall 9 2021 Dec 13 - Dec 24 [ 132, 133, 134, 135 ] Fall 6 2021 Oct 25 - Nov 5, Fall 7 2021 Nov 8 - Nov 24, Fall 8 2021 Nov 29 - Dec 10, Fall 9 2021 Dec 13 - Dec 24, Spring 1 2022 Jan 3 - Jan 14 [ 132, 133, 134, 135, 136 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Attachment chrom.sizes [ 17047 ]
            omarne Omkar Marne (Inactive) made changes -
            Assignee Omkar Marne [ omarne ]
            omarne Omkar Marne (Inactive) made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            omarne Omkar Marne (Inactive) added a comment - - edited

            I installed the 2978 branch installer and loaded both of the files mentioned above. The results are as expected. The exons and introns are the same.

            There are no errors or warnings in the log.

            Closing the ticket.

            omarne Omkar Marne (Inactive) made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            nfreese Nowlan Freese made changes -
            Assignee Omkar Marne [ omarne ] Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Fix Version/s 9.1.10 Major Release [ 10700 ]
            nfreese Nowlan Freese added a comment - If you are looking for example .as (autosql) files check out the following links: https://github.com/ucscGenomeBrowser/kent/tree/e01be94b2df0b6b467170df7e304ed87493317bd/src/hg/lib https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/hg/lib/

              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: