Details

    • Type: Story
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      12
    • Sprint:
      Summer 2019 Sprint 12, Fall 2019 Sprint 1, Fall 2019 Sprint 2, Fall 2019 Sprint 3, Fall 4 : 30 Sep to 11 Oct, Fall 5 : 14 Oct to 25 Oct, Fall 6 : 28 Oct to 8 Nov

      Description

      In genomics we deal with very large files that associate numbers or features with genomic ranges.

      These files are often large, 10 GB or more, with many millions of features.

      Getting a useful overview of all these data is challenging. The current state of the art is the so-called "coverage graph", which plots the number of features per genomic base and is displayed as a graph in a genome browser track. Some genome browsers (like IGB) can calculate these graphs "on the fly" after loading alignments (BAM files) into memory.

      Large genome-based files are sorted by sequence name and genomic position and then indexed so that genome browsers can efficiently load data from specific regions of the genome. For example, BAM files have indexes called BAI files, which map locations in the genome to locations in the file. Tab-delimited file formats like BED and GFF can have indexes called TBI files. BigWig and BigBed file formats have indexes written at the top of the file.

      Back in 2011, Dr. Loraine had the idea to use the index files themselves as a summary of the larger file to give users an overview of the distribution of data in the larger file.

      At the time, we were collaborating with Michael Lawrence of Genentech. We did a little work on it, but eventually dropped it in favor of other things.

      Recently, a new paper was published from Aaron Quinlan's group that developed this idea further. Dr. Loraine saw a tweet about the article, recalled that she had thought of something similar many years before, and contacted Michael about it.

      He then pointed her to some code he and his Bioconductor collaborators had implemented that also explored the idea.

      I think it's time to re-visit this idea because in the interim, whole genome sequencing as a way to diagnose genetic problems has become much more practical. Let's jump back on it and see where it takes us!

      References:

      Goal:

      • Using one of the above tools, create a BED or bigWig file from a BAI file and visualize it in IGB

      Compare to ordinary coverage graph:

      • Performance - is it faster to load, less memory-intensive?
      • Do we notice any new patterns not previously apparent?

      Investigate:

      • Can we implement a new file parser for bai index files in IGB?

        Attachments

        1. BS-seq_Chr1.bam.bai
          88 kB
        2. BS-seq_Chr1-HEADERONLY.bam
          0.3 kB
        3. comparison.png
          115 kB
        4. empty.bam
          0.1 kB
        5. empty.bam.bai
          0.0 kB
        6. estimateCoverage.bedgraph
          182 kB
        7. estimateCoverageScript.R
          1 kB
        8. galaxy.bai
          88 kB
        9. galaxy.bam
          0.3 kB
        10. galaxy.zip
          79 kB
        11. HG003.GRCh38.2x250.bam.bai
          8.96 MB
        12. indexcov.bedgraph
          50 kB
        13. indexcov.png
          15 kB
        14. jvarkit.bedgraph
          52 kB

          Issue Links

            Activity

            nfreese Nowlan Freese added a comment - Example bam file and index: http://igbquickload.org/smokeTestingQuickload/H_sapiens_Dec_2013/Bam/
            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment - - edited

            Pawan Bole

            Indexcov installation steps: (Tested for Windows and Linux)

            1. Install Go: https://www.freecodecamp.org/news/setting-up-go-programming-language-on-windows-f02c8c14e2f/ (on Linux, follow a Linux tutorial for installing Go)

            2. Set all the environment variables

            3. Test installation using: go version

            4. Download Goleft
            go get -u github.com/brentp/goleft/...
            (Source: https://github.com/brentp/goleft)
            5. Install Goleft using go
            go install github.com/brentp/goleft/cmd/goleft

            6. Run Indexcov to get the index coverage
            %GOPATH%/bin/goleft indexcov -d output/ path/to/*.bam

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment - - edited

            Pawan Bole
            Tested Example:

            1. Download the BAM file (suggested by Dr Nowlan): https://usegalaxy.org/datasets/bbd44e69cb8906b52300b547db6d5d14/display?to_ext=bam
            2. Download the BAI file: https://usegalaxy.org/dataset/get_metadata_file?hda_id=bbd44e69cb8906b52300b547db6d5d14&metadata_name=bam_index
            3. Put these two files in the same folder
            4. Execute the command below:
            %GOPATH%/bin/goleft indexcov -d output_folder/ Galaxy4.bam
            5. Output files are generated. (Output folder is attached.)

            (Note: The BAM file is needed only for its header; all the remaining work is done using the BAI file. A huge BAM file can be converted to a very small one using samtools. Contact Dr Nowlan regarding this conversion.)

            (Note: We can also do this using https://github.com/brentp/goleft/tree/master/indexcov/anonymize tool.)

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment - - edited

            Output folder: we only need the .bed file from it (see galaxy.zip attached to this ticket).

            ann.loraine Ann Loraine added a comment -

            I think I have found vestigial code in IGB that implements (implemented?) this:
            See:
            IndexZoomSymLoader.java

            also, this very old commit from 2012:

            https://bitbucket.org/lorainelab/integrated-genome-browser/commits/0dadf02ed0dc2aad21ba41deff508d878465b5f8#chg-core/genometryImpl/src/com/affymetrix/genometryImpl/symloader/IndexZoomSymLoader.java

            nfreese Nowlan Freese added a comment -

            I added an R script for running the estimateCoverage method from the biovizBase package. Pass the R script a BAM file (its BAM index must be in the same directory) and it should output a bedgraph of the coverage.

            The output from estimateCoverage looks identical to the indexcov output with one difference (see attachments estimateCoverage.bedgraph vs indexcov.bedgraph): the estimateCoverage values are counts in the millions, whereas the indexcov values look like percentages.

            ann.loraine Ann Loraine added a comment -

            Quick followup:

            Does computing the percentage require getting the total number of aligned reads from the BAM file?

            nfreese Nowlan Freese added a comment -

            Pierre Lindenbaum has a post on turning bai into xml that may be useful: https://www.biostars.org/p/172515/

            and the code: http://lindenb.github.io/jvarkit/Biostar172515.html

            ann.loraine Ann Loraine added a comment - - edited

            Now I get what Sai Charan Reddy Vallapureddy was mentioning about using the "samtools" package – which is part of htsjdk.
            It looks to me like the "Bin" object in Pierre L.'s code represents a region on the genomic sequence. Each "Bin" appears to have a read count associated with it. I think this number roughly corresponds to y-axis value in a GraphSym.

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            Nowlan Freese

            Thanks, Dr Nowlan. I used the reference code (Pierre Lindenbaum) and was able to parse the BAI file. I am getting the start, end and offset of each bin.
            I think the next step is to pass it to GraphSym.

            nfreese Nowlan Freese added a comment -

            The output from jvarkit on a bai file has the following lines for each chromosome:

            <bin first-locus="1" last-locus="16384" level="5" first-offset="20709376" n_chunk="24">
            <bin first-locus="16385" last-locus="32768" level="5" first-offset="81570668173" n_chunk="23">

            The output continues in these 16,384 bp sections until the end of the chromosome.
            The "offset" has something to do with the number of bytes of data (in this case reads) that fall within that region of the genome. If you subtract the offset of one bin from the offset of the next (in this example: 81570668173 - 20709376 = 81549958797), you get the "size" of that 16,384 bp bin.

            I did this for the example bai file, and the output matches exactly with indexcov and estimateCoverage (see attached image).
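
            The subtraction described above can be sketched in a few lines of Python. This is only an illustration, not the script that was actually used: the function name bins_to_bedgraph and the chromosome name "Chr1" are assumptions, and the regular expression only covers the attribute layout of the two lines quoted in this comment.

```python
import re

# Matches the attributes of one <bin ...> line as quoted in the comment above.
BIN_RE = re.compile(
    r'first-locus="(\d+)"\s+last-locus="(\d+)".*?first-offset="(\d+)"'
)


def bins_to_bedgraph(xml_lines, chrom):
    """Turn consecutive <bin> lines into (chrom, start, end, size) rows.

    The size of a bin is the next bin's first-offset minus its own,
    so the final bin gets no row.
    """
    bins = []
    for line in xml_lines:
        match = BIN_RE.search(line)
        if match:
            first, last, offset = (int(g) for g in match.groups())
            bins.append((first, last, offset))

    rows = []
    for (first, last, offset), (_, _, next_offset) in zip(bins, bins[1:]):
        # bedGraph coordinates are 0-based and half-open.
        rows.append((chrom, first - 1, last, next_offset - offset))
    return rows


lines = [
    '<bin first-locus="1" last-locus="16384" level="5" first-offset="20709376" n_chunk="24">',
    '<bin first-locus="16385" last-locus="32768" level="5" first-offset="81570668173" n_chunk="23">',
]
# "Chr1" is a placeholder chromosome name for this example.
rows = bins_to_bedgraph(lines, "Chr1")
for chrom, start, end, size in rows:
    print(f"{chrom}\t{start}\t{end}\t{size}")
```

            With only two bins as input, a single row is produced, because the second bin has no following offset to subtract from.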

            nfreese Nowlan Freese added a comment -

            Some comments regarding our recent meeting:

            From the indexcov paper: "the median of these proxy coverage values is used to establish a baseline coverage level for the average tile per chromosome".

            From their GitHub: "Since we know the total number of 16,384 base intervals in the index and the size of the bam file (from the last file offset stored in the index), we know the average size (in bytes) taken by each 16,384 base chunk. So, we iterate over each (16KB) element in the linear index, subtract the previous file offset, and scale by the expected (average) size. This gives the scaled value for each 16,384-base chunk."

            It looks like they scale the y-axis values based on either the median or average bin depth/offset to get a value that should be close to 1. That way, if there is a duplication event, you would expect values around 2. I think this makes the most sense.

            I can't find anything in the paper or on their GitHub about how they handle the final 16KB bin, but looking at their bedgraph file, it appears to ignore the last bin, since there does not seem to be a way to calculate its size. The estimateCoverage program also ignores the last bin. This leads me to believe that it may not be possible to calculate the size of the last bin. One caveat is that it may be possible to use the offset of the first bin on the next chromosome. Our current test bai file only has data for chromosome one, so the last bin is the last bin for both the chromosome and the file.
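
            To make the scaling idea concrete, here is a minimal Python sketch. It is not indexcov's code: the function name scaled_coverage and the baseline parameter are made up for this example, and the offsets are invented numbers. It takes the ordered offsets for one chromosome, subtracts neighbours to get bin sizes, and divides by the median (or mean) size. The last bin produces no value, matching what we see in the indexcov and estimateCoverage output.

```python
from statistics import median


def scaled_coverage(offsets, baseline="median"):
    """Scale per-bin sizes so that a typical bin has a value close to 1.

    offsets: file offsets of consecutive 16,384 bp bins on one chromosome.
    The last bin is dropped because there is no following offset.
    """
    sizes = [nxt - cur for cur, nxt in zip(offsets, offsets[1:])]
    if not sizes:
        return []
    if baseline == "median":
        base = median(sizes)
    else:
        base = sum(sizes) / len(sizes)
    if base == 0:
        # No data at all: avoid dividing by zero.
        return [0.0] * len(sizes)
    return [size / base for size in sizes]


# Invented offsets: the third bin holds twice as much data as the others,
# as might happen in a duplicated region.
example = scaled_coverage([0, 100, 200, 400, 500])
print(example)
```

            In this toy input the third bin comes out at 2 and the others at 1, which is the pattern one would hope to see for a duplication.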

            ann.loraine Ann Loraine added a comment -

            Sai Charan Reddy Vallapureddy Could you add a link to the branch and fork you are using for development? Thank you!

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment - - edited

            [~aloraine]
            Branch: https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1920

            Implemented Tasks:
            1. IGB now accepts the bai file format
            2. IGB takes chromosome lengths from genome.txt instead of the bam file
            3. IGB parses the bai file using samreader and generates a new bedgraph file
            4. The newly created bedgraph is automatically loaded into IGB (it shows results)

            TODO:
            1. Reading the bai file requires samreader, which is tightly coupled with the bam file. Trying to remove the bam file dependency from the logic (chromosome lengths are already taken from genome.txt, not from the bam file).
            2. Should not create a new bedgraph file; should handle it internally.

            Test steps:
            1. Clone my branch
            2. Open IGB
            3. Place attached galaxy.bai and galaxy.bam files in the same folder.
            4. Select A. Thaliana (flower)
            5. Drag and drop bai file into IGB
            6. Click Load Sequence

            ann.loraine Ann Loraine added a comment -

            Notes:

            • IGB is reading the 'bai' file using htsjdk
            • However, htsjdk does not support reading just the bai file - it needs the BAM file

            To-do:

            • Investigate how IGB Quickload is able to provide a .bai location independent of the .bam file (should test that it still works!)

            Some thoughts: I think (Ann) we should strip out the code from the htsjdk that we need and re-factor it so that it's more flexible and can do what we want. (Not sure how much of a spaghetti mess that code is.)

            ann.loraine Ann Loraine added a comment -

            Note: All other tools that work with .bai file appear to need the .bam file too. We think this could be because they are all using samtools and/or htsjdk.
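
            For reference, the BAI layout in the SAM/BAM specification can be read without the BAM file. The Python sketch below illustrates that layout only; it is not the htsjdk change, and the function name read_linear_index is made up here. It skips the binning index and returns the linear index (one offset per 16,384 bp window) for each reference. The BAI file stores no sequence names or lengths, so those still have to come from somewhere else, such as genome.txt.

```python
import io
import struct


def read_linear_index(stream):
    """Return one list of linear-index offsets per reference sequence."""
    if stream.read(4) != b"BAI\x01":
        raise ValueError("not a BAI file")
    (n_ref,) = struct.unpack("<i", stream.read(4))
    result = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack("<i", stream.read(4))
        for _ in range(n_bin):
            _bin_id, n_chunk = struct.unpack("<Ii", stream.read(8))
            stream.read(16 * n_chunk)  # skip (chunk_beg, chunk_end) pairs
        (n_intv,) = struct.unpack("<i", stream.read(4))
        offsets = struct.unpack(f"<{n_intv}Q", stream.read(8 * n_intv))
        # A virtual offset is (compressed block offset << 16) | offset in block;
        # keep only the compressed block offset.
        result.append([v >> 16 for v in offsets])
    return result


# A tiny hand-built index (invented values): one reference, one bin with one
# chunk, and three 16,384 bp windows.
demo = (
    b"BAI\x01"
    + struct.pack("<i", 1)         # n_ref
    + struct.pack("<i", 1)         # n_bin
    + struct.pack("<Ii", 4681, 1)  # bin id, n_chunk
    + struct.pack("<QQ", 0, 0)     # chunk_beg, chunk_end
    + struct.pack("<i", 3)         # n_intv
    + struct.pack("<3Q", 100 << 16, 250 << 16, 900 << 16)
)
parsed = read_linear_index(io.BytesIO(demo))
print(parsed)
```

            Dropping the low 16 bits of each virtual offset is a choice made for this sketch so that differences are in compressed bytes; whether IGB should do the same would need to be checked against the values jvarkit reports.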

            nfreese Nowlan Freese added a comment - - edited

            Added large bai file to test.

            File from: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG003_NA24149_father/NIST_Illumina_2x250bps/novoalign_bams/

            Aligned to: 2013 Human genome GRCh38 (hg38)

            nfreese Nowlan Freese added a comment -

            Regarding normalization of data: looking at estimateCoverage in biovizBase, I'm not 100% sure how they are normalizing their data. If I take the score column per chromosome and get the average, each chromosome comes out to nearly the exact same average (ratios close to 1). If I take our file from IGB and do the same thing, the chromosomes have different values (ratios around 1.25). It looks like estimateCoverage may be normalizing the values per chromosome. I don't think this is the best way, as it would make it more difficult to compare between chromosomes. I think the best way would be to find the average bin size across all of the chromosomes, and then divide all of the values by that number. This should put most of the values around 1.
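
            A small Python sketch of the two options, using invented bin sizes (the function names are made up for this example, and every chromosome is assumed to have at least one bin with a non-zero mean). Per-chromosome normalization pulls every chromosome to an average of 1, which hides differences between chromosomes; dividing by the average across all chromosomes keeps them comparable.

```python
def normalize_per_chromosome(sizes_by_chrom):
    """Divide each chromosome's bin sizes by that chromosome's own mean."""
    out = {}
    for chrom, sizes in sizes_by_chrom.items():
        mean = sum(sizes) / len(sizes)
        out[chrom] = [s / mean for s in sizes]
    return out


def normalize_globally(sizes_by_chrom):
    """Divide every bin size by the mean bin size across all chromosomes."""
    all_sizes = [s for sizes in sizes_by_chrom.values() for s in sizes]
    mean = sum(all_sizes) / len(all_sizes)
    return {
        chrom: [s / mean for s in sizes]
        for chrom, sizes in sizes_by_chrom.items()
    }


# Invented data: chr2 has twice the per-bin data of chr1.
sizes = {"chr1": [100, 100], "chr2": [200, 200]}
per_chrom = normalize_per_chromosome(sizes)
overall = normalize_globally(sizes)
print(per_chrom)
print(overall)
```

            With this input the per-chromosome version reports 1.0 everywhere, while the global version keeps chr2 at twice the value of chr1.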

            Normalizing the values down to a smaller number (close to 1) may be required in IGB. I'm seeing some odd behavior, where the 6 MB file with large bedgraph values (in the billions) takes up 2 GB of memory in IGB. The same file but with small values (in the hundreds) takes up 100 MB of memory. Tested in 9.0.2 and 9.1.0.

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment - - edited

            Nowlan Freese [~aloraine]
            With the changes below, the BAM file is no longer needed to load a BAI file. To remove the dependency, changes were made in the htsjdk project.

            Branches:
            https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1920-BAI

            https://github.com/VallapuCharan/htsjdk-igb/commits/igb-2.16.3
            (Don't use master, use igb-2.16.2)

            Steps to build with the new changes:
            1. Clone my htsjdk branch.
            2. Run Gradle clean and install.
            3. Replace the old build (2.16.2) with the new one (2.16.3) in the .m2 repository (htsjdk).
            4. Clone my IGB branch.
            5. Run Maven clean and install.
            6. Run and test it.

            Test steps:
            1. Select A. Thaliana (flower)
            2. Drag and drop attached galaxy.bai file into IGB
            3. Click the Load Sequence.
            4. You should see a graph.

            Todo:
            1. A bedgraph file is still created to produce the graph. (Discussion required)

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment - - edited

            [~aloraine] Nowlan Freese

            Branches:
            IGB:
            IGB (in sync with master; I squashed all my commits into a single commit)
            https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1920-BAI

            htsjdk:
            htsjdk-igb (2.16.3) (now in sync with upstream igb-2.16.3; the pull request is merged)
            https://github.com/VallapuCharan/htsjdk-igb/commits/igb-2.16.3

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            [~aloraine]
            Pull Request is submitted. (Created a new branch with new comments and new method names. All your comments are addressed.)

            Branch: https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1920-BAI

            (Just FYI: your comments are in different branch: https://bitbucket.org/svallapu/charan_igb/commits/2f02b6e76790a66bc438f4649c8ea4a9aad061bd?at=IGBF-1920-Final)

            nfreese Nowlan Freese added a comment -

            Tested using [jar | HG002.GRCh38.2x250.bam.bai] from loraine lab bitbucket.

            Able to load HG002.GRCh38.2x250.bam.bai. Produced a bedgraph file that was then automatically loaded into IGB.

            Closing issue

            Note: Additional issues have been created to fix the scaling of the data and loading directly from the bai (instead of bedgraph).

            nfreese Nowlan Freese added a comment -

            Note: I was interested in why we were getting negative values from some of the offsets, so I looked at the raw offset data generated by jvarkit. The negative values occur when an offset is less than the offset before it. It's unclear why an offset difference would ever be negative (and not zero). However, looking at the bam file, it appears that negative and zero differences tend to occur in areas where there is no coverage. This would explain why, wherever we see negative values in our data, indexcov and estimateCoverage show zeros at the same locations.
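
            One possible way to handle this, sketched in Python with invented offsets: treat any negative difference as zero. The function name bin_sizes_clamped is made up here, and clamping is an assumption about how to match the zeros seen in the other tools, not something confirmed from their source code.

```python
def bin_sizes_clamped(offsets):
    """Offset differences for consecutive bins, with negatives set to zero."""
    return [max(0, nxt - cur) for cur, nxt in zip(offsets, offsets[1:])]


# Invented offsets: the third offset is smaller than the second, and the
# fourth repeats the third, as might happen in a region with no coverage.
clamped = bin_sizes_clamped([1000, 1500, 1200, 1200, 2000])
print(clamped)
```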


              People

              • Assignee:
                svallapu Sai Charan Reddy Vallapureddy (Inactive)
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                3