  1. IGB
  2. IGBF-1469

Investigate supporting CRAM files

    Details

    • Story Points:
      3
    • Sprint:
      Winter 2018 Sprint 3, Summer 2019 Sprint 8, Fall 2 2023 Sep 17, Fall 4 2023 Oct 16

      Description

      Discussion on Twitter:

      https://twitter.com/pjacock/status/500208907651153920

      CRAM is a compressed, random-access file format for sequence alignments that can be used in place of BAM. It may offer better compression than BAM. It is not yet widely used, but may become so in the future, so supporting CRAM could benefit IGB users.

        Attachments

        1. A_thaliana_Jun_2009_Chr1.bam
          143 kB
          Nowlan Freese
        2. A_thaliana_Jun_2009_Chr1.bam.bai
          0.1 kB
          Nowlan Freese
        3. A_thaliana_Jun_2009_Chr1.cram
          78 kB
          Nowlan Freese
        4. A_thaliana_Jun_2009_Chr1.cram.crai
          0.0 kB
          Nowlan Freese
        5. A_thaliana_Jun_2009_Chr1.sam
          803 kB
          Nowlan Freese

            Activity

            ann.loraine Ann Loraine added a comment -

            Probably this is a low-ish priority as CRAM doesn't seem to be gaining much traction. Other things are probably a lot higher priority.

            nfreese Nowlan Freese added a comment -

            The Genome in a Bottle project has ultralong Oxford Nanopore alignments in CRAM format.

            ann.loraine Ann Loraine added a comment -

            Task:

            • Investigate and understand the format - make notes here
            • Investigate tooling in htsjdk library to determine if parsing code is available there

            ann.loraine Ann Loraine added a comment -

            Blog post about CRAM - https://brentp.github.io/post/cram-speed/

            ann.loraine Ann Loraine added a comment -

            Also see: https://www.ga4gh.org/news/cram4gh-twitter-chat-recap/

            ann.loraine Ann Loraine added a comment -

            Nowlan Freese: Consumer genomics companies are using CRAM to distribute data to customers.
            [~aloraine]: The situation has changed. We now have even better reasons to support CRAM natively.

            nfreese Nowlan Freese added a comment - - edited

            I have attached example files in BAM (with BAI index), SAM, and CRAM (with CRAI index) formats. All were created with samtools from the same original BAM file: http://lorainelab-quickload.scidas.org/rnaseq/A_thaliana_Jun_2009/auxin_arf19/Col.C.bam. The files contain RNA-Seq data from Arabidopsis thaliana (A_thaliana_Jun_2009) within Chr1:6,689-8,835.
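
For anyone checking the attached files by hand, here is a minimal sketch of telling the three formats apart from their leading bytes. It assumes only what the format specifications state: a CRAM file begins with the ASCII magic `CRAM`, a BAM file is BGZF (gzip-compatible) data and so begins with the bytes `0x1f 0x8b`, and SAM text usually begins with an `@` header line (headers are optional, so a headerless SAM file would not be recognized). The class `AlignmentFormatSniffer` and its `sniff` method are hypothetical names for illustration and are not part of IGB or htsjdk.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not IGB or htsjdk code.
class AlignmentFormatSniffer {

    // Guess the alignment format from the first bytes of a file.
    static String sniff(byte[] head) {
        // CRAM file definition starts with the 4-byte ASCII magic "CRAM".
        if (head.length >= 4
                && new String(head, 0, 4, StandardCharsets.US_ASCII).equals("CRAM")) {
            return "CRAM";
        }
        // BAM is BGZF-compressed, so it starts with the gzip magic 0x1f 0x8b.
        // Other gzip data (for example a bgzipped VCF) matches as well.
        if (head.length >= 2
                && (head[0] & 0xFF) == 0x1F
                && (head[1] & 0xFF) == 0x8B) {
            return "BAM or other gzip/BGZF data";
        }
        // SAM is plain text; header lines, when present, start with '@'.
        if (head.length >= 1 && head[0] == '@') {
            return "SAM";
        }
        return "unknown";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AlignmentFormatSniffer <file>");
            return;
        }
        byte[] buffer = new byte[4];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(buffer);
        }
        // Keep only the bytes actually read (n is -1 for an empty file).
        byte[] head = Arrays.copyOf(buffer, Math.max(n, 0));
        System.out.println(args[0] + ": " + sniff(head));
    }
}
```

Running it against each attachment (for example `A_thaliana_Jun_2009_Chr1.cram`) prints the guessed format. This only identifies the container; actually decoding CRAM records also requires the reference sequence, which is the part the htsjdk investigation needs to cover.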

            kgopu Kaushik Gopu added a comment - - edited

            Blogs I am referring to in order to understand the various mapped sequence data formats and their differences:

            https://gatk.broadinstitute.org/hc/en-us/articles/360035890791-SAM-or-BAM-or-CRAM-Mapped-sequence-data-formats#:~:text=SAM%20stands%20for%20Sequence%20Alignment,up%20to%20very%20much%20indeed

            ann.loraine Ann Loraine added a comment -

            Conclusion: Decided to proceed with supporting CRAM in IGB.


              People

              • Assignee:
                kgopu Kaushik Gopu
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                3
