  1. IGB
  2. IGBF-3841

Deploy new NEBULA data for human genome sequencing / CRAM demo

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      IGB can now open and display contents of CRAM files distributed by Nebula Genomics.

      AL has a recently downloaded CRAM file from Nebula that would be great to use, but it's huge - around 84 Gb.

      For this task, let's use a command line tool to extract the data for chromosome 1, the chromosome IGB shows by default when it opens hg38, the latest human genome available in the browser.

      Let's deploy the actual data file on RENCI hosting and add it as a new data set in IGB Quickload main.

      Once it is there, we can then create a tutorial and also a video showing how to open and load the reads to look for or sanity-check polymorphisms. The tutorial ought to explain the soft clips and the SNPs - what they are and how to interpret them. The materials could also explain how to visually check some of the SNP calls on the Nebula Web site.
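For the tutorial, one way to make soft clips concrete is to look at the CIGAR field (column 6 of a SAM record), where an `S` operation marks read bases that are stored in the record but not aligned to the reference. The sketch below counts such records in SAM text. The function name `count_soft_clipped` and the region in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Count SAM records whose CIGAR string (column 6) contains a soft-clip (S) operation.
# Reads SAM text from stdin; header lines (starting with @) are skipped.
# count_soft_clipped is a hypothetical helper name, not part of samtools or IGB.
count_soft_clipped() {
  awk -F'\t' '!/^@/ && $6 ~ /S/ { n++ } END { print n + 0 }'
}
```

Possible usage, assuming samtools is installed and the region is only an example: `samtools view M1YNF4X1J.chr1.cram chr1:1000000-1010000 | count_soft_clipped`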

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2809 [ 19325 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3748 [ IGBF-3748 ]
            ann.loraine Ann Loraine made changes -
            Summary Set up data for human genome sequencing / CRAM demo Deploy new NEBULA data for human genome sequencing / CRAM demo
            ann.loraine Ann Loraine added a comment - - edited

            Update:

            • Using samtools, I made a new CRAM file with chromosome 1 data only and made an index for it:
            local aloraine$ samtools view --threads 16 -Cho M1YNF4X1J.chr1.cram M1YNF4X1J.cram chr1
            local aloraine$ samtools index M1YNF4X1J.chr1.cram
            
            • Checked the header with:
            samtools view -H M1YNF4X1J.chr1.cram
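The same steps could be wrapped so they are repeatable for another chromosome. This is only a sketch: `region_output_name` and `extract_region` are hypothetical helper names, and the default thread count is arbitrary. It relies on the fact that a region query needs an index for the input file.

```shell
#!/usr/bin/env bash
# Sketch: extract one reference sequence from a CRAM into a new, indexed CRAM.
# region_output_name and extract_region are hypothetical helper names.

# Derive "<base>.<region>.cram" from "<base>.cram" and a region name.
region_output_name() {
  local input=$1 region=$2
  printf '%s.%s.cram\n' "${input%.cram}" "$region"
}

extract_region() {
  local input=$1 region=$2 threads=${3:-4}
  local output
  output=$(region_output_name "$input" "$region")
  # Region queries need an index for the input; build one if it is missing.
  [ -e "$input.crai" ] || samtools index "$input"
  # -C: write CRAM, -h: include the header, -o: output file
  samtools view --threads "$threads" -C -h -o "$output" "$input" "$region"
  samtools index "$output"
  printf '%s\n' "$output"
}
```

For example, `extract_region M1YNF4X1J.cram chr1 16` would write M1YNF4X1J.chr1.cram and its index.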
            

            I attached the header section (output from the above command) as a plain text file. Check out the command line ("CL") tags in the "@PG" lines. You will be shocked!
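To pull just the program records out of a long header, something like the following might help. It is a sketch that assumes the standard SAM header layout, where each @PG line carries tab-separated tags such as ID and CL. `pg_command_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the ID and CL (command line) tags of each @PG header line, tab-separated.
# Reads SAM header text from stdin; pg_command_lines is a hypothetical helper name.
pg_command_lines() {
  awk -F'\t' '
    /^@PG/ {
      id = ""; cl = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        if ($i ~ /^CL:/) cl = substr($i, 4)
      }
      print id "\t" cl
    }'
}
```

Possible usage: `samtools view -H M1YNF4X1J.chr1.cram | pg_command_lines`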

            Question:

            • Does IGB keep an in-memory copy of the entire "header section" data for every entry in a CRAM file it has already loaded?
            ann.loraine Ann Loraine made changes -
            Link This issue blocks IGBF-3854 [ IGBF-3854 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Attachment M1YNF4X1J.header_section.txt [ 18475 ]
            ann.loraine Ann Loraine added a comment - - edited

            Trying to deploy file from local Spectrum internet is not going well. Uploading to data.bioviz.org was extra slow: 1+ h for 2 Gb. I had to use the VPN. Maybe sending the upload through the VPN was a problem.

            Trying now to upload to RENCI host with:

            local aloraine$ scp -J aloraine@hop.renci.org M1YNF4X1J.chr1.cram aloraine@lorainelab-quickload.scidas.org:/projects/igbquickload/lorainelab/www/main/htdocs/genomesequence/H_sapiens_Dec_2013/.
            

            Transfer rate is around 2.2 Mbs
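On a slow link, an interrupted scp has to start over. One alternative to consider is rsync with `--partial`, which keeps a partly transferred file so a rerun can continue from it. The sketch below only prints the command for review. It assumes rsync is installed locally and on the destination host, which is not confirmed here, and `upload_command` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: build a resumable upload command that goes through an SSH jump host.
# Assumes rsync is available on both ends (not confirmed for the hosts in this ticket).
# upload_command is a hypothetical helper; it prints the command instead of running it.
upload_command() {
  local jump=$1 dest=$2 file=$3
  # --partial keeps partly transferred files so a rerun can continue them;
  # --progress reports the transfer rate; -e runs rsync over ssh with a jump host (-J).
  printf 'rsync --partial --progress -e "ssh -J %s" %s %s\n' "$jump" "$file" "$dest"
}
```

The matching .crai index file would presumably need to be uploaded to the same folder in the same way.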

            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Epic Link IGBF-2809 [ 19325 ] IGBF-3856 [ 23155 ]
            ann.loraine Ann Loraine added a comment -

            Checking uploaded file in IGB:

            • Started IGB
            • Opened human hg38 genome version H_sapiens_Dec_2013
            • Chose File > Open URL and entered URL:

            http://lorainelab-quickload.scidas.org/genomesequence/H_sapiens_Dec_2013/M1YNF4X1J.chr1.cram

            • Visited chromosome 1, navigated to a gene region, zoomed in, selected "Load Data" and "Load Sequence"
            • Made the two attached images showing what appears to be a region where a lot of alignments end or start with gray "soft clip" glyphs. The boundaries of the soft clips do not perfectly line up with each other. I do not know how to interpret this, but it seems like an interesting pattern.
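Before opening the URL in IGB, a quick command-line check that both the data file and its index respond could look like the sketch below. It assumes the index sits next to the CRAM with ".crai" appended, which is where samtools index writes it by default. `index_url` and `check_hosted_cram` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check that a hosted CRAM and its index respond over HTTP.
# Assumes the index URL is the data URL with ".crai" appended.
# index_url and check_hosted_cram are hypothetical helper names.
index_url() {
  printf '%s.crai\n' "$1"
}

check_hosted_cram() {
  local url=$1 u
  for u in "$url" "$(index_url "$url")"; do
    # --head sends a HEAD request; --fail makes curl exit non-zero on HTTP errors.
    if curl --silent --head --fail "$u" > /dev/null; then
      echo "OK      $u"
    else
      echo "MISSING $u"
    fi
  done
}
```

Possible usage: `check_hosted_cram http://lorainelab-quickload.scidas.org/genomesequence/H_sapiens_Dec_2013/M1YNF4X1J.chr1.cram`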
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
            ann.loraine Ann Loraine added a comment -

            Attn Nowlan Freese and Paige Kulzer:

            This could be reviewed by Nowlan Freese and/or Paige Kulzer.
            To review, open the file in IGB and confirm that you can load the data. Please see above for instructions.

            pkulzer Paige Kulzer made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            pkulzer Paige Kulzer added a comment -

            Following the instructions above, I was able to successfully open the file in IGB and have confirmed that I can load data and load sequence. I've also confirmed that no data is loading in chromosomes other than chromosome 1 as expected.

            Ready for next steps!
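A command-line cross-check of the "chromosome 1 only" result is also possible. The sketch below filters the output of `samtools idxstats`, which prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads), and lists the sequences that have mapped reads. `refs_with_reads` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List reference sequences that have at least one mapped read.
# Reads `samtools idxstats` output from stdin: name, length, mapped, unmapped (tab-separated).
# The "*" line (reads with no reference) is skipped.
# refs_with_reads is a hypothetical helper name.
refs_with_reads() {
  awk -F'\t' '$1 != "*" && $3 > 0 { print $1 }'
}
```

Possible usage: `samtools idxstats M1YNF4X1J.chr1.cram | refs_with_reads`, which for this file would be expected to list only chr1.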

            pkulzer Paige Kulzer made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Thank you Paige Kulzer! I think this is all that needs to be done for this ticket. When ready, Nowlan Freese is going to add this to the IGB Quickload main folders for the human genome.

            I think that for the next steps, we should re-open the ticket where we made a video / tutorial for the new CRAM feature and re-do the material to focus on individual genome sequencing.

            I am going to close this ticket now.

            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2
