  1. IGB
  2. IGBF-3748

Add individual genome CRAM file to IGB quickload via svn

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: We have identified a publicly available Nebula Genomics CRAM file (link). Nowlan has downloaded the file and created an index, and was able to view the file in IGB. We would like to make this file available to IGB users as an example of a consumer genomics CRAM file.

      Task: Upload and deploy the CRAM file (NB72462M.cram), the CRAM index, the VCF file, and the VCF index, and make them available through an IGB Quickload. Also deploy IGBF-3841 via annots.xml.
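A minimal sketch of how the annots.xml entries for this task might be generated, assuming the usual Quickload layout of a `<files>` root with one `<file name="..." title="..."/>` element per data file. The VCF file name and both titles are hypothetical placeholders, and the index files are assumed to sit next to the data files rather than being listed themselves.

```python
import xml.etree.ElementTree as ET


def build_annots(entries):
    """Return annots.xml text with one <file> element per (name, title) pair."""
    root = ET.Element("files")
    for name, title in entries:
        ET.SubElement(root, "file", {"name": name, "title": title})
    # Pretty-print when the running Python version supports it (3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


annots_xml = build_annots([
    # CRAM file name comes from the ticket; the title is a hypothetical label.
    ("NB72462M.cram", "PGP huF7A4DE alignments (CRAM)"),
    # VCF file name and title are assumptions made for illustration only.
    ("NB72462M.vcf.gz", "PGP huF7A4DE variants (VCF)"),
])
print(annots_xml)
```

The generated text would then be committed to the Quickload's genome folder via svn alongside the data files.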

      Note: Data are from the Personal Genome Project (PGP)
      Link to metadata in PGP: https://my.pgp-hms.org/profile/huF7A4DE
      Link to PGP search page: https://my.pgp-hms.org/public_genetic_data

      Revision:

      • Scope increase - include Nebula Genomics and Sequence.com data files for Donor 1
      • Scope increase - explore data in region of BRCA1 as preliminary use case, beginning tutorial

        Attachments

        1. annots.xml (2 kB)
        2. BothFiles-Donor1-BRCA1.png (201 kB)
        3. BRCA1-chr17-43092417.png (114 kB)
        4. chrEBV_error.txt (18 kB)
        5. quickload_v3.zip (9 kB)
        6. rs80357336.png (181 kB)

            Activity

            ann.loraine Ann Loraine added a comment -

            Suggestions for testing usability, usefulness and functionality:

            • Use the NCBI Web site (or other sources) to identify a pathogenic cancer-causing polymorphism in BRCA1
            • Check the genotype of each sample. Do Donor 1 and/or Personal Genome Project participant huF7A4DE have the pathogenic allele of this polymorphism?

            By trying to do this task, you will gain insight into what individuals will likely want to do when exploring their own data.

            pkulzer Paige Kulzer added a comment -

            To test this ticket, I decided to use ClinVar to identify a pathogenic cancer-causing single nucleotide variant in BRCA1.

            I first added ClinVar as a custom web link in IGB by clicking Tools > Configure Web Links and then clicking the "Create New" button and entering the following URL Pattern: https://www.ncbi.nlm.nih.gov/clinvar/?term=$$

            Unfortunately, when I right-clicked on an SNV at chr17:43,093,219 and searched ClinVar with the new web link, the search returned no results because the ID being searched (FP270005330L1C046R03305352005) is not recognized by ClinVar. There was also no obvious way to query ClinVar by chromosome position using bp coordinates, so I couldn't look up the SNV that way either.

            Back in IGB, I then right-clicked on the gene model at that location and used the new ClinVar web link to search for "BRCA1", which returned many results, though they were not specific to the position in the genome I clicked.
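The two searches above can be sketched as simple placeholder substitution, assuming IGB replaces `$$` in the URL pattern with the ID of the item that was clicked. The function name is hypothetical, and whether IGB URL-encodes the ID is an assumption; the sketch only illustrates why a gene symbol gives ClinVar something to match while a read ID does not.

```python
from urllib.parse import quote

# URL pattern entered under Tools > Configure Web Links.
CLINVAR_PATTERN = "https://www.ncbi.nlm.nih.gov/clinvar/?term=$$"


def expand_web_link(pattern: str, item_id: str) -> str:
    """Replace the $$ placeholder with the URL-encoded ID of the clicked item."""
    return pattern.replace("$$", quote(item_id, safe=""))


# Clicking a gene model passes a gene symbol, which ClinVar can search.
gene_url = expand_web_link(CLINVAR_PATTERN, "BRCA1")

# Clicking a read in the alignment track passes the read name instead,
# which is an identifier ClinVar has no record of.
read_url = expand_web_link(CLINVAR_PATTERN, "FP270005330L1C046R03305352005")

print(gene_url)
print(read_url)
```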

            I looked through the search results and found a well-reviewed pathogenic SNV on the BRCA1 gene at the following location: chr17:43,045,711. There were no polymorphisms present at that position in any of the three donor tracks. I repeated this process several times with various SNVs as well as pathogenic insertions and deletions but found none present in the donor tracks.

            Overall, I think there will be some friction for new users who want to cross-reference their polymorphisms against external databases directly from IGB, but this Quickload gives us a lot of potential to update our training so that it gets people started using IGB to view their personal genomics data.

            ann.loraine Ann Loraine added a comment -

            Re today's quick discussion during the scrum:

            Thank you Paige Kulzer for the review and comments!

            If you notice anything about how the data are deployed on the web site that is hosting the data, maybe make a new ticket for that and put it into the next sprint. Then, you could close this ticket if you see fit!

            pkulzer Paige Kulzer added a comment -

            Now that I've taken a closer look at how the data are deployed on the web site that is hosting them, I've discovered that Donor 1 SQ867JX4 is being loaded on the wrong genome. The data for this donor are stored in a folder called unknown_reference, but, per the text file in that folder (ULTIMATE-COMPATIBILITY-SQ867JX4-30x-WGS-Sequencing_com-12-25-24.txt), this donor's raw genotype data were aligned to GRCh37, not GRCh38:

            [...] We are using reference human assembly build 37 (also known as Annotation Release 104). [...] More information on reference human assembly builds: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/

            I'm going to make a new ticket to address this issue and will now close this one!
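One way a GRCh37/GRCh38 mix-up like this could be caught before deployment is to compare a contig length from the file header against the known chromosome 17 lengths of the two assemblies. This is only a sketch: the function name is hypothetical, and it assumes the contig lengths have already been read from the file (for example from `@SQ` header lines of an alignment file or `##contig` lines of a VCF), which not every consumer genomics file provides.

```python
# Chromosome 17 lengths in base pairs; the two assemblies differ here,
# so the length alone is enough to tell them apart.
CHR17_LENGTH_TO_ASSEMBLY = {
    81_195_210: "GRCh37",
    83_257_441: "GRCh38",
}


def guess_assembly(contig_lengths):
    """Guess the reference assembly from a {contig name: length} mapping.

    Returns "GRCh37", "GRCh38", or "unknown" when chromosome 17 is
    missing or its length matches neither assembly.
    """
    # Accept both UCSC-style ("chr17") and Ensembl-style ("17") names.
    for name in ("chr17", "17"):
        length = contig_lengths.get(name)
        if length is not None:
            return CHR17_LENGTH_TO_ASSEMBLY.get(length, "unknown")
    return "unknown"


# Hypothetical header values, for illustration only.
print(guess_assembly({"17": 81_195_210}))
print(guess_assembly({"chr17": 83_257_441}))
```

A file that reports "GRCh37" here would belong in the GRCh37 genome folder of the Quickload rather than alongside the GRCh38 data.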

            ann.loraine Ann Loraine added a comment -

            Wow - thanks Paige Kulzer for catching this!!!!


              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                nfreese Nowlan Freese