IGBF-2246 (project: IGB)

Create new App to show results from personal genotyping platforms

    Details

    • Type: Epic
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Epic Name:
      Show personal genotype

      Description

      Many people are getting their genotypes done by companies such as 23andMe, Ancestry.com, and others.

      However, one problem is that people do not have a good understanding of what the data really mean.

      Let's add new capabilities to IGB that will allow users to open their results files in IGB and then use IGB to learn about genetics.

      A typical session would look like this:

      • User gets their "raw" data from the genotyping company (typically some kind of plain text file)
      • User selects the version of the reference human genome associated with the file. Depending on the company, this is usually H_sapiens_Feb_2009 (GRCh37/hg19), not the latest one! The user will have to know which version it is to proceed.
      • User opens their file in IGB. IGB then shows a new empty track.
      • User then searches for a gene of interest, e.g., BRCA1, the so-called "breast cancer gene."
      • User zooms and pans to the gene of interest, either by double-clicking a result in Advanced Search or by using the Quick Search box.
      • User then clicks "Load Data"

      At that point, IGB loads all the genotype results that map to the region in view, and the user sees a set of new items in their genotype results track.
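      A minimal sketch of what "load the results in the region in view" could mean for point-like genotype calls, assuming each call is stored as a zero-based, half-open interval [start, end). The GenotypeCall class and callsInView method are hypothetical illustrations, not IGB API; in the App, IGB's own loading machinery (SymLoader, SeqSymmetry) would do this work.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionInViewSketch {

    // Hypothetical value type for one genotyped position (not an IGB class).
    // Coordinates are zero-based and half-open: [start, end).
    static final class GenotypeCall {
        final String id;
        final String chrom;
        final int start;
        final int end;

        GenotypeCall(String id, String chrom, int start, int end) {
            this.id = id;
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }
    }

    // Keeps only the calls on the viewed chromosome whose interval overlaps [viewStart, viewEnd).
    static List<GenotypeCall> callsInView(List<GenotypeCall> calls, String chrom,
                                          int viewStart, int viewEnd) {
        List<GenotypeCall> inView = new ArrayList<>();
        for (GenotypeCall call : calls) {
            boolean sameChrom = call.chrom.equals(chrom);
            boolean overlaps = call.start < viewEnd && call.end > viewStart;
            if (sameChrom && overlaps) {
                inView.add(call);
            }
        }
        return inView;
    }

    public static void main(String[] args) {
        List<GenotypeCall> calls = new ArrayList<>();
        calls.add(new GenotypeCall("rs1", "17", 100, 101));
        calls.add(new GenotypeCall("rs2", "17", 500, 501));
        calls.add(new GenotypeCall("rs3", "13", 150, 151));
        for (GenotypeCall call : callsInView(calls, "17", 50, 200)) {
            System.out.println(call.id); // prints rs1
        }
    }
}
```

      A linear scan keeps the sketch short; a real loader would more likely index calls by chromosome and sorted position so that each pan or zoom does not rescan the whole file.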

      The results will look like rectangles, perhaps 21 bases long. The center of each rectangle can show a diamond or other shape that indicates something about the result. On either side, two "wings" will make it easy for the user to select or interact with the result. Color can also indicate aspects of the result. Key things we should show are:

      • Is the location homozygous or heterozygous?
      • What alleles (bases) are present at each position?
      • What is the identifier of the location, e.g., the "rs id"?

      What we show will depend a lot on what the genotyping provider puts into the file and on what we can easily retrieve from the internet, for example via REST services. We might want to include some REST queries in the code that loads the file. And of course we would want to link out to external sites such as SNPedia so that people can dig deeper into their results.
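      The per-result facts described above could be derived along these lines. This is a sketch under stated assumptions: the genotype string conventions ("--" for no call, a single letter when only one allele is reported), the 21-base width, and the SNPedia page-name pattern are assumptions to check against the provider documentation and the SNPedia site. Zygosity, classify, glyphSpan, and snpediaUrl are hypothetical names.

```java
import java.util.Locale;
import java.util.Optional;

public class GenotypeDisplaySketch {

    // Hypothetical categories that the center shape or color could encode.
    enum Zygosity { HOMOZYGOUS, HETEROZYGOUS, SINGLE_ALLELE, NO_CALL }

    // Width of the drawn rectangle in bases; the description suggests "maybe 21".
    static final int GLYPH_WIDTH = 21;

    // Classifies a provider genotype string such as "AG", "TT", "A", or "--".
    // Assumptions: "--" (or an empty string) means no call; a single letter means only
    // one allele was reported (e.g., some X, Y, or MT positions); otherwise compare the two letters.
    static Zygosity classify(String genotype) {
        if (genotype == null || genotype.isEmpty() || genotype.equals("--")) {
            return Zygosity.NO_CALL;
        }
        if (genotype.length() == 1) {
            return Zygosity.SINGLE_ALLELE;
        }
        return genotype.charAt(0) == genotype.charAt(1)
                ? Zygosity.HOMOZYGOUS
                : Zygosity.HETEROZYGOUS;
    }

    // Returns {start, end}, zero-based and half-open, for a GLYPH_WIDTH-base rectangle
    // centered on the single base at snpStart. The start is clamped at 0, so rectangles
    // near the beginning of a chromosome come out narrower.
    static int[] glyphSpan(int snpStart) {
        int half = GLYPH_WIDTH / 2; // 10 bases of "wing" on each side of the center base
        int start = Math.max(0, snpStart - half);
        int end = snpStart + half + 1;
        return new int[] {start, end};
    }

    // Builds a SNPedia link-out URL. Assumes SNPedia page names are the rs id with a
    // leading capital "R" (e.g., "Rs53576"); identifiers that are not rs ids get no link.
    static Optional<String> snpediaUrl(String id) {
        String lower = id.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("rs")) {
            return Optional.empty();
        }
        return Optional.of("https://www.snpedia.com/index.php/Rs" + lower.substring(2));
    }

    public static void main(String[] args) {
        System.out.println(classify("AG")); // HETEROZYGOUS
        int[] span = glyphSpan(100);
        System.out.println(span[0] + "-" + span[1]); // 90-111
        System.out.println(snpediaUrl("rs53576").orElse("no link"));
    }
}
```

      The center base of each rectangle is the genotyped position itself; classify could drive the center shape or color, and snpediaUrl the link-out action.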

      Let's start with 23andMe data, since we are already familiar with these data from working on the 23andMe App: https://bitbucket.org/lorainelab/23andme-snp-converter

      See that repository's test/resources directory for a sample file.

      Documentation about the data file format:
      https://customercare.23andme.com/hc/en-us/articles/115004459928-Raw-Genotype-Data-Technical-Details

      The documentation does not say this, but the positions listed in the file are one-based. IGB uses zero-based coordinates, so subtract one from each position to convert it to IGB coordinates.
      See also:
      https://bitbucket.org/lorainelab/23andme-snp-converter/src/master/src/main/java/org/lorainelab/igb/snp/convert/beans/Bed.java
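      A sketch of reading one line of a 23andMe raw data file and applying the subtract-one rule. It assumes the layout described in the documentation linked above: tab-separated id, chromosome, position, and genotype columns, with header lines starting with "#". The Call class and parse method are hypothetical, and the example line is made up. Chromosome names are passed through unchanged and may need mapping to the sequence names IGB uses for H_sapiens_Feb_2009 (not shown).

```java
import java.util.Optional;

public class RawGenotypeLineSketch {

    // Hypothetical value type for one parsed line (not an IGB or 23andMe class).
    static final class Call {
        final String id;
        final String chrom;
        final int start;       // zero-based, inclusive (IGB-style)
        final int end;         // exclusive
        final String genotype;

        Call(String id, String chrom, int start, int end, String genotype) {
            this.id = id;
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.genotype = genotype;
        }
    }

    // Parses one tab-separated line: id, chromosome, one-based position, genotype.
    // Comment lines (starting with '#') and blank lines yield Optional.empty().
    static Optional<Call> parse(String line) {
        if (line.trim().isEmpty() || line.startsWith("#")) {
            return Optional.empty();
        }
        String[] fields = line.split("\t");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Expected 4 tab-separated fields: " + line);
        }
        int oneBasedPosition = Integer.parseInt(fields[2].trim());
        int start = oneBasedPosition - 1; // subtract one to get IGB's zero-based coordinate
        return Optional.of(new Call(fields[0].trim(), fields[1].trim(),
                start, start + 1, fields[3].trim()));
    }

    public static void main(String[] args) {
        // Made-up example line, not real data.
        Call call = parse("rs123\t1\t100\tAG").get();
        System.out.println(call.chrom + ":" + call.start + "-" + call.end); // 1:99-100
    }
}
```

      Representing each call as the half-open interval [position - 1, position) matches the one-base width a SNP occupies in zero-based coordinates.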


            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-1908 [ 17998 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Assignee Srishti Tiwari [ stiwari8 ]
            ann.loraine Ann Loraine added a comment (edited)

            We can use the narrowpeak plugin (in the IGB project code base) as a starting point.
            I added an example narrowPeak file to the narrowpeak test directory.
            Using this file as a demo, step through the code in the debugger to see how IGB adds data from a file.

            ann.loraine Ann Loraine made changes -
            Summary Create new App to show SNPs from 23 and Me data Create new App to show SNPs from personal genotype platform
            ann.loraine Ann Loraine made changes -
            Description Create a new App that lets users open their 23 and Me results and view the results interactively.

            For this, create new classes that read the file, create SeqSymmetry objects, and then add them to the IGB map in a new track containing new, specialized SnpGlyph objects.

            Create a new App that lets users open their personal genotype results files and view the results interactively.

            We will start with 23 and Me data, since we have a lot of it.

            For this, create new classes that read the file, create SeqSymmetry objects, and then add them to the IGB map in a new track containing new, specialized SnpGlyph objects.

            ann.loraine Ann Loraine made changes -
            Summary Create new App to show SNPs from personal genotype platform Create new App to show SNPs from personal genotyping platforms
            ann.loraine Ann Loraine made changes -
            Description Many people are getting their genotypes done by companies such as 23 and Me, ancestry.com, and others.

            However, one problem is that people do not have a good understanding of what the data really mean.

            Let's add some new capability to IGB that will allow them to open their results files in IGB and then explore the results.

            A typical session would look like this:

            * User gets their "raw" data from the genotyping company (typically some kind of plain text file)
            * User clicks on the version of the reference human genome associated with the file. Depending on the company, usually this is H_sapiens_Feb_2009 - not the latest one! The user will have to know which version it is to proceed!
            * User opens their file in IGB. IGB then shows a new empty track.
            * User then searches for a gene of interest, e.g., BRCA1, the so-called "breast cancer gene"
            * User then clicks "Load Data"

            At that point, all the genotype results mapping to the region in view will load. Users will then see a bunch of genotype results. The results will look like rectangles that are maybe 21 bases long. The center of the rectangle can show a diamond or other shape that indicates something about the result. On either side, there will be two "wings" that make it easy for the user to select or interact with the result. Color can be used to indicate aspects of the result, as well. Key things we will want to know are: Is the location homozygous or heterozygous? Is the location the same as the reference or not? Also, what is the identifier of the location, e.g., the "rs id"? What we show will depend a lot on what the genotyping provider puts into the file as well as what we can easily grab from the internet via REST services and that type of thing. For example, we might want to include some REST queries in the code that loads the file. And of course we would want to link out to external sites such as SNPedia so that people can dig deeper into their results.

            Let's start with 23 and Me data, since we are already pretty familiar with these data from working on the 23 and Me App - https://bitbucket.org/lorainelab/23andme-snp-converter


            ann.loraine Ann Loraine made changes -
            Assignee Srishti Tiwari [ stiwari8 ]
            ann.loraine Ann Loraine made changes -
            Epic Link IGBF-1908 [ 17998 ]
            ann.loraine Ann Loraine made changes -
            Issue Type New Feature [ 2 ] Epic [ 10000 ]
            Epic Name Show personal genotype
            Epic Status To Do [ 10001 ]
            ann.loraine Ann Loraine made changes -
            Story Points 5
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 : 3 Feb to 14 Feb [ 86 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2247 [ 18394 ]
            ann.loraine Ann Loraine made changes -
            Summary Create new App to show SNPs from personal genotyping platforms Create new App to show results from personal genotyping platforms
            ann.loraine Ann Loraine made changes -
            Description Many people are getting their genotypes done by companies such as 23 and Me, ancestry.com, and others.

            However, one problem is that people do not have a good understanding of what the data really mean.

            Let's add some new capability to IGB that will allow them to open their results files in IGB and then use IGB to learn about genetics.

            A typical session would look like this:

            * User gets their "raw" data from the genotyping company (typically some kind of plain text file)
            * User clicks on the version of the reference human genome associated with the file. Depending on the company, usually this is H_sapiens_Feb_2009 - not the latest one! The user will have to know which version it is to proceed!
            * User opens their file in IGB. IGB then shows a new empty track.
            * User then searches for a gene of interest, e.g., BRCA1, the so-called "breast cancer gene"
            * User then clicks "Load Data"

            At that point, all the genotype results mapping to the region in view will load. Users will then see a bunch of genotype results. The results will look like rectangles that are maybe 21 bases long. The center of the rectangle can show a diamond or other shape that indicates something about the result. On either side, there will be two "wings" that make it easy for the user to select or interact with the result. Color can be used to indicate aspects of the result, as well. Key things we will want to know are: Is the location homozygous or heterozygous? Is the location the same as the reference or not? Also, what is the identifier of the location, e.g., the "rs id"? What we show will depend a lot on what the genotyping provider puts into the file as well as what we can easily grab from the internet via REST services and that type of thing. For example, we might want to include some REST queries in the code that loads the file. And of course we would want to link out to external sites such as SNPedia so that people can dig deeper into their results.

            Let's start with 23 and Me data, since we are already pretty familiar with these data from working on the 23 and Me App - https://bitbucket.org/lorainelab/23andme-snp-converter


            ann.loraine Ann Loraine made changes -
            Description Many people are getting their genotypes done by companies such as 23 and Me, ancestry.com, and others.

            However, one problem is that people do not have a good understanding of what the data really mean.

            Let's add some new capability to IGB that will allow them to open their results files in IGB and then use IGB to learn about genetics.

            A typical session would look like this:

            * User gets their "raw" data from the genotyping company (typically some kind of plain text file)
            * User clicks on the version of the reference human genome associated with the file. Depending on the company, usually this is H_sapiens_Feb_2009 - not the latest one! The user will have to know which version it is to proceed!
            * User opens their file in IGB. IGB then shows a new empty track.
            * User then searches for a gene of interest, e.g., BRCA1, the so-called "breast cancer gene."
            * They zoom and pan to the gene of interest - by double-clicking results in Advanced Search or using the Quick search box
            * User then clicks "Load Data"

            At that point, all the genotype results mapping to the region in view will load. Users will then see a bunch of new items in their genotype results track.

            The results will look like rectangles that are maybe 21 bases long. The center of the rectangle can show a diamond or other shape that indicates something about the result. On either side, there will be two "wings" that make it easy for the user to select or interact with the result. Color can be used to indicate aspects of the result, as well. Key things we should show them are: Is the location homozygous or heterozygous? What particular allele does the person have in each position? Is the allele the same as the reference? Also, what is the identifier of the location, e.g., the "rs id"? What we show will depend a lot on what the genotyping provider puts into the file as well as what we can easily grab from the internet via REST services and that type of thing. For example, we might want to include some REST queries in the code that loads the file. And of course we would want to link out to external sites such as SNPedia so that people can dig deeper into their results.

            Let's start with 23 and Me data, since we are already pretty familiar with these data from working on the 23 and Me App - https://bitbucket.org/lorainelab/23andme-snp-converter


            ann.loraine Ann Loraine made changes -
            Description Many people are getting their genotypes done by companies such as 23 and Me, ancestry.com, and others.

            However, one problem is that people do not have a good understanding of what the data really mean.

            Let's add some new capability to IGB that will allow them to open their results files in IGB and then use IGB to learn about genetics.

            A typical session would look like this:

            * User gets their "raw" data from the genotyping company (typically some kind of plain text file)
            * User clicks on the version of the reference human genome associated with the file. Depending on the company, usually this is H_sapiens_Feb_2009 - not the latest one! The user will have to know which version it is to proceed!
            * User opens their file in IGB. IGB then shows a new empty track.
            * User then searches for a gene of interest, e.g., BRCA1, the so-called "breast cancer gene."
            * They zoom and pan to the gene of interest - by double-clicking results in Advanced Search or using the Quick search box
            * User then clicks "Load Data"

            At that point, all the genotype results mapping to the region in view will load. Users will then see a bunch of new items in their genotype results track.

            The results will look like rectangles that are maybe 21 bases long. The center of the rectangle can show a diamond or other shape that indicates something about the result. On either side, there will be two "wings" that make it easy for the user to select or interact with the result. Color can be used to indicate aspects of the result, as well. Key things we should show them are: Is the location homozygous or heterozygous? What alleles (bases) are present at each position? Also, what is the identifier of the location, e.g., the "rs id"? What we show will depend a lot on what the genotyping provider puts into the file as well as what we can easily grab from the internet via REST services and that type of thing. For example, we might want to include some REST queries in the code that loads the file. And of course we would want to link out to external sites such as SNPedia so that people can dig deeper into their results.

            Let's start with 23 and Me data, since we are already pretty familiar with these data from working on the 23 and Me App - https://bitbucket.org/lorainelab/23andme-snp-converter

            See that repository's test/resources directory for a sample file.

            ann.loraine Ann Loraine made changes -
            Comment [ Suggested structure for the project:

            Module name: File Handler Plugin - Genotyping Results
            Description: A plugin for loading and visualizing results from genotyping assays, such as 23andMe SNP chip test results

            Code location: integrated-genome-browser/plugins/GenotypingResults

            java source code files and packages:

            * all are in package: org.lorainelab.genotype
            * GenotypeFileHandler.java - implements FileTypeHandler
            * Genotype.java - a SymLoader that captures logic for creating VariationSym objects
            * SingleBaseVariationSym.java - data structure for representing a single-base location on the genome where individuals vary (SNP)
            * DeletionVariationSym.java - data structure for representing a segment of sequence that can be deleted or included in different individuals
            * InsertionVariationSym.java - data structure for representing a segment of sequence that can be inserted or not inserted in different individuals
            * SingleBaseVariationGlyph.java - a subclass of Glyph that draws a SingleBaseVariationSym
            * DeletionVariationGlyph.java - a subclass of Glyph that draws a DeletionVariationSym
            * InsertionVariationGlyph.java - a subclass of Glyph that draws an InsertionVariationSym
            * GenotypingResultsFactory.java - implements factory methods for adding above Glyphs to a SeqMapView (note: we may want to make different factories for different Sym and Glyph types)

            One open question: the package com.affymetrix.igb.view.factories contains classes with methods that accept SeqSymmetry objects and use them to create Glyph (graphics) objects and add them to IGB's SeqMapView. I have not yet worked out how IGB decides which of these to use when users open files, add new tracks, and then click Load Data. However, all of these classes are annotated as Components, which suggests that if we configure our project correctly, the framework will select our GenotypingResultsFactory to populate our track with Glyphs.
            ]
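            The three proposed data structures could share one base type, as in this stand-alone sketch. Assumption: these are plain Java classes for illustration only; in the plugin they would presumably implement IGB's SeqSymmetry, which is not modeled here. Coordinates are zero-based and half-open, and an insertion is modeled as a zero-length interval at the insertion point so that it does not cover any reference bases.

```java
public class VariationSymSketch {

    // Common base: an identifier plus a zero-based, half-open interval on one chromosome.
    // Hypothetical stand-in; the proposed classes would presumably implement SeqSymmetry instead.
    abstract static class VariationSym {
        final String id;
        final String chrom;
        final int start;
        final int end;

        VariationSym(String id, String chrom, int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("Invalid interval [" + start + ", " + end + ")");
            }
            this.id = id;
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }
    }

    // A single base where individuals vary (SNP).
    static final class SingleBaseVariationSym extends VariationSym {
        final String genotype; // e.g., "AG", as reported by the provider

        SingleBaseVariationSym(String id, String chrom, int start, String genotype) {
            super(id, chrom, start, start + 1);
            this.genotype = genotype;
        }
    }

    // A reference segment [start, end) that some individuals lack.
    static final class DeletionVariationSym extends VariationSym {
        DeletionVariationSym(String id, String chrom, int start, int end) {
            super(id, chrom, start, end);
        }
    }

    // Extra sequence that some individuals carry between two reference bases,
    // modeled as a zero-length interval at the insertion point.
    static final class InsertionVariationSym extends VariationSym {
        final String insertedSequence;

        InsertionVariationSym(String id, String chrom, int position, String insertedSequence) {
            super(id, chrom, position, position);
            this.insertedSequence = insertedSequence;
        }
    }

    public static void main(String[] args) {
        VariationSym[] syms = {
            new SingleBaseVariationSym("rs1", "1", 99, "AG"),
            new DeletionVariationSym("del1", "1", 200, 205),
            new InsertionVariationSym("ins1", "1", 300, "TTA")
        };
        for (VariationSym sym : syms) {
            System.out.println(sym.getClass().getSimpleName() + " " + sym.id + " length=" + sym.length());
        }
    }
}
```

            A glyph factory could then pick SingleBaseVariationGlyph, DeletionVariationGlyph, or InsertionVariationGlyph based on the subclass of each sym.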
            ann.loraine Ann Loraine made changes -
            Comment [ I recommend developing this as a new Plugin within the IGB code base so that we can use the debugger. ]
            ann.loraine Ann Loraine made changes -
            Comment [ Next task:

            Understand how IGB selects which com.affymetrix.igb.view.factories class to use when loading data from a DataSet (file).

            Based on the code, it looks like this method:

            * private void addAnnotationGlyphs(SeqMapView smv, RootSeqSymmetry annotSym, BioSeq seq)

            in

            * com.affymetrix.igb.view.TrackView

            is where IGB identifies the MapTierGlyphFactoryI for a given data set and uses it to add Glyphs to IGB's SeqMapView.

            We need to confirm this by setting breakpoints in this method and watching what happens when we click "Load Data". ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2248 [ 18395 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2249 [ 18396 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2250 [ 18397 ]
            ann.loraine Ann Loraine made changes -
            Description Many people are getting their genotypes done by companies such as 23 and Me, ancestry.com, and others.

            However, one problem is that people do not have a good understanding of what the data really mean.

            Let's add some new capability to IGB that will allow them to open their results files in IGB and then use IGB to learn about genetics.

            A typical session would look like this:

            * User gets their "raw" data from the genotyping company (typically some kind of plain text file)
            * User clicks on the version of the reference human genome associated with the file. Depending on the company, usually this is H_sapiens_Feb_2009 - not the latest one! The user will have to know which version it is to proceed!
            * User opens their file in IGB. IGB then shows a new empty track.
            * User then searches for a gene of interest, e.g., BRCA1, the so-called "breast cancer gene."
            * They zoom and pan to the gene of interest - by double-clicking results in Advanced Search or using the Quick search box
            * User then clicks "Load Data"

            At that point, all the genotype results mapping to the region in view will load. Users will then see a bunch of new items in their genotype results track.

            The results will look like rectangles that are maybe 21 bases long. The center of the rectangle can show a diamond or other shape that indicates something about the result. On either side, there will be two "wings" that make it easy for the user to select or interact with the result. Color can be used to indicate aspects of the result, as well. Key things we should show them are: Is the location homozygous or heterozygous? What alleles (bases) are present at each position? Also, what is the identifier of the location, e.g., the "rs id"? What we show will depend a lot on what the genotyping provider puts into the file as well as what we can easily grab from the internet via REST services and that type of thing. For example, we might want to include some REST queries in the code that loads the file. And of course we would want to link out to external sites such as SNPedia so that people can dig deeper into their results.

            Let's start with 23 and Me data, since we are already pretty familiar with these data from working on the 23 and Me App - https://bitbucket.org/lorainelab/23andme-snp-converter

            See that repository's test/resources directory for a sample file.

            Documentation about the data file format:
            https://customercare.23andme.com/hc/en-us/articles/115004459928-Raw-Genotype-Data-Technical-Details
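Reading the file itself is straightforward. A minimal sketch, assuming the layout described in the 23andMe documentation linked above: header and comment lines start with "#", and each data line has four tab-separated columns (rsid, chromosome, position, genotype). The classes `GenotypeCall` and `RawGenotypeParser` are hypothetical and are not taken from the 23andme-snp-converter repository:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** One genotype call from a raw data file (hypothetical class). */
final class GenotypeCall {
    final String rsid;
    final String chromosome;
    final int position;    // as written in the file (one-based)
    final String genotype; // e.g. "AG", "TT", "A", or "--"

    GenotypeCall(String rsid, String chromosome, int position, String genotype) {
        this.rsid = rsid;
        this.chromosome = chromosome;
        this.position = position;
        this.genotype = genotype;
    }
}

final class RawGenotypeParser {
    /** Parses rsid/chromosome/position/genotype lines, skipping blank and '#' lines. */
    static List<GenotypeCall> parse(BufferedReader reader) throws IOException {
        List<GenotypeCall> calls = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // header or comment line
            }
            String[] fields = line.split("\t");
            if (fields.length < 4) {
                throw new IOException("Expected 4 tab-separated columns: " + line);
            }
            calls.add(new GenotypeCall(fields[0].trim(), fields[1].trim(),
                    Integer.parseInt(fields[2].trim()), fields[3].trim()));
        }
        return calls;
    }

    /** Convenience overload for parsing file contents already held in memory. */
    static List<GenotypeCall> parseText(String text) throws IOException {
        return parse(new BufferedReader(new StringReader(text)));
    }
}
```

The sample file in the repository's test/resources directory would make a natural first test input for a parser like this.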
            ann.loraine Ann Loraine made changes -
            Description

            The documentation does not say this, but the coordinates listed in the file are one-based. IGB uses zero-based coordinates, so to convert, subtract one from each position.
            See also:
            https://bitbucket.org/lorainelab/23andme-snp-converter/src/master/src/main/java/org/lorainelab/igb/snp/convert/beans/Bed.java
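The one-based-to-IGB conversion, together with the roughly 21-base rectangle proposed in the description, can be sketched as follows. This assumes zero-based, half-open intervals and a glyph centered on the SNP base; the class `SnpGlyphCoordinates` and its methods are hypothetical and are not taken from the linked Bed.java:

```java
/** Hypothetical helpers for placing a one-based SNP position in zero-based coordinates. */
final class SnpGlyphCoordinates {
    /** Glyph width suggested in the description ("maybe 21 bases long"). */
    static final int GLYPH_WIDTH = 21;

    /** Zero-based, half-open [start, end) interval covering the SNP base itself. */
    static int[] snpInterval(int oneBasedPosition) {
        int start = oneBasedPosition - 1; // subtract one to convert from one-based
        return new int[] {start, start + 1};
    }

    /** GLYPH_WIDTH-long interval centered on the SNP base, clipped at the chromosome start. */
    static int[] glyphInterval(int oneBasedPosition) {
        int center = oneBasedPosition - 1;
        int flank = GLYPH_WIDTH / 2; // 10 bases of "wing" on each side
        int start = Math.max(0, center - flank);
        return new int[] {start, center + flank + 1};
    }
}
```

For example, a SNP listed at position 100 maps to [99, 100), and its glyph spans [89, 110).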
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2251 [ 18398 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2252 [ 18399 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2253 [ 18400 ]
            shamika Shamika Gajanan Kulkarni (Inactive) made changes -
            Epic Child IGBF-2268 [ 18415 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-1790 [ IGBF-1790 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-1791 [ IGBF-1791 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-1793 [ IGBF-1793 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-2312 [ 18466 ]
            ann.loraine Ann Loraine made changes -
            Epic Status To Do [ 10001 ] Done [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: