[IGBF-1380] Support newer VCF - User request - JIRA UNCC

Ann Loraine created issue - 07/Aug/18 9:23 AM

Ann Loraine made changes - 07/Aug/18 9:23 AM

Field	Original Value	New Value
Link		This issue relates to ~~HELP-303~~ [ ~~HELP-303~~ ]

Ann Loraine made changes - 07/Aug/18 9:24 AM

Rank

Ranked higher

Ann Loraine made changes - 07/Aug/18 9:24 AM

Link

This issue relates to ~~HELP-303~~ [ ~~HELP-303~~ ]

Ann Loraine made changes - 07/Aug/18 9:25 AM

Link

This issue relates to ~~HELP-303~~ [ ~~HELP-303~~ ]

Ann Loraine made changes - 14/Aug/18 3:02 PM

Rank

Ranked higher

Ann Loraine made changes - 14/Aug/18 3:05 PM

Rank

Ranked higher

Ann Loraine made changes - 14/Aug/18 3:21 PM

Assignee

Ivory Blakley [ ieclabau ]

Ann Loraine made changes - 14/Aug/18 3:21 PM

Rank

Ranked higher

Ann Loraine made changes - 14/Aug/18 3:39 PM

Rank

Ranked higher

Ann Loraine made changes - 14/Aug/18 3:39 PM

Summary

Support newer VCF

Support newer VCF - User request

Ann Loraine made changes - 14/Aug/18 3:52 PM

Labels

intermediate

Advanced

Ann Loraine made changes - 14/Aug/18 5:15 PM

Assignee

Ivory Blakley [ ieclabau ]

Ann Loraine made changes - 14/Aug/18 5:20 PM

Sprint

Fall 2018 1 [ 51 ]

Ann Loraine made changes - 14/Aug/18 5:20 PM

Rank

Ranked lower

Ivory Blakley (Inactive) made changes - 20/Aug/18 3:43 PM

Status

Open [ 1 ]

In Progress [ 3 ]

Ivory Blakley (Inactive) added a comment - 20/Aug/18 4:50 PM

I think this is the file I will need to change, possibly the only file.
core/genometry/src/main/java/com/affymetrix/genometry/symloader/VCF.java

This post is helpful:
https://bioinformatics.stackexchange.com/questions/344/whats-the-difference-between-vcf-spec-versions-4-1-and-4-2

For reference, here are the Samtools PDF specifications for VCF 4.1 (currently supported in IGB), 4.2 (not yet supported), and 4.3 (not yet supported):
https://samtools.github.io/hts-specs/VCFv4.1.pdf
https://samtools.github.io/hts-specs/VCFv4.2.pdf
https://samtools.github.io/hts-specs/VCFv4.3.pdf

Ivory Blakley (Inactive) added a comment - 20/Aug/18 5:29 PM

To highlight differences between versions, download this repo:
https://github.com/samtools/hts-specs

That includes these files:
VCFv4.1.tex
VCFv4.2.tex
VCFv4.3.tex

Then do
$ diff VCFv4.1.tex VCFv4.2.tex > Diffs4.1To4.2

In Sublime (and probably other programs) the differences are color-coded, with pink for the 4.1 file and green for the 4.2 file, and the change location (here 27c27, meaning line 27 was changed) in gray:
27c27
< ##fileformat=VCFv4.1
---
> ##fileformat=VCFv4.2
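
If a colorizing editor is not at hand, the same comparison can be scripted. The following is a minimal sketch using Python's standard-library difflib module. The function name spec_diff is made up for this illustration, and it produces unified-diff output (lines prefixed with - and +) rather than the < and > style shown above.

```python
import difflib


def spec_diff(old_text, new_text,
              old_name="VCFv4.1.tex", new_name="VCFv4.2.tex"):
    """Return unified-diff lines between two versions of the spec source."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # inputs were split without newlines, so add none back
    ))
```

To compare the real files, read each .tex file into a string and pass the two strings to spec_diff.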

Ivory Blakley (Inactive) added a comment - 20/Aug/18 5:50 PM

A quick review of the VCF file format:

The format includes several straightforward columns like CHROM, POS, and ID, where the one-word heading describes the single type of value that appears in that column.

The format also includes 3 rather complicated columns that each essentially have a file format unto themselves.
__________________
INFO - in the file header, several INFO fields are described. You could think of each entry in the INFO column as its own little object, and the INFO headers as the class description for that object.
For example:
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
Can be read as
The object shall have a member called NS, which will have a length of 1, be of type Integer, and represent the Number of Samples With Data

So the header:
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">

Tells you how to interpret the entry:
NS=3;DP=14;AF=0.5;DB;H2
or the entry:
NS=2;DP=10;AF=0.333,0.667;AA=T;DB

Notice that AF has two values in the second case. Number is not always a number. Sometimes a member in the INFO field has one value for each alternate allele, so instead of a number the value of "Number" is "A". There are some other special cases denoted by letters or symbols, such as "G" (one value per possible genotype) and "." (unknown or variable number of values).
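
To make the object/class analogy concrete, here is a minimal Python sketch that reads ##INFO header lines and uses them to interpret an INFO entry. It is an illustration only, not IGB's VCF.java logic; the names parse_info_header and parse_info_entry are made up, and the handling of "." as a missing value is a simplifying assumption.

```python
import re

# Hypothetical helper names; this is an illustration, not IGB's parser.
INFO_HEADER = re.compile(
    r'##INFO=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),'
    r'Type=(?P<type>[^,]+),Description="(?P<desc>[^"]*)"'
)

# How to convert the text of a value for each declared Type.
CASTS = {"Integer": int, "Float": float, "String": str, "Character": str}


def parse_info_header(lines):
    """Build {ID: (Number, Type)} from ##INFO header lines."""
    fields = {}
    for line in lines:
        m = INFO_HEADER.match(line)
        if m:
            fields[m.group("id")] = (m.group("number"), m.group("type"))
    return fields


def parse_info_entry(entry, fields):
    """Turn 'NS=3;DP=14;DB' into a dict, typed using the header definitions."""
    result = {}
    for item in entry.split(";"):
        key, sep, raw = item.partition("=")
        # Keys missing from the header are treated as free-form strings.
        number, vtype = fields.get(key, (".", "String"))
        if vtype == "Flag" or not sep:
            result[key] = True  # flags carry no value; presence means true
            continue
        cast = CASTS.get(vtype, str)
        values = [None if v == "." else cast(v) for v in raw.split(",")]
        # Number=1 yields a scalar; anything else (A, G, ., 2, ...) stays a list.
        result[key] = values[0] if number == "1" else values
    return result
```

With the six header lines above, the entry NS=2;DP=10;AF=0.333,0.667;AA=T;DB comes back with NS and DP as integers, AF as a two-element list of floats, AA as a string, and DB as a true flag.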

__________________
FILTER - in the header, several filters are listed, each with an ID and a description. The ID is the string that appears in the FILTER field for a row (a SNP) that fails that filter. The description is a human-readable explanation of what failing that filter means. The FILTER field value for any SNP that passes all filters is "PASS".
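
A small Python sketch of that lookup follows. It is an illustration under stated assumptions, not IGB code: the function names are made up, the example filter IDs in the usage note come from the specification's sample file, and multiple failed filters are assumed to be separated by ";" with "." meaning that filters were not applied.

```python
import re

# Hypothetical helper names for illustration only.
FILTER_HEADER = re.compile(
    r'##FILTER=<ID=(?P<id>[^,]+),Description="(?P<desc>[^"]*)"'
)


def parse_filter_header(lines):
    """Build {filter ID: description} from ##FILTER header lines."""
    descriptions = {}
    for line in lines:
        m = FILTER_HEADER.match(line)
        if m:
            descriptions[m.group("id")] = m.group("desc")
    return descriptions


def describe_filters(filter_field, descriptions):
    """Explain a FILTER column value.

    Returns [] when the row passed every filter, None when filters were
    not applied ('.'), otherwise a list of (ID, description) pairs for
    each failed filter.
    """
    if filter_field == "PASS":
        return []
    if filter_field == ".":
        return None
    return [(fid, descriptions.get(fid, "unknown filter"))
            for fid in filter_field.split(";")]
```

For example, with a header that defines q10 and s50, a FILTER value of q10;s50 yields both IDs paired with their descriptions, while PASS yields an empty list.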

__________________
FORMAT - is like its own little file format describing how to parse the fields for each sample. Each sample is a column. For a given sample (column) and a given row (SNP) there are several values separated by ":". The FORMAT headers describe the ID, number of values, value type, and description for each possible member (just like INFO).
Often the FORMAT field is the same for many or all rows, so it is tempting to think there should just be one "format" used for the whole file. But the various attributes are not always relevant, or not always present for all entries. I don't remember if there is even any guarantee that the values that are present will appear in the same order. So each individual row gets its own mini format summary in the FORMAT field, the FORMAT headers elaborate on what those attributes mean, and the sample columns are where the actual values are.
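
The per-row pairing of FORMAT keys with sample values can be sketched in a few lines of Python. This is an illustration, not IGB's parser: parse_samples is a made-up name, values are left as strings rather than typed from the ##FORMAT headers, and padding short sample columns with None assumes that trailing fields may be dropped.

```python
def parse_samples(format_field, sample_columns):
    """Pair each sample's ':'-separated values with the keys named in FORMAT.

    Returns one dict per sample column, in column order.
    """
    keys = format_field.split(":")
    parsed = []
    for column in sample_columns:
        values = column.split(":")
        # Assumption: a sample may omit trailing fields, so pad with None.
        values += [None] * (len(keys) - len(values))
        parsed.append(dict(zip(keys, values)))
    return parsed
```

Because the keys are read from each row's own FORMAT field, a row that lists GT:DP and another that lists GT:GQ:DP:HQ are both handled without any file-wide assumption about order.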

Ivory Blakley (Inactive) added a comment - 21/Aug/18 1:03 PM - edited

from
https://bioinformatics.stackexchange.com/questions/344/whats-the-difference-between-vcf-spec-versions-4-1-and-4-2

Changes in v4.2 compared to v4.1:

Information field format: adding source and version as recommended fields.
INFO field can have one value for each possible allele (code R).
For all of the ##INFO, ##FORMAT, ##FILTER, and ##ALT metainformation, extra fields can be included after the default fields.
Alternate base (ALT) can include *: missing due to an upstream deletion.
Quality scores, a sentence removed: High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired.
Examples changed a bit.

Ann Loraine added a comment - 21/Aug/18 4:24 PM - edited

The htsjdk library contains vcf code:
https://github.com/samtools/htsjdk/tree/master/src/main/java/htsjdk/variant/vcf
Investigate this code and how we can use it. We recently upgraded to using this library, but didn't investigate its vcf-related packages as yet. Probably it can help.

Ann Loraine made changes - 23/Aug/18 4:22 PM

Link

This issue relates to IGBF-1387 [ IGBF-1387 ]

Ivory Blakley (Inactive) made changes - 28/Aug/18 1:01 PM

Attachment		Diffs4.1To4.2 [ 14147 ]
Attachment		Diffs4.2To4.3 [ 14148 ]

Ivory Blakley (Inactive) added a comment - 28/Aug/18 2:08 PM - edited

from: https://samtools.github.io/hts-specs/VCFv4.3.pdf

List of changes

7.1 Changes to VCFv4.3
• More strict language: "should" replaced with "must" where appropriate
• Tables with Type and Number definitions for INFO and FORMAT reserved keys

7.2 Changes between VCFv4.2 and VCFv4.3
• VCF compliant implementations must support both LF and CR+LF newline conventions
• INFO and FORMAT tag names must match the regular expression ^[A-Za-z_][0-9A-Za-z_.]*$
• Spaces are allowed in INFO field values
• Characters with special meaning (such as ';' in INFO, ':' in FORMAT, and '%' in both) can be encoded using the percent encoding (see Section 1.2)
• The character encoding of VCF files is UTF-8.
• The SAMPLE field can contain optional DOI URL for the source data file
• Introduced ##META header lines for defining phenotype metadata
• New reserved tag "CNP" analogous to "GP" was added. Both CNP and GP use 0 to 1 encoding, which is a change from previous phred-scaled GP.
• In order for VCF and BCF to have the same expressive power, we state explicitly that Integers and Floats are 32-bit numbers. Integers are signed.
• We state explicitly that zero length strings are not allowed, this includes the CHROM and ID column, INFO IDs, FILTER IDs and FORMAT IDs. Meta-information lines can be in any order, with the exception of ##fileformat which must come first.
• All header lines of the form ##key=<ID=xxx,...> must have an ID value that is unique for a given value of "key". All header lines whose value starts with "<" must have an ID field. Therefore, also ##PEDIGREE newly requires a unique ID.
• We state explicitly that duplicate IDs, FILTER, INFO or FORMAT keys are not valid.
• A section about gVCF was added, introduced the <*> symbolic allele.
• A section about tag naming conventions was added.
• New reserved AD, ADF, and ADR INFO and FORMAT fields added.
• Removed unused and ill-defined GLE FORMAT tag.
• Chromosome names cannot use reserved symbolic alleles and contain characters used by breakpoints (Section 1.4.7).
• IUPAC ambiguity codes should be converted to a concrete base.
• Symbolic ALTs for IUPAC codes.

7.3 Changes between BCFv2.1 and BCFv2.2
• BCF header lines can include optional IDX field
• We introduce end-of-vector byte and reserve 8 values for future use
• Clarified that except the end-of-vector byte, no other negative values are allowed in the GT array
• String vectors in BCF do not need to start with comma, as the number of values is indicated already in the definition of the tag in the header.
• The implicit filter PASS was described inconsistently throughout BCFv2.1: It is encoded as the first entry in the dictionary, not the last.
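
Two of the 7.2 items, the tag-name pattern and percent encoding, are easy to try out. The Python sketch below is an illustration only: the function names are made up, the pattern is written as I understand it from the VCFv4.3 specification (with the underscore allowed in both character classes), and using urllib's unquote for decoding is an assumption that holds when values are UTF-8 and use ordinary %XX escapes.

```python
import re
from urllib.parse import unquote

# Tag-name rule for INFO and FORMAT keys in VCFv4.3.
TAG_NAME = re.compile(r"^[A-Za-z_][0-9A-Za-z_.]*$")


def is_valid_tag(name):
    """True when an INFO/FORMAT key matches the VCFv4.3 naming rule."""
    return TAG_NAME.match(name) is not None


def decode_value(raw):
    """Decode percent-encoded characters, e.g. %3B -> ';' and %3A -> ':'."""
    return unquote(raw)
```

A reader that only displays data would probably apply decode_value and skip is_valid_tag; the check is mostly of interest to tools that write VCF.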

Ivory Blakley (Inactive) added a comment - 28/Aug/18 2:20 PM

Looking at the list of changes between versions 4.2 and 4.3:
Most of these changes are relevant when writing VCF files.

The information that IGB uses to display an entry in the context of the genome is unchanged. The file format is more explicit and a bit stricter, but there is no need to make our parser reflect these rules. Our parser should allow IGB to show the data that appears in the file; it's not our parser's job to enforce specific format rules.

Aside from accepting ##fileformat=VCFv4.3 in the header, I don't see any further changes that should be needed to read VCFv4.3 compared to VCFv4.2.
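
The version check itself can be pictured with a short Python sketch. This is not the code in VCF.java; the names below are made up, and the set of accepted versions is an assumption based on the versions discussed in this issue.

```python
import re

# Assumption: the versions discussed in this issue; adjust as needed.
SUPPORTED_VERSIONS = {"4.1", "4.2", "4.3"}

# \s* tolerates a trailing LF or CR+LF on the header line.
FILEFORMAT = re.compile(r"^##fileformat=VCFv(\d+\.\d+)\s*$")


def vcf_version(first_line):
    """Return the version string from a ##fileformat line, or None."""
    m = FILEFORMAT.match(first_line)
    return m.group(1) if m else None


def is_supported(first_line):
    """True when the file declares a version the reader accepts."""
    return vcf_version(first_line) in SUPPORTED_VERSIONS
```

Keeping the accepted versions in one set means that admitting a later version is a one-line change, provided the columns the viewer relies on stay the same.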

Ivory Blakley (Inactive) added a comment - 28/Aug/18 2:30 PM

The changes for 4.2 (compared to 4.1)

INFO field can have one value for each possible allele (code R).
This adds one more special value for the Number attribute of an INFO field. IGB does not currently read or use this attribute. No change needed.

For all of the ##INFO, ##FORMAT, ##FILTER, and ##ALT meta information, extra fields can be included after the default fields.
As part of a general improvement to the parser, we should include a map for each entry that would be able to hold these open-form values. These values would be part of the selection info for an entry, but they will not affect how the entry is displayed. This can be handled as part of issue IGBF-543.
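
One way to picture such a map is the Python sketch below, which splits a structured header line into key/value pairs and separates out anything beyond the default fields. It is an illustration only, not a design for IGB: the function names, the Source and Version values in the usage note, and the choice of ID, Number, Type, and Description as the "default" keys are assumptions.

```python
import re

# A value is either a double-quoted string (which may contain commas)
# or an unquoted run of characters up to the next comma.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')

# Assumption: the keys treated as the default fields of a header line.
DEFAULT_KEYS = {"ID", "Number", "Type", "Description"}


def parse_meta_line(line):
    """Split '##KEY=<k1=v1,k2="v 2",...>' into (KEY, {k1: v1, k2: 'v 2'}).

    Returns None for lines that are not structured, such as ##fileformat.
    """
    m = re.match(r"^##(\w+)=<(.*)>\s*$", line)
    if not m:
        return None
    key, body = m.groups()
    fields = {}
    for name, value in PAIR.findall(body):
        if value.startswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[name] = value
    return key, fields


def extra_fields(fields):
    """Keep only the open-form fields that follow the default ones."""
    return {k: v for k, v in fields.items() if k not in DEFAULT_KEYS}
```

For a line such as ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129",Source="dbSNP",Version="129">, extra_fields returns only the Source and Version entries, which is the part that could be surfaced in the selection info.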

Alternate base (ALT) can include *: missing due to an upstream deletion.
I am still figuring out if/how the code might need to change to accommodate this. None of my example v4.2 files use this option.

Ivory Blakley (Inactive) added a comment - 28/Aug/18 3:32 PM - edited

As far as allowing '*' as a symbol in ALT:
None of my test files use that option. I manually edited a file to have <*> as the alt allele, and it looks like the string "<*>" was passed along in place of a letter, or in place of "<X>", without any issue.

So, as far as I can tell, we don't need to make any changes to accommodate this option.

This notation is explained in section 5.5 of the format specification, but I am still unclear about how this type should be represented. I don't know if the END tag is mandatory when the <*> is used. I think using the END tag when it is present should be part of our general improvement to the parser.
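
For a future parser improvement, the different ALT forms and the END tag could be handled along the lines of the Python sketch below. It is a rough illustration, not IGB code: the function names and category labels are made up, it takes a single allele (the ALT column may hold several, separated by commas), and it treats the bare * (the 4.2 upstream-deletion allele) separately from bracketed symbolic alleles such as <X> or the 4.3 gVCF allele <*>.

```python
def classify_alt(allele):
    """Roughly categorize one ALT allele string."""
    if allele == ".":
        return "none"               # no alternate allele called
    if allele == "*":
        return "upstream_deletion"  # VCF 4.2: missing due to an upstream deletion
    if allele.startswith("<") and allele.endswith(">"):
        return "symbolic"           # e.g. <DEL>, <X>, or the gVCF allele <*>
    if "[" in allele or "]" in allele:
        return "breakend"           # breakend notation uses square brackets
    return "sequence"               # plain bases such as A or ACGT


def record_end(pos, ref, info):
    """Use the INFO END value when present; otherwise span the REF allele."""
    if "END" in info:
        return int(info["END"])
    return pos + len(ref) - 1
```

Falling back to the REF length when END is absent keeps ordinary SNP and indel rows unchanged, so only rows that actually carry END would be drawn differently.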

Ivory Blakley (Inactive) added a comment - 28/Aug/18 4:13 PM

The goal of this issue is to make IGB accept the VCF 4.2 (and 4.3) format. That much has been accomplished, see this branch:
https://bitbucket.org/IvoryBlak/integrated-genome-browser/branch/IGBF-1380_Support_VCF4.2

v4.2 introduces new feature options that we should incorporate into the info for each feature. That improvement, and other improvements/corrections to the existing parser will be handled later as part of issue IGBF-543.

I am moving this to Needs first level review.
Issue IGBF-930 has a good test file.
You can look for more in jira using
attachments is not EMPTY AND text ~ VCF
under Issues > Search for issues
I also have a couple of files that I acquired from researchers and cannot include here (they are big, and not public), but I can share them directly with the tester.

The more test files the better. This format allows for a lot of options, and the files I have do not represent all options.

Ivory Blakley (Inactive) made changes - 28/Aug/18 4:14 PM

Status

In Progress [ 3 ]

Needs 1st Level Review [ 10005 ]

Ivory Blakley (Inactive) made changes - 28/Aug/18 4:14 PM

Assignee

Ivory Blakley [ ieclabau ]

Ivory Blakley (Inactive) made changes - 29/Aug/18 10:13 AM

Assignee

Sneha Ramesh Watharkar [ jdaly ]

Ann Loraine made changes - 30/Aug/18 1:17 PM

Rank

Ranked higher

Ann Loraine made changes - 30/Aug/18 1:20 PM

Sprint

Fall 2018 1 [ 51 ]

Fall 2018 1, Fall 2018 Sprint 2 [ 51, 52 ]

Ann Loraine made changes - 30/Aug/18 1:20 PM

Rank

Ranked higher

Ivory Blakley (Inactive) added a comment - 30/Aug/18 4:06 PM

To test, drag/drop VCF files into IGB.

On master, VCF v4.2 files will not open; you'll get an error message.

On my branch, they should open and display data.

Sneha Ramesh Watharkar (Inactive) added a comment - 30/Aug/18 5:10 PM

Ivory Blakley : Tested on Windows with the basic VCF file from issue IGBF-930. It works as expected in your branch, but I have a question.
Once we drag and drop the VCF file onto IGB, it shows the message "Zoom in to display data", but the data has not yet loaded. We need to click the "Load data" button before the data starts appearing, so that Zoom message is misleading!
Apart from that, it works as per the test case described above. Detailed testing will be done by Nowlan/Ahn I suppose?

For Code review:
Completed reviewing the code. I have left a few comments and points that need clarification. Please look into those.

Thanks!

Sneha Ramesh Watharkar (Inactive) added a comment - 30/Aug/18 5:10 PM

Moving to To-do column and assigning to Ivory Blakley.

Sneha Ramesh Watharkar (Inactive) made changes - 30/Aug/18 5:11 PM

Assignee

Sneha Ramesh Watharkar [ jdaly ]

Ivory Blakley [ ieclabau ]

Sneha Ramesh Watharkar (Inactive) made changes - 30/Aug/18 5:11 PM

Status

Needs 1st Level Review [ 10005 ]

Open [ 1 ]

Ivory Blakley (Inactive) added a comment - 04/Sep/18 11:01 AM - edited

I think it will be best to talk through the code comments in person. Sneha and I can talk through this on Tuesday afternoon.

Notes from in-person review:
remove the code that was commented out in core/genometry/src/main/java/com/affymetrix/genometry/symloader/VCF.java
line 328

core/genometry/src/main/java/com/affymetrix/genometry/symloader/VCF.java
line 520

Otherwise, comments were addressed in discussion. No need for further testing. Remove the commented code and create pull request.

Ivory Blakley (Inactive) added a comment - 04/Sep/18 5:31 PM

I made those changes, rebased onto upstream master, and created a pull request.

Ivory Blakley (Inactive) made changes - 04/Sep/18 5:31 PM

Assignee

Ivory Blakley [ ieclabau ]

Ann Loraine [ aloraine ]

Ivory Blakley (Inactive) made changes - 04/Sep/18 5:31 PM

Status

Open [ 1 ]

Pull Request Submitted [ 10101 ]

Ivory Blakley (Inactive) added a comment - 05/Sep/18 9:39 AM

ann.loraine commented on pull request #623: ~~IGBF-1380~~ Support VCF4.2

How is this more efficient? Does it reduce:

processing time (less work per line of data)
memory usage (how much?)
It seems to me that the main value of this change is it reduces complexity of the code, making it easier to understand and maintain.

_________________

You are right. "more efficient" was not the right phrase. "simpler" is better.
I reduced the comments that I added for the INFO and FORMAT sub-class definitions in VCF.java.

Ann Loraine made changes - 06/Sep/18 11:37 AM

Status

Pull Request Submitted [ 10101 ]

Reviewing Pull Request [ 10303 ]

Ann Loraine made changes - 06/Sep/18 11:37 AM

Status

Reviewing Pull Request [ 10303 ]

Needs Testing [ 10002 ]

Ann Loraine made changes - 06/Sep/18 11:38 AM

Assignee

Ann Loraine [ aloraine ]

Ann Loraine added a comment - 06/Sep/18 4:08 PM

Merged into master

Srishti Tiwari (Inactive) made changes - 06/Sep/18 4:29 PM

Assignee

Srishti Tiwari [ stiwari8 ]

Srishti Tiwari (Inactive) made changes - 10/Sep/18 9:45 AM

Status

Needs Testing [ 10002 ]

Testing In Progress [ 10003 ]

Mason Meyer (Inactive) added a comment - 11/Sep/18 9:48 AM

My testing verifies that all VCF versions are now compatible with IGB and load as expected. There was no exception noted in the console during my testing and all other file formats are loading as expected, so this story will now be closed.

Mason Meyer (Inactive) made changes - 11/Sep/18 9:48 AM

Resolution		Done [ 10000 ]
Status	Testing In Progress [ 10003 ]	Closed [ 6 ]

Ann Loraine made changes - 27/Aug/19 4:59 PM

Workflow

Loraine Lab Workflow [ 18082 ]

Fall 2019 Workflow Update [ 19932 ]

Ann Loraine made changes - 16/Oct/19 5:49 PM

Workflow

Fall 2019 Workflow Update [ 19932 ]

Revised Fall 2019 Workflow Update [ 22052 ]

Nowlan Freese made changes - 28/Oct/20 5:09 PM

Link

This issue relates to IGBF-930 [ IGBF-930 ]
