Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 4
- Epic Link:
- Sprint: Spring 3, Spring 5, Spring 6
Description
Situation: The loaders depend on LineProcessor for parsing and loading genomic regions.
Task: Investigate how we could eliminate the LineProcessor dependency completely, streamlining how regions are parsed and displayed.
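To make the idea concrete, here is a minimal sketch of parsing VCF data lines directly into regions without a line-processor callback. It is not IGB's loader code and does not use htsjdk; the class `VcfRegionSketch`, the `Region` holder, and both method names are hypothetical and exist only for illustration. It assumes the standard eight tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) with a 1-based POS, and it converts each record to a 0-based, end-exclusive interval spanning the REF allele.

```java
import java.util.ArrayList;
import java.util.List;

class VcfRegionSketch {
    /** Hypothetical holder for one variant region (0-based start, exclusive end). */
    static final class Region {
        final String chrom;
        final int start;
        final int end;
        final String ref;
        final String alt;

        Region(String chrom, int start, int end, String ref, String alt) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.ref = ref;
            this.alt = alt;
        }
    }

    /** Parses a single VCF data line (not a header line) into a Region. */
    static Region parseDataLine(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 tab-separated columns");
        }
        int pos = Integer.parseInt(fields[1]);  // POS is 1-based in VCF
        int start = pos - 1;                    // convert to 0-based
        int end = start + fields[3].length();   // region spans the REF allele
        return new Region(fields[0], start, end, fields[3], fields[4]);
    }

    /** Skips blank lines and header lines (starting with '#'), parses the rest. */
    static List<Region> parse(Iterable<String> lines) {
        List<Region> regions = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            regions.add(parseDataLine(line));
        }
        return regions;
    }
}
```

A loader built this way would pull lines from its own reader and collect `Region` objects itself, rather than handing each line to a separate processor.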
Attachments
Issue Links
Activity
| Field | Original Value | New Value |
|---|---|---|
| Epic Link | IGBF-3836 [ 23135 ] |
| Status | To-Do [ 10305 ] | In Progress [ 3 ] |
| Sprint | Spring 3, Spring 4 [ 212, 213 ] | Spring 3 [ 212 ] |
| Sprint | Spring 3 [ 212 ] | Spring 3, Spring 5 [ 212, 214 ] |
| Description | Situation: The VCF file is being read line by line manually. Task: Investigate how we could integrate the htsjdk library to use its built-in classes instead of manually processing the VCF file. | Situation: The loaders depend on LineProcessor for parsing and loading genomic regions. Task: Investigate how we could eliminate the LineProcessor dependency completely, streamlining how regions are parsed and displayed. |
| Attachment | vcf_exampleFiles.zip [ 18659 ] |
| Attachment | vcf_exampleFiles.zip [ 18659 ] |
| Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
| Attachment | vcf_sample_files.zip [ 18660 ] |
| Attachment | vcf_sample_files.zip [ 18660 ] |
| Comment | [ I took the two files that we had found earlier and subset the larger file to just APOL1. Note that one sample vcf is for the 2013 human genome and the other is for the 2009 human genome (see the respective readme.txt files). I also created multiple indexes for the files using tabix, bcftools, or igvtools. The files themselves are either raw or bgzipped. There are also two bcf files, but I could not get them to load in either IGB or IGV, so something seems wrong with them. ] |
| Sprint | Spring 3, Spring 5 [ 212, 214 ] | Spring 3, Spring 5, Spring 6 [ 212, 214, 215 ] |
| Rank | Ranked higher |
| Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
| Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
| Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
| Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
| Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
| Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
| Resolution | | Done [ 10000 ] |
| Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |
| Summary | Investigate: Eliminate lineProcessor implementation in VCF loader in IGB | Eliminate lineProcessor implementation in VCF loader in IGB |
Here are some sample VCF files that you could possibly test with: https://bioinformaticstools.mayo.edu/research/vcf-miner-sample-vcfs/
Files with the ".gz" suffix will need to be gunzipped first.
I would recommend starting with the smallest files since these VCF files can get quite large and complicated!
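If a command-line gunzip is not handy, the decompression step can also be done from Java. The following is a minimal sketch using only the standard library (`java.util.zip.GZIPInputStream`); the class name `GunzipSketch` and the method `gunzip` are hypothetical, and `InputStream.transferTo` assumes Java 9 or later. It writes the decompressed file next to the original, dropping the ".gz" suffix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

class GunzipSketch {
    /** Decompresses "name.gz" to a sibling file "name" and returns the new path. */
    static Path gunzip(Path source) throws IOException {
        String name = source.getFileName().toString();
        if (!name.endsWith(".gz")) {
            throw new IllegalArgumentException("Expected a .gz file: " + name);
        }
        // Strip the trailing ".gz" to get the output file name.
        Path target = source.resolveSibling(name.substring(0, name.length() - 3));
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            in.transferTo(out);  // streams the data, so large VCF files are not held in memory
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java GunzipSketch <file.vcf.gz>");
            return;
        }
        System.out.println("Wrote " + gunzip(Paths.get(args[0])));
    }
}
```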