Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Sprint: Spring 1, Spring 2
Description
Task: Write a high-level overview of how IGB parses and displays a VCF file. The overview should name the key IGB classes involved in parsing and viewing the VCF file and describe the overall flow.
There are two types of VCF files we want to look at for this task: one with population-level information ("ALL.apol1.sample.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf" from the igvData.zip file) and one with individual-specific information ("NG1LRQNESI.hard-filtered.vcf.gz"). These files can be found on Google Drive in the VCF project folder: https://drive.google.com/drive/folders/1PTa3rcCp59BRUcd2mB8BfemK2d3Kn3n5?usp=drive_link
IGB VCF parsing flow
Key classes used:
The classes used for parsing and visualisation are all custom IGB classes; no built-in classes are used.
1. VCF
2. SeqSymmetry
3. BAMSym
4. GraphIntervalSym
No built-in libraries are used for parsing; the file is parsed manually.
VCF (parsing and processing)
1. Loading the file
2. Reading and extracting metadata
3. Parsing the metadata and header lines
4. Parsing variant information
5. Data visualisation in IGB
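To make steps 1 to 4 concrete, here is a minimal sketch of manual VCF parsing in Java. It is not IGB's code: the class VcfSketch and its methods load and parseLines are hypothetical names chosen for illustration, and the sketch assumes a tab-delimited VCF in which "##" lines carry metadata, the "#CHROM" line is the header, and sample names start at the tenth column. Step 5 (visualisation) is not covered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch; class and method names are not taken from IGB.
final class VcfSketch {

    /** One variant record, holding a few of the fixed VCF columns. */
    static final class Variant {
        final String chrom;
        final int pos;        // 1-based position, exactly as written in the file
        final String id;
        final String ref;
        final String[] alts;  // ALT may list several comma-separated alleles

        Variant(String chrom, int pos, String id, String ref, String[] alts) {
            this.chrom = chrom;
            this.pos = pos;
            this.id = id;
            this.ref = ref;
            this.alts = alts;
        }
    }

    /** Everything collected from one file. */
    static final class Parsed {
        final List<String> metaLines = new ArrayList<>();
        final List<String> samples = new ArrayList<>();
        final List<Variant> variants = new ArrayList<>();
    }

    /** Step 1: read all lines, unzipping the input when the name ends in .gz. */
    static List<String> load(Path path) throws IOException {
        boolean gz = path.toString().endsWith(".gz");
        try (InputStream raw = Files.newInputStream(path);
             InputStream in = gz ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    /** Steps 2 to 4: split the lines into metadata, header, and variants. */
    static Parsed parseLines(Iterable<String> lines) {
        Parsed out = new Parsed();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("##")) {
                // Metadata line, e.g. ##fileformat=VCFv4.2
                out.metaLines.add(line.substring(2));
            } else if (line.startsWith("#")) {
                // Header line: 8 fixed columns, then FORMAT, then sample names.
                String[] cols = line.split("\t");
                for (int i = 9; i < cols.length; i++) {
                    out.samples.add(cols[i]);
                }
            } else {
                String[] f = line.split("\t");
                if (f.length < 8) {
                    continue; // skip lines without the 8 fixed columns
                }
                out.variants.add(new Variant(
                        f[0], Integer.parseInt(f[1]), f[2], f[3], f[4].split(",")));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java VcfSketch <file.vcf|file.vcf.gz>");
            return;
        }
        Parsed p = parseLines(load(Paths.get(args[0])));
        System.out.println(p.variants.size() + " variants, "
                + p.samples.size() + " samples");
    }
}
```

The two files in this task would be expected to differ mainly in the header: the population-level file should yield many sample names, while the individual-specific file should yield one. Reading the whole file into memory keeps the sketch short; a real parser would more likely stream the lines.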