IGB VCF parsing flow
Key classes used:
The classes used for parsing and visualisation are all custom; no built-in classes are used.
1. VCF
- Responsible for reading, parsing, and storing variants from the VCF file.
- Uses manual string splitting and regex-based parsing to extract metadata and variant information.
2. SeqSymmetry
- Stores standard variant metadata
3. BAMSym
- Represents insertions, deletions, and other structural changes.
4. GraphIntervalSym
- Stores numeric data from the VCF file.
- Used for graphing info/format fields in IGB.
No built-in libraries are used for parsing; it is handled manually.
VCF (parsing and processing)
1. Loading the file:
- The VCF file is opened as an InputStream for processing.
2. Reading and extracting metadata:
- The InputStream is wrapped in a BufferedReader so that the text can be read and processed line by line.
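Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the IGB source: the class name VcfLineReaderSketch and the method readLines are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an IGB class): reads a VCF stream line by line.
class VcfLineReaderSketch {

    /** Wraps the stream in a BufferedReader and returns its non-empty lines in order. */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream) when done.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

A real parser would most likely handle each line as it is read rather than collecting them all first; the list is used here only to keep the example short.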
3. Parsing metadata and header
- If a line starts with "##", the parser extracts the INFO, FILTER, and FORMAT metadata and stores it in maps (infoMap, filterMap, formatMap).
- The header line (#CHROM POS ... FORMAT SAMPLE1 SAMPLE2 ...) is extracted manually and the sample names are stored in an array.
- Many regex patterns are used to extract the metadata.
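The metadata and header handling in step 3 might look something like the sketch below. The map names follow the notes above; the class name VcfHeaderSketch, the single regex, and the choice to store each entry's remaining attributes as one raw string are assumptions for illustration, not the patterns IGB actually uses.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not the IGB VCF class): collects ## metadata and sample names.
class VcfHeaderSketch {
    // Matches e.g. ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
    // group 1 = kind, group 2 = ID, group 3 = remaining attributes (may be absent).
    private static final Pattern META =
            Pattern.compile("^##(INFO|FILTER|FORMAT)=<ID=([^,>]+)(?:,(.*))?>$");

    final Map<String, String> infoMap = new LinkedHashMap<>();
    final Map<String, String> filterMap = new LinkedHashMap<>();
    final Map<String, String> formatMap = new LinkedHashMap<>();
    String[] sampleNames = new String[0];

    void parseLine(String line) {
        if (line.startsWith("##")) {
            Matcher m = META.matcher(line);
            if (!m.matches()) {
                return; // other ## lines (fileformat, contig, ...) are ignored here
            }
            String rest = m.group(3) == null ? "" : m.group(3);
            switch (m.group(1)) {
                case "INFO":
                    infoMap.put(m.group(2), rest);
                    break;
                case "FILTER":
                    filterMap.put(m.group(2), rest);
                    break;
                default: // FORMAT
                    formatMap.put(m.group(2), rest);
                    break;
            }
        } else if (line.startsWith("#CHROM")) {
            String[] cols = line.split("\t");
            // Columns 0-8 are CHROM..FORMAT; sample names start at index 9.
            sampleNames = cols.length > 9
                    ? Arrays.copyOfRange(cols, 9, cols.length)
                    : new String[0];
        }
    }
}
```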
4. Parsing variant information
- Converts POS from 1-based (VCF) to 0-based (IGB) coordinates.
- Extracts information such as INFO fields and sample-specific genotypes.
- Creates a SeqSymmetry object for each variant (location, REF, ALT, quality, genotype, etc.).
- Extracts genotypes manually from the FORMAT fields.
- Converts structural variants (insertions/deletions) to BAMSym objects, which use CIGAR notation (e.g., M for match, I for insertion, D for deletion) to ensure correct alignment visualisation.
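The variant-level steps above can be illustrated with a small sketch. VariantSketch is a hypothetical plain-Java stand-in for the SeqSymmetry/BAMSym objects (their real APIs are not shown), only the first ALT allele is considered, and the CIGAR rule is a simplification that assumes REF and ALT share their leading bases, as in a typical VCF indel with an anchor base.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record of one variant; not an IGB class.
class VariantSketch {
    String chrom;
    int start;    // 0-based, inclusive
    int end;      // 0-based, exclusive
    String ref;
    String alt;   // first ALT allele only (simplifying assumption)
    String cigar;
    final Map<String, String> info = new LinkedHashMap<>();
    final Map<String, String> genotypes = new LinkedHashMap<>();

    static VariantSketch parse(String line, String[] sampleNames) {
        String[] cols = line.split("\t");
        VariantSketch v = new VariantSketch();
        v.chrom = cols[0];
        v.start = Integer.parseInt(cols[1]) - 1; // VCF POS is 1-based
        v.ref = cols[3];
        v.alt = cols[4].split(",")[0];
        v.end = v.start + v.ref.length();
        v.cigar = toCigar(v.ref, v.alt);

        // INFO is a ';'-separated list of key=value pairs or bare flags.
        for (String entry : cols[7].split(";")) {
            if (entry.isEmpty() || entry.equals(".")) {
                continue;
            }
            int eq = entry.indexOf('=');
            if (eq < 0) {
                v.info.put(entry, "true"); // flag field without a value
            } else {
                v.info.put(entry.substring(0, eq), entry.substring(eq + 1));
            }
        }

        // FORMAT (column 8) names the ':'-separated fields of each sample column.
        if (cols.length > 9) {
            int gtIndex = Arrays.asList(cols[8].split(":")).indexOf("GT");
            for (int i = 0; i < sampleNames.length && 9 + i < cols.length; i++) {
                String[] values = cols[9 + i].split(":");
                if (gtIndex >= 0 && gtIndex < values.length) {
                    v.genotypes.put(sampleNames[i], values[gtIndex]);
                }
            }
        }
        return v;
    }

    /** Simplified CIGAR: shared leading bases are M, the surplus is I or D. */
    static String toCigar(String ref, String alt) {
        int shared = Math.min(ref.length(), alt.length());
        StringBuilder cigar = new StringBuilder().append(shared).append('M');
        if (alt.length() > ref.length()) {
            cigar.append(alt.length() - shared).append('I');
        } else if (ref.length() > alt.length()) {
            cigar.append(ref.length() - shared).append('D');
        }
        return cigar.toString();
    }
}
```

For example, a record at POS 100 with REF A and ALT ATG gets start 99, end 100, and the CIGAR string 1M2I under these assumptions.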
5. Data visualisation in IGB
- Variants are displayed as annotations on genomic tracks.
- Numeric INFO fields (like depth, allele frequency) are plotted as graphs.
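As a rough illustration of how numeric INFO fields could be gathered for graphing, the sketch below collects one point per variant into parallel position, width, and value lists. The class InfoGraphSketch and its fields are hypothetical; how IGB actually feeds these values into GraphIntervalSym is not described in these notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical collector (not an IGB class) for one numeric INFO field.
class InfoGraphSketch {
    final List<Integer> x = new ArrayList<>();     // 0-based start of each variant
    final List<Integer> width = new ArrayList<>(); // span of each variant
    final List<Float> y = new ArrayList<>();       // numeric value to plot

    /** Adds a point if the field is present and numeric; returns whether it was added. */
    boolean addPoint(int start, int end, Map<String, String> info, String key) {
        String raw = info.get(key);
        if (raw == null) {
            return false;
        }
        try {
            float value = Float.parseFloat(raw);
            x.add(start);
            width.add(end - start);
            y.add(value);
            return true;
        } catch (NumberFormatException e) {
            // Flags, strings, and comma-separated lists are skipped in this sketch.
            return false;
        }
    }
}
```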