Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels:
- Story Points: 4
- Sprint: B - Summer 2018
Description
The SAM specification allows read alignments to be represented with "soft-clipped" ends, meaning those bases are excluded from the alignment, although the read sequence (SEQ) still contains them.
An IGB user noticed that older versions of IGB omitted these bases from the view. Instead, they should be shown, with mismatches. This may already have been fixed, but we need to check.
If IGB shows the soft-clipped bases, users can more easily detect rearrangements and other issues critical to diagnosing genomic defects in clinical samples.
To start, make sure the above description of "soft-clipping" is correct.
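To make the definition concrete, here is a minimal Python sketch that splits a read's SEQ into its soft-clipped ends and its aligned middle using the CIGAR string. The helper name `split_soft_clips` is hypothetical and is not part of IGB; it assumes a well-formed CIGAR string and does not handle '*'.

```python
import re

# Hypothetical helper for illustration only; not IGB code.
# One CIGAR operation is a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def split_soft_clips(cigar, seq):
    """Return (leading_clip, aligned_part, trailing_clip) of seq."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not present in SEQ, so ignore them here.
    ops = [(n, op) for n, op in ops if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    end = len(seq) - trail
    return seq[:lead], seq[lead:end], seq[end:]
```

For example, with CIGAR `3S5M2S` and SEQ `AAACCCCCGG`, the first three and last two bases are the soft-clipped ends that a viewer would need to draw outside the aligned block.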
Issue Links
- relates to: IGBF-1291 Show soft-clipped bases in BAM files (Closed)
Some helpful documentation...
Understanding CIGAR: https://jef.works/blog/2017/03/28/CIGAR-strings-for-dummies/
SAM documentation pdf: https://samtools.github.io/hts-specs/SAMv1.pdf
CIGAR explanation in SAM documentation (page 6):
CIGAR: CIGAR string. The CIGAR operations are given in the following table (set ‘*’ if unavailable):
Op  BAM  Description                                            Consumes query  Consumes reference
M   0    alignment match (can be a sequence match or mismatch)  yes             yes
I   1    insertion to the reference                             yes             no
D   2    deletion from the reference                            no              yes
N   3    skipped region from the reference                      no              yes
S   4    soft clipping (clipped sequences present in SEQ)       yes             no
H   5    hard clipping (clipped sequences NOT present in SEQ)   no              no
P   6    padding (silent deletion from padded reference)        no              no
=   7    sequence match                                         yes             yes
X   8    sequence mismatch                                      yes             yes
• “Consumes query” and “consumes reference” indicate whether the CIGAR operation causes the alignment to step along the query sequence and the reference sequence respectively.
• H can only be present as the first and/or last operation.
• S may only have H operations between them and the ends of the CIGAR string.
• For mRNA-to-genome alignment, an N operation represents an intron. For other types of alignments, the interpretation of N is not defined.
• Sum of lengths of the M/I/S/=/X operations shall equal the length of SEQ.
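
The table and rules above can be sketched in Python as follows. This is an illustrative sketch, not IGB or samtools code: the names `parse_cigar`, `query_length`, `reference_length` and `check_cigar` are hypothetical, and the parser assumes a well-formed CIGAR string (it does not handle '*').

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Taken from the "Consumes query" / "Consumes reference" columns.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")

def parse_cigar(cigar):
    """Turn '3S5M' into [(3, 'S'), (5, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def query_length(ops):
    """Number of SEQ bases the operations step over."""
    return sum(n for n, op in ops if op in CONSUMES_QUERY)

def reference_length(ops):
    """Number of reference bases the operations step over."""
    return sum(n for n, op in ops if op in CONSUMES_REFERENCE)

def check_cigar(cigar, seq):
    """Return a list of rule violations (empty if none were found)."""
    ops = parse_cigar(cigar)
    codes = [op for _, op in ops]
    problems = []
    # H can only be present as the first and/or last operation.
    if "H" in codes[1:-1]:
        problems.append("H is not at an end of the CIGAR string")
    # S may only have H operations between it and the ends.
    without_h = [c for c in codes if c != "H"]
    if "S" in without_h[1:-1]:
        problems.append("S is not at an end (ignoring H)")
    # Sum of lengths of M/I/S/=/X shall equal the length of SEQ.
    if query_length(ops) != len(seq):
        problems.append("query-consuming lengths do not match SEQ length")
    return problems
```

Because S consumes the query but not the reference, a soft-clipped read covers fewer reference positions than it has bases in SEQ. For `2H3S5M1I2D4M`, `query_length` gives 13 while `reference_length` gives 11.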