Details
Type: New Feature
Status: To-Do
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
The mismatch graph operation counts the number of mismatches in a BAM file at each position. The mismatch pileup graph plots the total count of overlapping reads in gray and draws a stacked mismatch graph on top, with each type of mismatch color-coded.
The pileup graph counts all mismatches, regardless of which base the read carries.
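To make the current behavior concrete, here is a minimal Python sketch of per-position mismatch counting. It assumes ungapped reads given as hypothetical `(start, sequence)` pairs rather than real BAM records, so it skips CIGAR handling, quality filtering, and strand. It keeps a per-base `Counter` at each position, which is what a stacked, color-coded graph would draw; summing each `Counter` gives the single mismatch count.

```python
from collections import Counter


def mismatch_pileup(reference: str, reads: list[tuple[int, str]]):
    """Per-position read depth and per-base mismatch counts.

    `reads` holds hypothetical (start, sequence) pairs with a 0-based start on
    `reference`; reads are assumed ungapped (no indels, splices, or clips).
    """
    depth = [0] * len(reference)
    mismatches = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if not 0 <= pos < len(reference):
                continue  # ignore bases that fall outside the reference window
            depth[pos] += 1  # gray "total reads" layer
            if base != reference[pos]:
                mismatches[pos][base] += 1  # one stacked layer per base
    return depth, mismatches
```

For reference `"ACGTA"` and reads `[(0, "ACGTA"), (0, "ACTTA"), (1, "CGGA"), (2, "GAA")]`, position 3 has two mismatching reads that disagree with each other (G and A), yet the summed mismatch count there is still 2.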
For example:
In region: Chr1:5,734,954-5,735,011
Using the file: http://lorainelab-quickload.scidas.org/rnaseq/A_thaliana_Jun_2009/SRP022162/Pollen.bam
(available through Data Access / RNA-Seq / Pollen / Reads / Pollen alignments)
Run the mismatch pileup graph and the mismatch graph track operations.
If we add the ability to view soft-clipped portions of reads, users could look for regions where many reads share the same soft-clipped sequence, which would be evidence for novel splicing or genomic rearrangements relative to the reference. *Users may want to scan the genome for areas where the soft-clipped mismatches agree.* A simple mismatch graph creates a peak wherever mismatches are frequent, even if every mismatching read shows a different alternative, so the user would have to skim past many meaningless peaks. A modified version that tallies the mismatches of each type at each position but reports only the count of the most abundant one would be very useful in this case.
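The proposed tally could look like the following Python sketch. It treats each alternative as a string, so single-base mismatches and soft-clipped sequences are tallied the same way. The function names and the list-of-strings input per position are hypothetical illustrations, not an existing IGB API.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def dominant_alternative(observed: Iterable[str]) -> Tuple[int, int, Optional[str]]:
    """Summarize the alternative sequences seen at one reference position.

    `observed` holds one entry per mismatching read: a single base for an
    ordinary mismatch, or the clipped string for a soft-clipped read
    (hypothetical input format). Returns (total, dominant_count, dominant_seq).
    """
    counts = Counter(observed)
    if not counts:
        return 0, 0, None
    # most_common(1) picks the most abundant alternative; ties go to the
    # alternative encountered first.
    seq, n = counts.most_common(1)[0]
    return sum(counts.values()), n, seq


def dominant_track(columns: Iterable[Iterable[str]]) -> List[int]:
    """Graph values for the proposed track: only the dominant count per position."""
    return [dominant_alternative(col)[1] for col in columns]
```

With `["A", "C", "G", "T"]` at one position and `["AGGT", "AGGT", "AGGT", "C"]` at another, a standard mismatch graph would report 4 at both. This version reports 1 for the scattered mismatches and 3 for the shared clipped sequence, so only the second position stands out as a peak.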
Using this output, users could set a threshold for the number of reads that must support a *single alternative sequence*. They could also make a standard mismatch pileup, take the ratio of the two tracks (another available operation), and then threshold that output to pick regions with at least X consecutive bases where at least 90% of mismatches support a single alternative sequence.
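The ratio-and-threshold step could be sketched in Python as below. It assumes the dominant counts from the proposed track and the total mismatch counts from a standard mismatch pileup arrive as plain per-position lists; in IGB these would be graph tracks, and the work would be done by the existing ratio and threshold operations. `min_fraction` is the 90% cutoff, `min_run` is X, and `min_reads` is the read-support threshold; all three names are hypothetical.

```python
from typing import List, Sequence, Tuple


def supported_regions(
    dominant: Sequence[int],
    total: Sequence[int],
    min_fraction: float = 0.9,
    min_run: int = 5,
    min_reads: int = 1,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of at least `min_run` consecutive
    positions where the dominant alternative has at least `min_reads` reads
    and makes up at least `min_fraction` of all mismatches."""
    if len(dominant) != len(total):
        raise ValueError("tracks must cover the same positions")
    regions: List[Tuple[int, int]] = []
    run_start = None
    for i, (d, t) in enumerate(zip(dominant, total)):
        # Ratio step; positions with no mismatches fail instead of dividing by zero.
        passes = t > 0 and d >= min_reads and d / t >= min_fraction
        if passes and run_start is None:
            run_start = i
        elif not passes and run_start is not None:
            if i - run_start >= min_run:
                regions.append((run_start, i))
            run_start = None
    # Close a run that reaches the end of the tracks.
    if run_start is not None and len(dominant) - run_start >= min_run:
        regions.append((run_start, len(dominant)))
    return regions
```

For example, with `min_run=3`, the dominant track `[0, 9, 10, 10, 9, 1, 10, 10]` and the total track `[0, 10, 10, 11, 10, 4, 10, 10]` yield `[(1, 5)]`: positions 1 to 4 each have at least 90% of mismatches on one alternative, while the trailing run at positions 6 to 7 is too short.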
Existing example apps already add track operations.
See the Merge Annotation Track Operator.