  1. IGB
  2. IGBF-4239

Make an "only show unique alignments" IGB track filter App for alignment tracks

    Details

    • Type: Task
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      2
    • Sprint:
      Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4

      Description

      Many of the alignments we see in BAM tracks are redundant, meaning: they show the same exact pattern of blocks, gaps, insertions, strand affiliation, etc. as other alignments in the same track.

      Sometimes, it is helpful to only see the alignments that are different from each other.

      For this task, create a new track filter for alignment tracks that only shows alignments that are different from each other - that is, unique.

      One possible way to create this filter would be to use a hashtable, with hash keys constructed from a combination of the chromosome name, strand, start position, and CIGAR string for each alignment.

      The hash key could look like:

      • chr:strand:start:CIGAR

      as an example:

      • chr1:+:66666:50M1500N20M
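The key format above could be built with a small helper like the following. This is only a sketch: the class name `AlignmentKey` and its parameters are hypothetical and are not part of the IGB API, and it assumes the chromosome name, strand, start position, and CIGAR string are already available for each alignment.

```java
/** Hypothetical helper for illustration; not part of the IGB API. */
final class AlignmentKey {
    private AlignmentKey() {
    }

    /** Builds a key of the form chr:strand:start:CIGAR. */
    static String of(String chromosome, boolean forwardStrand, int start, String cigar) {
        char strand = forwardStrand ? '+' : '-';
        return chromosome + ":" + strand + ":" + start + ":" + cigar;
    }

    public static void main(String[] args) {
        // Prints chr1:+:66666:50M1500N20M
        System.out.println(of("chr1", true, 66666, "50M1500N20M"));
    }
}
```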

      How this would work from a user's perspective:

      When a filter is started or added to a track, it creates some kind of hashtable data structure, linked to that particular track somehow.

      When the user clicks "Load Data," the data are loaded into memory, and the filter creates a hash key for each new alignment that is read. If no value exists in the filter's hashtable for that key, then the alignment is shown, and the hashtable stores a value to signal that this newly encountered key exists in the data. If the key is already present, the alignment is a duplicate and is not shown.
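The workflow described above might look roughly like this. It is a minimal sketch, not the App's actual code: `UniqueAlignmentFilter` and `shouldShow` are hypothetical names, the real IGB filter interface is not shown here, and a `HashSet` stands in for the hashtable because only the presence of a key matters.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the per-track filter state; not the IGB filter API. */
final class UniqueAlignmentFilter {
    // One set per filter instance, so each track keeps its own record of seen keys.
    private final Set<String> seenKeys = new HashSet<>();

    /** Returns true the first time a key is seen, false for every later duplicate. */
    boolean shouldShow(String chromosome, char strand, int start, String cigar) {
        String key = chromosome + ":" + strand + ":" + start + ":" + cigar;
        // Set.add returns false when the key was already present.
        return seenKeys.add(key);
    }

    /** Number of distinct keys stored so far. */
    int size() {
        return seenKeys.size();
    }

    public static void main(String[] args) {
        UniqueAlignmentFilter filter = new UniqueAlignmentFilter();
        System.out.println(filter.shouldShow("chr1", '+', 66666, "50M1500N20M")); // true
        System.out.println(filter.shouldShow("chr1", '+', 66666, "50M1500N20M")); // false
        System.out.println(filter.size()); // 1
    }
}
```

Because redundant alignments share one key, the set grows with the number of distinct alignment patterns rather than with the number of reads.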

      I bet the hashtable will be surprisingly small.

        Attachments

          Activity

          ann.loraine Ann Loraine added a comment - - edited

          I added comments to the commit, with suggestions for a way you could potentially implement the App without requiring a change to the IGB filters API.

          I'm worried that making a change to the API could break existing Apps or cause other unforeseen problems. Also, it means we would not be able to use the App in IGB 10.1.0, the released version.

          pkulzer Paige Kulzer added a comment -

          Fetched Karthik's new commit and tested locally on my Mac using data from the smoke testing quickload (https://wiki.bioviz.org/confluence/display/ITD/File+Formats). This new commit has fixed the issue that was being observed previously where filtered alignments were disappearing with a second click of the Load Data and/or Load Sequence buttons.

          However, I noticed during testing that alignments that start and stop at the same position were being collapsed even if they had nucleotide differences. I believe this is due to the way the filter uses the CIGAR string to compare alignments. Ultimately, this is a poor dataset to test with because its CIGAR strings do not appear to be properly formatted. I'm not sure whether more "real-world" data will have better CIGAR strings, so this might be something to look into as part of a separate ticket.
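One possible contributing factor, independent of the test data: in the SAM format, the CIGAR operation M denotes an alignment match, which can be either a sequence match or a mismatch. Two reads with the same position and CIGAR string can therefore differ in their bases and still produce the same key. The sketch below illustrates this and shows one hypothetical variant that appends the read bases to the key. The class and method names are assumptions for illustration, not the App's code.

```java
/** Hypothetical comparison of two key schemes; not the App's actual code. */
final class KeyComparison {
    private KeyComparison() {
    }

    /** Key from chromosome, strand, start, and CIGAR only. */
    static String positionalKey(String chr, char strand, int start, String cigar) {
        return chr + ":" + strand + ":" + start + ":" + cigar;
    }

    /** Same key with the read bases appended, so base differences change the key. */
    static String sequenceAwareKey(String chr, char strand, int start, String cigar, String bases) {
        return positionalKey(chr, strand, start, cigar) + ":" + bases;
    }

    public static void main(String[] args) {
        // Two reads at the same position with the same CIGAR but one differing base.
        String first = positionalKey("chr1", '+', 100, "4M");
        String second = positionalKey("chr1", '+', 100, "4M");
        System.out.println(first.equals(second)); // true: the reads collapse

        String firstSeq = sequenceAwareKey("chr1", '+', 100, "4M", "ACGT");
        String secondSeq = sequenceAwareKey("chr1", '+', 100, "4M", "ACCT");
        System.out.println(firstSeq.equals(secondSeq)); // false: the reads stay distinct
    }
}
```

Including the bases makes each key longer, so this variant would trade a larger table for finer distinctions. Whether that is desirable depends on what "unique" should mean for the filter.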

          Additionally, I found an edge case that breaks the filter. Adding multiple instances of the same filter to a track brings back the issue of reads disappearing after clicking Load Data/Load Sequence. The same happens if one filter is added to a single strand of a dataset and then the same filter is added to the track once the strands are combined.

          Overall, this filter app is working really well and the scope of this ticket has been completed – recommending PR.

          karthik Karthik Raveendran added a comment - - edited

          A new commit has been pushed with improvements to the Load Data and Load Sequence workflow. See commit

          When the user selects the filter for alignments that are already loaded, then scrolls to another gene and clicks Load Data, the previously loaded and filtered alignments should not disappear. Similarly, alignments should not disappear if the user loads the sequence.
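One way to get this behavior, sketched under assumptions, is to remember which alignment first claimed each key, so that evaluating the same alignment again still passes. This is not necessarily what the commit does. The names `RepeatableUniqueFilter` and `alignmentId` are hypothetical, and the sketch assumes each alignment has an identifier that stays stable across reloads.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a filter that gives the same answer on repeated passes. */
final class RepeatableUniqueFilter {
    // Maps each key to the identifier of the first alignment shown for that key.
    private final Map<String, String> ownerByKey = new HashMap<>();

    /**
     * Shows an alignment if its key is new, or if this same alignment
     * was the one originally shown for the key.
     */
    boolean shouldShow(String key, String alignmentId) {
        // putIfAbsent returns null when the key was not yet mapped.
        String owner = ownerByKey.putIfAbsent(key, alignmentId);
        return owner == null || owner.equals(alignmentId);
    }

    public static void main(String[] args) {
        RepeatableUniqueFilter filter = new RepeatableUniqueFilter();
        String key = "chr1:+:66666:50M1500N20M";
        System.out.println(filter.shouldShow(key, "read1")); // true: first with this key
        System.out.println(filter.shouldShow(key, "read2")); // false: duplicate pattern
        System.out.println(filter.shouldShow(key, "read1")); // true: same read seen again
    }
}
```

With a plain set of seen keys, the third call would return false, which matches the symptom of previously shown alignments disappearing when the data are evaluated a second time.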

          karthik Karthik Raveendran added a comment - - edited

          App is available in download section of the repo: https://bitbucket.org/KarthikRavee91/only_show_unique_alignments_filter/downloads/

          There is a known issue with this version of the app. The unique IDs that are generated are saved in the app, and when the user clicks Load Data while the filter is active, the reads with unique IDs disappear. I would like someone to check the unique ID logic while I fix this issue.

          Note: A quick workaround for testing is to Load Data -> Remove Filter -> Add Filter again

          karthik Karthik Raveendran added a comment - Only Show Unique Alignments repo: https://bitbucket.org/KarthikRavee91/only_show_unique_alignments_filter/src/main/

            People

            • Assignee:
              karthik Karthik Raveendran
              Reporter:
              ann.loraine Ann Loraine
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated: