  IGB / IGBF-4239

Make an "only show unique alignments" IGB track filter App for alignment tracks

    Details

    • Type: Task
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 2
    • Sprint: Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4

      Description

      Many of the alignments we see in BAM tracks are redundant, meaning: they show the same exact pattern of blocks, gaps, insertions, strand affiliation, etc. as other alignments in the same track.

      Sometimes, it is helpful to only see the alignments that are different from each other.

      For this task, create a new track filter for alignment tracks that only shows alignments that are different from each other - that is, unique.

      One possible way to create this filter would be to use a hashtable, with hash keys constructed from a combination of the chromosome name, strand, start position, and CIGAR string for each alignment.

      The hash key could look like:

      • chr:strand:start:CIGAR

      as an example:

      • chr1:+:66666:50M1500N20M
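
A minimal sketch of how such a key could be assembled in Java. The class `AlignmentKey` and its method `of` are hypothetical names used for illustration only; they are not part of the IGB API, and a real App would read these four values from IGB's own alignment objects.

```java
/** Hypothetical helper for illustration; not part of the IGB API. */
final class AlignmentKey {
    private AlignmentKey() {}

    /**
     * Builds a key of the form chr:strand:start:CIGAR.
     * Assumes the caller has already extracted these four values
     * from the alignment.
     */
    static String of(String chromosome, boolean forwardStrand, int start, String cigar) {
        char strand = forwardStrand ? '+' : '-';
        return chromosome + ":" + strand + ":" + start + ":" + cigar;
    }
}
```

Calling `AlignmentKey.of("chr1", true, 66666, "50M1500N20M")` yields the example key shown above. The colon is a convenient separator here because it does not occur in a CIGAR string; a chromosome name containing a colon would call for a different separator.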

      How this would work from a user's perspective:

      When a filter is started or added to a track, it creates some kind of hashtable data structure, linked to that particular track somehow.

      When the user clicks "Load Data," the data are loaded into memory, and the filter creates a hash key for each new alignment that is read. If no value exists in the filter's hashtable for that key, the alignment is shown, and the hashtable stores a value to signal that this key has now been seen in the data. If a value already exists for the key, the alignment is a duplicate and is not shown.
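
The behavior described above can be sketched with a `HashSet`, since only the presence of a key matters. `UniqueAlignmentFilter` and `shouldShow` are hypothetical names chosen for this sketch, not the IGB filter interface; a real App would implement whatever interface IGB's filter API requires and keep one such set per track.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-track filter state; not the IGB filter interface. */
final class UniqueAlignmentFilter {
    // Keys of the form chr:strand:start:CIGAR that have already been shown.
    private final Set<String> seenKeys = new HashSet<>();

    /**
     * Returns true the first time a key is encountered (show the alignment)
     * and false for every later alignment with the same key (hide it).
     */
    boolean shouldShow(String chromosome, boolean forwardStrand, int start, String cigar) {
        String key = chromosome + ":" + (forwardStrand ? "+" : "-") + ":" + start + ":" + cigar;
        // Set.add returns false when the key was already present.
        return seenKeys.add(key);
    }

    /** Number of distinct keys stored so far. */
    int size() {
        return seenKeys.size();
    }
}
```

The set grows with the number of distinct alignment patterns rather than with the number of reads, so a deeply covered region with many redundant reads adds few entries.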

      I bet the hashtable will be surprisingly small.

        Attachments

          Activity

          ann.loraine Ann Loraine created issue -
          ann.loraine Ann Loraine made changes -
          Field Original Value New Value
          Epic Link IGBF-1908 [ 17998 ]
          ann.loraine Ann Loraine made changes -
          Summary Make show unique only IGB track filter for alignments tracks Make a show unique only IGB track filter for alignments tracks
          ann.loraine Ann Loraine made changes -
          Summary Make a show unique only IGB track filter for alignments tracks Make a "only show unique alignments" IGB track filter App for alignments tracks
          ann.loraine Ann Loraine made changes -
          Comment [ I think the "Score" menu item [~karthik] saw may be coming from the core IGB codebase, not the App. ]
          ann.loraine Ann Loraine made changes -
          Description Original value: an earlier draft of the Description above. New value: the same text with "different from each other, or unique" reworded to "different from each other - that is, unique" and the sentence "I bet the hashtable will be surprisingly small." appended.
          ann.loraine Ann Loraine made changes -
          Description Original value: the previous revision. New value: the same text with "strand affiliation" added to the list of shared features in the first sentence, matching the current Description above.
          nfreese Nowlan Freese made changes -
          Sprint Summer 1 [ 218 ] Summer 1, Summer 2 [ 218, 219 ]
          nfreese Nowlan Freese made changes -
          Rank Ranked higher
          ann.loraine Ann Loraine made changes -
          Link This issue relates to IGBF-4214 [ IGBF-4214 ]
          karthik Karthik Raveendran made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          karthik Karthik Raveendran made changes -
          Status In Progress [ 3 ] To-Do [ 10305 ]
          karthik Karthik Raveendran made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          karthik Karthik Raveendran made changes -
          Status In Progress [ 3 ] To-Do [ 10305 ]
          karthik Karthik Raveendran made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          karthik Karthik Raveendran made changes -
          Status In Progress [ 3 ] To-Do [ 10305 ]
          karthik Karthik Raveendran made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 1, Summer 2 [ 218, 219 ] Summer 1, Summer 2, Summer 3 [ 218, 219, 220 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          karthik Karthik Raveendran added a comment - Only Show Unique Alignments repo: https://bitbucket.org/KarthikRavee91/only_show_unique_alignments_filter/src/main/
          karthik Karthik Raveendran added a comment - edited

          The App is available in the downloads section of the repo: https://bitbucket.org/KarthikRavee91/only_show_unique_alignments_filter/downloads/

          This version of the App has a known issue. The unique IDs that are generated are saved in the App, and when the user clicks Load Data while the filter is active, the reads with unique IDs disappear. I would like someone to check the unique ID logic while I fix this issue.

          Note: A quick workaround for testing is Load Data -> Remove Filter -> Add Filter again.
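
One plausible reading of this issue is that the saved keys persist across loads, so when the same reads pass through the filter a second time, every one of them looks like a duplicate. The sketch below shows one way to make the check repeatable under that assumption: remember which alignment first claimed each key, and keep showing that alignment on later passes. `StableUniqueFilter` and the `alignmentId` parameter are hypothetical; the sketch assumes each alignment carries a stable identifier, which may not match how the App or IGB actually identifies reads.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; assumes each alignment has a stable ID. */
final class StableUniqueFilter {
    // Maps each key to the ID of the alignment that first claimed it.
    private final Map<String, String> firstSeen = new HashMap<>();

    /**
     * Shows an alignment if its key is new, or if this same alignment
     * is the one that originally claimed the key.
     */
    boolean shouldShow(String key, String alignmentId) {
        // putIfAbsent returns null when the key was not yet present.
        String owner = firstSeen.putIfAbsent(key, alignmentId);
        return owner == null || owner.equals(alignmentId);
    }
}
```

With this shape, running the filter again over already loaded data gives the same answer for each alignment as the first pass did.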

          karthik Karthik Raveendran made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          karthik Karthik Raveendran made changes -
          Assignee Karthik Raveendran [ karthik ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 1, Summer 2, Summer 3 [ 218, 219, 220 ] Summer 1, Summer 2, Summer 3, Summer 4 [ 218, 219, 220, 221 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          ann.loraine Ann Loraine made changes -
          Assignee Ann Loraine [ aloraine ]
          karthik Karthik Raveendran added a comment - edited

          A new commit has been pushed with improvements to the Load Data and Load Sequence workflow. See commit

          When the user applies the filter to alignments that are already loaded, then scrolls to another gene and clicks Load Data, the previously loaded and filtered alignments should not disappear. Similarly, alignments should not disappear if the user loads the sequence.

          pkulzer Paige Kulzer made changes -
          Assignee Ann Loraine [ aloraine ] Paige Kulzer [ pkulzer ]
          pkulzer Paige Kulzer made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          pkulzer Paige Kulzer added a comment -

          Fetched Karthik's new commit and tested locally on my Mac using data from the smoke testing quickload (https://wiki.bioviz.org/confluence/display/ITD/File+Formats). This new commit has fixed the issue that was being observed previously where filtered alignments were disappearing with a second click of the Load Data and/or Load Sequence buttons.

          However, I noticed during testing that alignments that start and stop at the same position were being collapsed even if they had nucleotide differences. I believe this is due to the way the filter uses the CIGAR string to compare alignments. This dataset was also a poor choice for testing, because its CIGAR strings do not appear to be properly formatted. I'm not sure whether more "real-world" data will have better CIGAR strings, so this might be something to look into as part of a separate ticket.

          Additionally, I found an edge-case scenario that breaks the filter. Adding multiple instances of the same filter to a track brings back the issue of reads disappearing after clicking Load Data/Load Sequence. The same happens if one filter is added to a single strand of a dataset and the same filter is then added to the track once the strands are combined.
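
The collapse can be illustrated with a short sketch. In the SAM format, the CIGAR operation `M` covers both matching and mismatching bases (only `=` and `X` distinguish them), so two reads with the same position and the same CIGAR string can still differ in sequence. The class `KeyDemo` and both methods are hypothetical, and adding the read bases to the key is only one possible option for a follow-up ticket, not something the App is known to do.

```java
/** Hypothetical demonstration; not taken from the App's source. */
final class KeyDemo {
    private KeyDemo() {}

    /** Key as proposed in the Description: chr:strand:start:CIGAR. */
    static String positionalKey(String chr, char strand, int start, String cigar) {
        return chr + ":" + strand + ":" + start + ":" + cigar;
    }

    /** Variant that also appends the read bases, so mismatches yield distinct keys. */
    static String sequenceAwareKey(String chr, char strand, int start, String cigar, String bases) {
        return positionalKey(chr, strand, start, cigar) + ":" + bases;
    }
}
```

For two hypothetical reads at `chr1:+:100` with CIGAR `8M` and bases `ACGTACGT` and `ACGTTCGT`, `positionalKey` returns the same string for both, while `sequenceAwareKey` returns two different strings. The trade-off is a larger table, since reads that differ only by a sequencing error would no longer collapse.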

          Overall, this filter app is working really well and the scope of this ticket has been completed – recommending PR.

          pkulzer Paige Kulzer made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          pkulzer Paige Kulzer made changes -
          Assignee Paige Kulzer [ pkulzer ] Karthik Raveendran [ karthik ]
          ann.loraine Ann Loraine added a comment - edited

          I added comments to the commit, with suggestions for a way you could potentially implement the App without requiring a change to the IGB filters API.

          I'm worried that making a change to the API could break existing Apps or cause other unforeseen problems. Also, it means we would not be able to use the App in IGB 10.1.0, the released version.

          ann.loraine Ann Loraine made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          ann.loraine Ann Loraine made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          ann.loraine Ann Loraine made changes -
          Status Reviewing Pull Request [ 10303 ] To-Do [ 10305 ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 1, Summer 2, Summer 3, Summer 4 [ 218, 219, 220, 221 ] Summer 1, Summer 2, Summer 3, Summer 4, Summer 5 [ 218, 219, 220, 221, 222 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          karthik Karthik Raveendran made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 1, Summer 2, Summer 3, Summer 4, Summer 5 [ 218, 219, 220, 221, 222 ] Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6 [ 218, 219, 220, 221, 222, 223 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          ann.loraine Ann Loraine made changes -
          Sprint Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6 [ 218, 219, 220, 221, 222, 223 ] Testing 3 : 19 Nov - 29 Nov 2, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6 [ 83, 218, 219, 220, 221, 222, 223 ]
          karthik Karthik Raveendran made changes -
          Status In Progress [ 3 ] To-Do [ 10305 ]
          ann.loraine Ann Loraine made changes -
          Sprint Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6 [ 83, 218, 219, 220, 221, 222, 223 ] Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2 [ 83, 218, 219, 220, 221, 222, 223, 225 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          ann.loraine Ann Loraine made changes -
          Sprint Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2 [ 83, 218, 219, 220, 221, 222, 223, 225 ] Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3 [ 83, 218, 219, 220, 221, 222, 223, 225, 226 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          ann.loraine Ann Loraine made changes -
          Sprint Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3 [ 83, 218, 219, 220, 221, 222, 223, 225, 226 ] Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4 [ 83, 218, 219, 220, 221, 222, 223, 225, 226, 227 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          ann.loraine Ann Loraine made changes -
          Sprint Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4 [ 83, 218, 219, 220, 221, 222, 223, 225, 226, 227 ] Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4, Fall 5 [ 83, 218, 219, 220, 221, 222, 223, 225, 226, 227, 228 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          nfreese Nowlan Freese made changes -
          Sprint Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4, Fall 5 [ 83, 218, 219, 220, 221, 222, 223, 225, 226, 227, 228 ] Fall 1, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5, Summer 6, Fall 2, Fall 3, Fall 4 [ 83, 218, 219, 220, 221, 222, 223, 225, 226, 227 ]

            People

            • Assignee: karthik Karthik Raveendran
            • Reporter: ann.loraine Ann Loraine
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
                Updated: