  1. IGB
  2. IGBF-1096

Searching two files only returns one hit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Story Points: 3
    • Sprint: Winter 2018 Sprint 3, Spring 2019 Sprint 1

      Description

      During the High School outreach event, users noticed that searching two BED files for a single SNP ID in the Advanced Search tab returned a hit for only one of the files.

      The files are in BED detail format. The search is for the SNP ID, which appears in both the name (4th) and id (13th) columns of each file. The SNP is present in both files, but IGB finds it in only one of them.
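
For reference, the column check described above can be sketched as follows. This is a minimal illustration, not IGB's parser: the class name, the helper methods, and the example line (including its chromosome and coordinates) are made up for this sketch. Only the column positions (name = 4th, id = 13th) come from the description.

```java
class BedDetailIdMatcher {
    // 1-based column positions taken from the ticket description.
    static final int NAME_COLUMN = 4;
    static final int ID_COLUMN = 13;

    /** Returns true if the tab-delimited line holds the query in its name or id column. */
    static boolean matches(String line, String query) {
        String[] fields = line.split("\t", -1);
        return fieldEquals(fields, NAME_COLUMN, query)
            || fieldEquals(fields, ID_COLUMN, query);
    }

    private static boolean fieldEquals(String[] fields, int oneBasedColumn, String query) {
        // Guard against short lines before indexing.
        return fields.length >= oneBasedColumn
            && fields[oneBasedColumn - 1].equals(query);
    }

    /** Builds a made-up 14-column line (12 BED fields + id + description) for illustration only. */
    static String exampleLine(String snpId) {
        return String.join("\t",
            "chr19", "100", "101", snpId, "0", "+",
            "100", "101", "0", "1", "1", "0",
            snpId, "example description");
    }

    public static void main(String[] args) {
        String line = exampleLine("rs1122608");
        System.out.println(matches(line, "rs1122608"));
        System.out.println(matches(line, "rs0000000"));
    }
}
```

A check like this succeeds for each file independently, which is consistent with the report that the SNP is present in both files and that the loss happens later, when results are combined.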

      See Dr. Freese for files.

      Computer used - Mac. IGB version 8.5.2

      Task:

      • Determine if you can reproduce the error using files provided by Dr. Freese
      • Investigate & report why this error is occurring

        Attachments

        1. AfterFix.jpg (288 kB)
        2. diseaseDad.bed.gz (0.8 kB)
        3. diseaseDad.bed.gz.tbi (2 kB)
        4. diseaseMom.bed.gz (0.8 kB)
        5. diseaseMom.bed.gz.tbi (2 kB)
        6. GFF3_example_1_EDITED.gff3 (0.8 kB)
        7. missingSearch.png (206 kB)

          Issue Links

            Activity

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            Test Results.

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            Code Changed (Branch) : https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1096-New

            Issue: Searching two BED files for a single SNP ID in the Advanced Search tab returned a hit for only one of the files, even though the two files contain similar data.

            Before: The search mechanism stored its results in a HashSet. A HashSet discards duplicate objects, so the two similar files produced a single result.

            Solution: Advanced Search should return a hit from each file, because the SNP ID is present in both. The search mechanism now collects its results in an ArrayList instead of a HashSet. Many files changed because the modified classes are extended or implemented by many other classes.
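
The collection behavior described above can be reproduced in isolation. The sketch below is illustrative only: `Hit` is a hypothetical stand-in for whatever object IGB stores per search result, and it assumes that hits from the two files compare equal (here, equality is based on the ID alone). The ticket does not state how IGB's own result objects define equality.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class SearchResultCollectionDemo {
    /** Hypothetical search hit; ASSUMPTION: equality ignores the source file. */
    static final class Hit {
        final String id;
        final String sourceFile;

        Hit(String id, String sourceFile) {
            this.id = id;
            this.sourceFile = sourceFile;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Hit && ((Hit) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    /** Adds one hit per file and reports how many the collection kept. */
    static int countHits(Collection<Hit> results) {
        results.add(new Hit("rs1122608", "diseaseMom.bed.gz"));
        results.add(new Hit("rs1122608", "diseaseDad.bed.gz"));
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println(countHits(new HashSet<>()));   // prints 1: the second, equal hit is dropped
        System.out.println(countHits(new ArrayList<>())); // prints 2: a list keeps both hits
    }
}
```

One trade-off to keep in mind: a list also keeps genuine duplicates, so any caller that relied on the set to de-duplicate results would need to handle that itself.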

            Tracing the Advanced Search mechanism (in each file, look for the methods that take an ArrayList parameter):
            --> SearchModeIDOrProps.java (method: findLocalSyms, call: SearchUtils.findLocalSyms)
            --> SearchUtils.java (method: findLocalSyms, call: genomeVersion.searchProperties) (Set changed to List)
            --> GenomeVersion.java (method: searchProperties, call: seq.searchProperties)
            --> Bioseq.java (method: searchProperties, call: getAnnotation.searchProperties)
            --> RootSeqSymmetry.java (method: searchProperties)
            --> TypeContainerAnnot.java (method: searchProperties, search implemented using ArrayList)
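
The layered delegation traced above can be sketched roughly as follows. None of this is IGB code: `Genome`, `Seq`, and `Container` are simplified, hypothetical stand-ins for the layers in the chain, and results are plain strings. The only point shown is that one List passed down through the layers accumulates a hit from every file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SearchDelegationSketch {
    /** Innermost layer (hypothetical): the annotation IDs loaded from one file. */
    static final class Container {
        final String file;
        final List<String> ids;

        Container(String file, String... ids) {
            this.file = file;
            this.ids = Arrays.asList(ids);
        }

        void searchProperties(String query, List<String> results) {
            for (String id : ids) {
                if (id.equals(query)) {
                    results.add(file + ":" + id);
                }
            }
        }
    }

    /** Middle layer (hypothetical): one sequence holding several containers. */
    static final class Seq {
        final List<Container> containers = new ArrayList<>();

        void searchProperties(String query, List<String> results) {
            for (Container c : containers) {
                c.searchProperties(query, results);
            }
        }
    }

    /** Outer layer (hypothetical): a genome holding several sequences. */
    static final class Genome {
        final List<Seq> seqs = new ArrayList<>();

        List<String> searchProperties(String query) {
            List<String> results = new ArrayList<>();
            for (Seq s : seqs) {
                s.searchProperties(query, results);
            }
            return results;
        }
    }

    /** Builds a genome with the same SNP ID loaded from two files. */
    static Genome exampleGenome() {
        Seq seq = new Seq();
        seq.containers.add(new Container("diseaseMom.bed.gz", "rs1122608"));
        seq.containers.add(new Container("diseaseDad.bed.gz", "rs1122608"));
        Genome genome = new Genome();
        genome.seqs.add(seq);
        return genome;
    }

    public static void main(String[] args) {
        // Two entries are expected, one per file.
        System.out.println(exampleGenome().searchProperties("rs1122608"));
    }
}
```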

            Testing Steps:
            1. Open IGB
            2. Download diseaseMom.bed.gz, diseaseDad.bed.gz
            3. Unzip them
            4. Drag and drop them to IGB
            5. Change from manual to genome for both files.
            6. Go to Advanced Search and search for the keyword "rs1122608".
            7. Results should appear from both files instead of one.

            Test Results:
            See the attached AfterFix.jpg.

            ptambvek Pranav Sanjay Tambvekar (Inactive) added a comment -

            Tested; the fix works.

            Dr. Loraine, can you please advise whether we should modify the existing methods to use ArrayList instead of HashSet, or add new methods and keep the original ones as they are, as Sai Charan Reddy Vallapureddy did on his fork: https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1096-New ?

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            The existing search methods may be used by other parts of the code, and I did not want to disturb other functionality. That is why I created new methods for Advanced Search.

            ann.loraine Ann Loraine added a comment -

            A lot of files have been edited on this branch. This looks like a major refactoring of the search functionality in IGB. The new code may be superior, and we may be able to safely remove the old code. If it is feasible, I would prefer to remove the old code so that we don't leave two parallel systems in place for future developers to wonder about!
            I'm not sure what the next step would be.
            I would like to get a code walk-through showing the path of execution.
            Before that, everyone needs to know how and when keyword (property-based) searching can happen in the user interface.
            Here is documentation from the User's Guide:
            https://wiki.transvar.org/display/igbman/Advanced+Search

            ann.loraine Ann Loraine added a comment -

            For next step:

            • squash all changes into one commit
            • rebase onto the latest master branch, resolving conflicts as needed
            • post a link to the new branch here
            prutha Prutha Kulkarni (Inactive) added a comment -

            The rebased branch code can be found at:
            https://bitbucket.org/pkulka10/igb_prutha/src/IGBF-2032/


              People

              • Assignee: svallapu Sai Charan Reddy Vallapureddy (Inactive)
              • Reporter: nfreese Nowlan Freese
              • Votes: 0
              • Watchers: 5

                Dates

                • Created:
                  Updated:
                  Resolved: