IGB / IGBF-1096

Searching two files only returns one hit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • Story Points:
      3
    • Sprint:
      Winter 2018 Sprint 3, Spring 2019 Sprint 1

      Description

      During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

      Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file. The SNP is present in both files, but IGB only finds it in one of the files.
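As a quick sanity check on the input files, the following is a minimal sketch (not IGB code) that tests whether a single tab-separated BED detail line carries the query in the name (4th) and id (13th) columns. The class name, the helper `columnsContaining`, and the sample line (coordinates and description) are all made up for illustration; only the SNP ID and the column positions come from this ticket.

```java
import java.util.ArrayList;
import java.util.List;

public class BedDetailIdCheck {
    // 0-based indexes of the columns named in the description.
    static final int NAME_COL = 3;  // 4th column: name
    static final int ID_COL = 12;   // 13th column: id

    // Returns the names of the columns ("name", "id") that exactly match the query.
    static List<String> columnsContaining(String line, String query) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        List<String> matches = new ArrayList<>();
        if (fields.length > NAME_COL && fields[NAME_COL].equals(query)) {
            matches.add("name");
        }
        if (fields.length > ID_COL && fields[ID_COL].equals(query)) {
            matches.add("id");
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical 14-column line; coordinates and description are invented.
        String line = String.join("\t",
                "chr19", "100", "101", "rs1122608", "0", "+",
                "100", "101", "0", "1", "1,", "0,",
                "rs1122608", "made-up description");
        System.out.println(columnsContaining(line, "rs1122608")); // prints [name, id]
    }
}
```

Running the same check over each decompressed file would confirm that the SNP really is present in both inputs before looking at IGB itself.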

      See Dr. Freese for files.

      Computer used - Mac. IGB version 8.5.2

      Task:

      • Determine if you can reproduce the error using files provided by Dr. Freese
      • Investigate & report why this error is occurring

        Attachments

        1. AfterFix.jpg
          288 kB
        2. diseaseDad.bed.gz
          0.8 kB
        3. diseaseDad.bed.gz.tbi
          2 kB
        4. diseaseMom.bed.gz
          0.8 kB
        5. diseaseMom.bed.gz.tbi
          2 kB
        6. GFF3_example_1_EDITED.gff3
          0.8 kB
        7. missingSearch.png
          206 kB

          Issue Links

            Activity

            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-2061 [ IGBF-2061 ]
            prutha Prutha Kulkarni (Inactive) added a comment -

            The rebased branch code can be found at:
            https://bitbucket.org/pkulka10/igb_prutha/src/IGBF-2032/

            ann.loraine Ann Loraine made changes -
            Labels Beginner Advanced
            ann.loraine Ann Loraine made changes -
            Workflow Fall 2019 Workflow Update [ 19764 ] Revised Fall 2019 Workflow Update [ 21883 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2032 [ IGBF-2032 ]
            ann.loraine Ann Loraine added a comment -

            For next step:

            • squash all changes into one commit
            • rebase onto the latest master branch, resolving conflicts as needed
            • post a link to the new branch here
            ann.loraine Ann Loraine made changes -
            Comment [ We will review current architecture & discuss re-factoring, using this branch as an extremely useful proof-of-concept and guide. ]
            ann.loraine Ann Loraine made changes -
            Workflow Loraine Lab Workflow [ 16980 ] Fall 2019 Workflow Update [ 19764 ]
            ann.loraine Ann Loraine made changes -
            Assignee Sai Charan Reddy Vallapureddy [ svallapu ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Reviewing [ 10301 ] Closed [ 6 ]
            nfreese Nowlan Freese made changes -
            Assignee Pranav Sanjay Tambvekar [ ptambvek ]
            ann.loraine Ann Loraine added a comment -

            A lot of files have been edited on this branch. This looks like a major re-factoring of the searching functionality in IGB. This new code may be superior and we may be able to safely remove the old code. If it is feasible, I would prefer to remove the old code so that we don't leave two parallel systems in place for future developers to wonder about!
            I'm not sure what the next step would be.
            I would like to get a code walk-through showing the path of execution.
            Before that, everyone needs to know how and when keyword (property-based) searching can happen in the user interface.
            Here is documentation from the User's Guide:
            https://wiki.transvar.org/display/igbman/Advanced+Search

            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            The existing search methods may be used by other parts of the code, and I did not want to disturb that functionality. For that reason, I created new methods for advanced search.

            ptambvek Pranav Sanjay Tambvekar (Inactive) made changes -
            Assignee Pranav Sanjay Tambvekar [ ptambvek ]
            ptambvek Pranav Sanjay Tambvekar (Inactive) made changes -
            Status Needs 1st Level Review [ 10005 ] Reviewing [ 10301 ]
            ptambvek Pranav Sanjay Tambvekar (Inactive) added a comment -

            Tested; the fix works.

            Dr. [~aloraine], could you please advise whether we should modify the existing methods to use ArrayList instead of HashSet, or add new methods and keep the original ones as they are, as Sai Charan Reddy Vallapureddy did on his fork: https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1096-New ?
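One possible middle ground between the two options, sketched here under assumptions and not taken from either branch: keep the old Set-returning method as a thin wrapper around a new List-returning method, so only one search implementation exists. The class and method names (`SearchApiSketch`, `searchAll`, `search`) and the use of plain strings as hits are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SearchApiSketch {
    // New-style method (hypothetical): returns every hit, one per file that contains it.
    static List<String> searchAll(List<List<String>> files, String query) {
        List<String> hits = new ArrayList<>();
        for (List<String> idsInFile : files) {
            for (String id : idsInFile) {
                if (id.equals(query)) {
                    hits.add(id);
                }
            }
        }
        return hits;
    }

    // Old-style method (hypothetical): same de-duplicating behavior as before,
    // but implemented on top of searchAll so the logic lives in one place.
    static Set<String> search(List<List<String>> files, String query) {
        return new LinkedHashSet<>(searchAll(files, query));
    }

    public static void main(String[] args) {
        List<String> mom = List.of("rs1122608", "rs0000001"); // second ID is made up
        List<String> dad = List.of("rs1122608");
        List<List<String>> files = List.of(mom, dad);
        System.out.println(searchAll(files, "rs1122608").size()); // prints 2
        System.out.println(search(files, "rs1122608").size());    // prints 1
    }
}
```

Callers that depend on de-duplicated results would keep working unchanged, while Advanced Search could call the List-returning method directly.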

            svallapu Sai Charan Reddy Vallapureddy (Inactive) made changes -
            Assignee Sai Charan Reddy Vallapureddy [ svallapu ]
            svallapu Sai Charan Reddy Vallapureddy (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            Code Changed (Branch) : https://bitbucket.org/svallapu/charan_igb/branch/IGBF-1096-New

            Issue: Searching two BED files for a single SNP ID in the advanced search tab returns a hit for only one of the files. The two files contain similar data.

            Before: The search mechanism used a HashSet to store the search results. A HashSet discards elements it considers duplicates, so the search returned a single result for the two similar files.

            Solution: Advanced Search should return a hit from each file when we search for the SNP ID, since the ID is present in both files. I changed the search mechanism to collect results in an ArrayList instead of a HashSet. Many files changed because the modified classes are extended or implemented by many other classes.
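The behavior described above can be illustrated with a small, self-contained sketch. It is not IGB code: `Hit` is a hypothetical stand-in for a search result, and it assumes that equality ignores the source file, which is one way two hits from different files could collapse into one set element. The coordinates are made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class DuplicateHitDemo {
    // Hypothetical search hit. Assumption: equals/hashCode ignore sourceFile,
    // so the same SNP loaded from two files compares as equal.
    static final class Hit {
        final String sourceFile;
        final String id;
        final int start;
        final int end;

        Hit(String sourceFile, String id, int start, int end) {
            this.sourceFile = sourceFile;
            this.id = id;
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Hit)) return false;
            Hit other = (Hit) o;
            return id.equals(other.id) && start == other.start && end == other.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, start, end);
        }
    }

    // Collecting into a HashSet keeps one element per group of equal hits.
    static Set<Hit> collectAsSet(List<Hit> hits) {
        return new HashSet<>(hits);
    }

    // Collecting into an ArrayList keeps every hit, equal or not.
    static List<Hit> collectAsList(List<Hit> hits) {
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        List<Hit> found = List.of(
                new Hit("diseaseMom.bed.gz", "rs1122608", 100, 101),
                new Hit("diseaseDad.bed.gz", "rs1122608", 100, 101));
        System.out.println(collectAsSet(found).size());  // prints 1
        System.out.println(collectAsList(found).size()); // prints 2
    }
}
```

Whether IGB's symmetry classes actually compare equal in this way is not confirmed here; the sketch only shows why a set-backed result collection can hide a second hit.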

            Tracing the Advanced Search mechanism (in each file, look for the methods that take an ArrayList parameter):
            --> SearchModeIDOrProps.java (method: findLocalSyms, calls: SearchUtils.findLocalSyms)
            --> SearchUtils.java (method: findLocalSyms, calls: genomeVersion.searchProperties; Set changed to List)
            --> GenomeVersion.java (method: searchProperties, calls: seq.searchProperties)
            --> Bioseq.java (method: searchProperties, calls: getAnnotation.searchProperties)
            --> RootSeqSymmetry.java (method: searchProperties)
            --> TypeContainerAnnot.java (method: searchProperties; search implemented using ArrayList)
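The delegation pattern in the chain above can be sketched roughly as follows. This is a simplified illustration, not the real IGB signatures: `GenomeSketch`, `SeqSketch`, and `TrackSketch` are hypothetical stand-ins for the genome, sequence, and per-file annotation layers, and hits are plain strings. Each layer passes the same List down, so hits from every file accumulate in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SearchChainSketch {
    // Hypothetical leaf: the annotation IDs loaded from one file.
    static final class TrackSketch {
        final String fileName;
        final List<String> ids;

        TrackSketch(String fileName, List<String> ids) {
            this.fileName = fileName;
            this.ids = ids;
        }

        // Appends every matching ID, tagged with its file, to the shared list.
        void searchProperties(Pattern pattern, List<String> results) {
            for (String id : ids) {
                if (pattern.matcher(id).matches()) {
                    results.add(fileName + ":" + id);
                }
            }
        }
    }

    // Hypothetical sequence layer: delegates to each track on the sequence.
    static final class SeqSketch {
        final List<TrackSketch> tracks;

        SeqSketch(List<TrackSketch> tracks) {
            this.tracks = tracks;
        }

        void searchProperties(Pattern pattern, List<String> results) {
            for (TrackSketch track : tracks) {
                track.searchProperties(pattern, results);
            }
        }
    }

    // Hypothetical genome layer: delegates to each sequence.
    static final class GenomeSketch {
        final List<SeqSketch> seqs;

        GenomeSketch(List<SeqSketch> seqs) {
            this.seqs = seqs;
        }

        void searchProperties(Pattern pattern, List<String> results) {
            for (SeqSketch seq : seqs) {
                seq.searchProperties(pattern, results);
            }
        }
    }

    // Entry point: creates the result list and hands it down the chain.
    static List<String> findLocalSyms(GenomeSketch genome, Pattern pattern) {
        List<String> results = new ArrayList<>();
        genome.searchProperties(pattern, results);
        return results;
    }

    public static void main(String[] args) {
        TrackSketch mom = new TrackSketch("diseaseMom.bed.gz", List.of("rs1122608"));
        TrackSketch dad = new TrackSketch("diseaseDad.bed.gz", List.of("rs1122608"));
        GenomeSketch genome = new GenomeSketch(List.of(new SeqSketch(List.of(mom, dad))));
        System.out.println(findLocalSyms(genome, Pattern.compile("rs1122608")));
        // prints [diseaseMom.bed.gz:rs1122608, diseaseDad.bed.gz:rs1122608]
    }
}
```

Because the collection type appears in every method signature along the chain, changing it from Set to List touches each layer, which is consistent with the large number of edited files noted in this comment.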

            Testing Steps:
            1. Open IGB.
            2. Download diseaseMom.bed.gz and diseaseDad.bed.gz.
            3. Unzip them.
            4. Drag and drop them into IGB.
            5. Change the load mode from Manual to Genome for both files.
            6. Go to the Advanced Search tab and search for the keyword "rs1122608".
            7. The search should return results from both files instead of one.

            Test Results:
            See the attached AfterFix.jpg.

            svallapu Sai Charan Reddy Vallapureddy (Inactive) made changes -
            Attachment AfterFix.jpg [ 14252 ]
            svallapu Sai Charan Reddy Vallapureddy (Inactive) added a comment -

            Test Results.

            nfreese Nowlan Freese made changes -
            Attachment GFF3_example_1_EDITED.gff3 [ 14247 ]
            nfreese Nowlan Freese made changes -
            Description During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

            Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file. The SNP is present in both files, but IGB only finds it in one of the files.

            See Dr. Freese for files.

            Computer used - Mac. IGB version 8.5.2

            Task:

            * Determine if you can reproduce the error using files provided by Dr. Freese
            * Investigate & report why this error is occurring
            * If a fix is obvious & simple, go ahead and do it.
            During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

            Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file. The SNP is present in both files, but IGB only finds it in one of the files.

            See Dr. Freese for files.

            Computer used - Mac. IGB version 8.5.2

            Task:

            * Determine if you can reproduce the error using files provided by Dr. Freese
            * Investigate & report why this error is occurring
            nfreese Nowlan Freese made changes -
            Story Points 1 3
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Winter 2018 Sprint 3 [ 58 ] Winter 2018 Sprint 3, Spring 2019 Sprint 1 [ 58, 59 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            svallapu Sai Charan Reddy Vallapureddy (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            svallapu Sai Charan Reddy Vallapureddy (Inactive) made changes -
            Assignee Sai Charan Reddy Vallapureddy [ svallapu ]
            nfreese Nowlan Freese made changes -
            Attachment missingSearch.png [ 14223 ]
            nfreese Nowlan Freese made changes -
            Attachment Screen Shot 2016-02-18 at 5.20.44 PM.png [ 13275 ]
            nfreese Nowlan Freese made changes -
            Attachment diseaseDad.bed.gz [ 14219 ]
            Attachment diseaseDad.bed.gz.tbi [ 14220 ]
            Attachment diseaseMom.bed.gz [ 14221 ]
            Attachment diseaseMom.bed.gz.tbi [ 14222 ]
            nfreese Nowlan Freese made changes -
            Attachment mysteryBabyDad.bed.gz [ 14208 ]
            nfreese Nowlan Freese made changes -
            Attachment mysteryBabyMom.bed.gz [ 14209 ]
            nfreese Nowlan Freese made changes -
            Attachment mysteryBabyDad.bed.gz [ 14208 ]
            Attachment mysteryBabyMom.bed.gz [ 14209 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Description During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

            Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file.

            See Dr. Freese for files.

            Computer used - Mac. IGB version 8.5.2

            Task:

            * Determine if you can reproduce the error using files provided by Dr. Freese
            * Investigate & report why this error is occurring
            * If a fix is obvious & simple, go ahead and do it.
            During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

            Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file. The SNP is present in both files, but IGB only finds it in one of the files.

            See Dr. Freese for files.

            Computer used - Mac. IGB version 8.5.2

            Task:

            * Determine if you can reproduce the error using files provided by Dr. Freese
            * Investigate & report why this error is occurring
            * If a fix is obvious & simple, go ahead and do it.
            ann.loraine Ann Loraine made changes -
            Story Points 1
            Sprint Winter 2018 Sprint 3 [ 58 ]
            Labels Beginner
            ann.loraine Ann Loraine made changes -
            Description During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

            Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file.

            I have only been able to replicate this with the files we used for the outreach event, which were gzipped and tabix indexed. See Nowlan for files.

            Computer used - Mac. IGB version 8.5.2
            During the High School outreach event users noticed that searching two BED files for a single SNP ID in the advanced search tab would return a hit for only one of the files.

            Files are bed detail format. The search is for the SNP id, which is in both the name (4th) and id (13th) columns in the file.

            See Dr. Freese for files.

            Computer used - Mac. IGB version 8.5.2

            Task:

            * Determine if you can reproduce the error using files provided by Dr. Freese
            * Investigate & report why this error is occurring
            * If a fix is obvious & simple, go ahead and do it.
            nfreese Nowlan Freese made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Epic Link IGBF-497 [ 15559 ]
            nfreese Nowlan Freese made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Field Original Value New Value
            Epic Link IGBF-497 [ 15559 ]
            nfreese Nowlan Freese created issue -

              People

              • Assignee:
                svallapu Sai Charan Reddy Vallapureddy (Inactive)
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved: