Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      - IGB live does produce the duplicated lines (see * below).
      - The lines are not strictly ordered (see ** below).

      Chr1 8227506 8227522 2.0
      Chr1 8227506 8227522 2.0 *
      Chr1 8227534 8227542 5.0
      Chr1 8227534 8227542 5.0
      Chr1 8227542 8227553 6.0
      Chr1 8227534 8227542 5.0
      Chr1 20998229 20998235 5.0
      Chr1 20998235 20998333 0.0
      Chr1 8227506 8227522 2.0 **

      Some downstream use cases require, or at least prefer, lines that are sorted and non-overlapping.
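
      The three properties mentioned here (duplicates, ordering, overlap) can be checked mechanically. The following is a minimal sketch, not part of IGB: `parse_bedgraph` and `check_rows` are hypothetical helper names, rows are assumed to be whitespace-separated with four leading fields, and sequence names are assumed to sort as plain strings. It reports row indices (0-based) for the sample lines above.

```python
from collections import namedtuple

# One bedGraph row: sequence name, start, end, score.
Interval = namedtuple("Interval", "chrom start end value")

SAMPLE = """\
Chr1 8227506 8227522 2.0
Chr1 8227506 8227522 2.0 *
Chr1 8227534 8227542 5.0
Chr1 8227534 8227542 5.0
Chr1 8227542 8227553 6.0
Chr1 8227534 8227542 5.0
Chr1 20998229 20998235 5.0
Chr1 20998235 20998333 0.0
Chr1 8227506 8227522 2.0 **
"""


def parse_bedgraph(lines):
    """Parse rows, skipping blank lines and ignoring anything after the 4th field."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        rows.append(Interval(fields[0], int(fields[1]), int(fields[2]), float(fields[3])))
    return rows


def check_rows(rows):
    """Return (duplicates, out_of_order, overlaps) as lists of 0-based row indices."""
    duplicates, out_of_order, overlaps = [], [], []
    seen = set()
    for i, row in enumerate(rows):
        if row in seen:
            duplicates.append(i)  # identical to some earlier row
        seen.add(row)
        if i > 0:
            prev = rows[i - 1]
            if (row.chrom, row.start) < (prev.chrom, prev.start):
                out_of_order.append(i)  # sorts before the previous row
            elif row.chrom == prev.chrom and row.start < prev.end:
                overlaps.append(i)  # begins before the previous row ends
    return duplicates, out_of_order, overlaps


rows = parse_bedgraph(SAMPLE.splitlines())
print(check_rows(rows))  # ([1, 3, 5, 8], [5, 8], [1, 3])
```

      Rows 1, 3, 5, and 8 repeat an earlier row, rows 5 and 8 sort before the row that precedes them, and rows 1 and 3 overlap the row that precedes them.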

        Attachments

          Issue Links

            Activity

            ieclabau Ivory Blakley (Inactive) created issue -
            ieclabau Ivory Blakley (Inactive) made changes -
            Field Original Value New Value
            Link This issue relates to IGBF-1090 [ IGBF-1090 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Description
            ieclabau Ivory Blakley (Inactive) made changes -
            Rank Ranked higher
            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Assignee Ann Loraine [ aloraine ] Jennifer Daly [ jdaly ]
            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Sprint Spring 2017 [ 47 ]
            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Rank Ranked lower
            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Comment [ I made this script to find duplicates so I don't have to look through the file:

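            #! /usr/bin/env python

            from __future__ import print_function


            def find_duplicates(path, log_path="output.txt"):
                # Write each line that repeats an earlier line (ignoring case) to log_path.
                seen = set()
                with open(path) as f, open(log_path, "w") as log:
                    for line in f:
                        # Strip the newline so a final line without one still matches.
                        key = line.rstrip("\n").lower()
                        if key in seen:
                            print(line.rstrip("\n"), file=log)
                        else:
                            seen.add(key)


            find_duplicates("test1.bedgraph")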

            I tested it by deliberately creating a duplicate, and it works.
            I tested 4 files from the master branch and 4 files from IGB live; none of them produced duplicates.

            @Ivory, is this issue a result of IGB saving multiple tiers, the issue we just fixed by rewriting ExportFileAction? I know you said you were able to reproduce it in IGB live. Can you test it again? ]
            sneha Sneha Ramesh Watharkar (Inactive) added a comment -

            This issue occurs with graphs when one region is loaded and then a second region is loaded that contains genes overlapping the first region. The overlapping genes are loaded again each time a new region containing them is loaded into view. Annotations (.bed files) are saved correctly when overlapping regions are selected; the issue exists only with graphs.

            The first difference between the two save paths is in the exportFile method of the ExportFileAction class. This is the same class that was altered in IGBF-1090, but the issue existed before those changes were implemented.

            In exportFile, annotations and graphs are saved by two different if blocks. The annotations block calls a function, 'collectSyms', which adds the syms and their children, in order, to a rootsym, which is a List. The size of this list is in the 60s when working with Arabidopsis thaliana.

            The graphs block simply adds all of the syms to an array list with an .addAll call. These syms do not have children.

            Attached is a Python script, find_duplicates.py, to help determine whether a file contains duplicate rows.
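
            The behavior described in this comment can be modeled outside IGB. The following is a minimal Python sketch, not IGB code: `GRAPH_DATA`, `load_region`, `export_append_all`, and `export_unique_sorted` are hypothetical names made up for illustration. It assumes each region load returns every graph interval that intersects the region, so appending the results of two overlapping loads (as an addAll-style collection would) repeats the shared intervals. Keying on the whole row and sorting is one possible way to get unique, ordered output; it is not necessarily the fix IGB should adopt.

```python
# Hypothetical graph data: (seq, start, end, value) tuples, as in a bedGraph file.
GRAPH_DATA = [
    ("Chr1", 8227506, 8227522, 2.0),
    ("Chr1", 8227534, 8227542, 5.0),
    ("Chr1", 8227542, 8227553, 6.0),
    ("Chr1", 20998229, 20998235, 5.0),
]


def load_region(seq, start, end):
    """Return every interval on seq that intersects the half-open range [start, end)."""
    return [iv for iv in GRAPH_DATA if iv[0] == seq and iv[1] < end and iv[2] > start]


def export_append_all(loads):
    """Concatenate every load as-is, mimicking an addAll-style collection."""
    rows = []
    for load in loads:
        rows.extend(load)
    return rows


def export_unique_sorted(loads):
    """Drop repeated intervals, then sort by (seq, start, end)."""
    unique = set()
    for load in loads:
        unique.update(load)
    return sorted(unique, key=lambda iv: (iv[0], iv[1], iv[2]))


# Two overlapping region loads; both contain the interval starting at 8227534.
loads = [
    load_region("Chr1", 8227500, 8227540),
    load_region("Chr1", 8227530, 8227560),
]
naive = export_append_all(loads)     # 4 rows, the shared interval appears twice
clean = export_unique_sorted(loads)  # 3 rows, each interval once, in order

for row in clean:
    print("\t".join(str(field) for field in row))
```

            In this model the naive export contains the interval at 8227534 twice, once per load, which matches the kind of repeated line reported in the description.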

            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Attachment find_duplicates.py [ 14033 ]
            ann.loraine Ann Loraine added a comment -

            We will address this for release 9.0.2. Moving to backlog.

            ann.loraine Ann Loraine made changes -
            Assignee Jennifer Daly [ jdaly ]
            ann.loraine Ann Loraine made changes -
            Sprint Early Fall 2017 [ 47 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Workflow Loraine Lab Workflow [ 17788 ] Fall 2019 Workflow Update [ 19065 ]
            ann.loraine Ann Loraine made changes -
            Workflow Fall 2019 Workflow Update [ 19065 ] Revised Fall 2019 Workflow Update [ 21183 ]
            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-2197 [ IGBF-2197 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                ieclabau Ivory Blakley (Inactive)
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated: