Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels: None
Description
- IGB live does produce duplicated lines (see * below).
- The lines are not strictly ordered (see ** below).
Chr1 8227506 8227522 2.0
Chr1 8227506 8227522 2.0 *
Chr1 8227534 8227542 5.0
Chr1 8227534 8227542 5.0
Chr1 8227542 8227553 6.0
Chr1 8227534 8227542 5.0
Chr1 20998229 20998235 5.0
Chr1 20998235 20998333 0.0
Chr1 8227506 8227522 2.0 **
Some downstream use cases require, or at least prefer, that the intervals be sorted and non-overlapping.
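As a stopgap on the consumer side, the two requirements above can be checked and repaired after export. The following is a minimal sketch, not part of IGB: the function names `normalize_bedgraph` and `find_overlaps` are hypothetical, and it assumes whitespace-separated rows of the form chromosome, start, end, value, ignoring anything after the fourth column (such as the * markers above).

```python
def normalize_bedgraph(lines):
    """Return the unique rows, sorted by chromosome, start, then end."""
    rows = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, value = fields[:4]
        rows.add((chrom, int(start), int(end), float(value)))
    return sorted(rows)


def find_overlaps(rows):
    """Return (earlier, later) pairs of overlapping rows from sorted input."""
    overlaps = []
    widest = None  # row reaching furthest right on the current chromosome
    for cur in rows:
        same_chrom = widest is not None and widest[0] == cur[0]
        if same_chrom and cur[1] < widest[2]:
            overlaps.append((widest, cur))
        if not same_chrom or cur[2] > widest[2]:
            widest = cur
    return overlaps


# The rows from this ticket, markers included.
SAMPLE = [
    "Chr1 8227506 8227522 2.0",
    "Chr1 8227506 8227522 2.0 *",
    "Chr1 8227534 8227542 5.0",
    "Chr1 8227534 8227542 5.0",
    "Chr1 8227542 8227553 6.0",
    "Chr1 8227534 8227542 5.0",
    "Chr1 20998229 20998235 5.0",
    "Chr1 20998235 20998333 0.0",
    "Chr1 8227506 8227522 2.0 **",
]

if __name__ == "__main__":
    for row in normalize_bedgraph(SAMPLE):
        print("{} {} {} {}".format(*row))
```

Intervals are treated as half-open, so a row that starts exactly where the previous one ends is not reported as an overlap. Chromosomes sort as plain strings here; a real pipeline may need a natural or genome-specific ordering.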
Activity
Rank | Ranked higher |
Assignee | Ann Loraine [ aloraine ] | Jennifer Daly [ jdaly ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Sprint | Spring 2017 [ 47 ] |
Rank | Ranked lower |
Comment |
[ I made this script to find duplicates so I don't have to look through the file:
#!/usr/bin/env python
from __future__ import print_function


def find_duplicates(path, log):
    """Write every line of `path` that repeats an earlier line to `log`."""
    seen = set()
    with open(path) as f:
        for line in f:
            # Drop the trailing newline so the last line compares correctly
            # and the output has no blank lines between entries.
            stripped = line.rstrip("\n")
            key = stripped.lower()  # case-insensitive comparison
            if key in seen:
                print(stripped, file=log)
            else:
                seen.add(key)


# Close the log file when done so the output is flushed to disk.
with open("output.txt", "w") as log:
    find_duplicates("test1.bedgraph", log)

I tested it by creating a duplicate on purpose, and it works. I tested 4 files from the master branch and 4 files from IGB live; none of them contained duplicates. @Ivory, is this issue a result of IGB saving multiple tiers, the issue we just fixed by rewriting ExportFileAction? I know you said you were able to reproduce it in IGB live; can you test it again? ] |
Attachment | find_duplicates.py [ 14033 ] |
Assignee | Jennifer Daly [ jdaly ] |
Sprint | Early Fall 2017 [ 47 ] |
Rank | Ranked higher |
Workflow | Loraine Lab Workflow [ 17788 ] | Fall 2019 Workflow Update [ 19065 ] |
Workflow | Fall 2019 Workflow Update [ 19065 ] | Revised Fall 2019 Workflow Update [ 21183 ] |
This issue occurs with graphs when one region is loaded and then a second region is loaded that contains genes overlapping the first. The overlapping genes are loaded again each time a new region containing them is loaded into view. Annotations (.bed files) are saved correctly when overlapping regions are selected; the issue only exists with graphs.
The first difference between the two methods of file saving exists in the exportFile method in the ExportFileAction class. This is the same class that was altered in issue 1090, but the issue existed before those changes were implemented.
In exportFile, annotations and graphs are saved by two different if blocks. The if block for annotations calls a function, 'collectSyms', which adds the syms and their children, in order, to a rootsym, which is a List. This list holds sixty-odd entries when working with Arabidopsis thaliana.
The if block for graphs simply adds all of the syms to an ArrayList with addAll. These syms do not have children.
Attached is a .py to help determine if a file contains duplicate rows.
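To make the suspected mechanism concrete, here is a small Python model of the behavior described above. It is not IGB code, only an illustration under assumptions: the function names and the sample intervals are hypothetical, `load_region` stands in for the graph path that appends everything in view, and `load_region_once` shows one possible guard that skips rows already loaded.

```python
def intersects(row, start, end):
    """True if the half-open interval of `row` intersects [start, end)."""
    return row[1] < end and row[2] > start


def load_region(loaded, source, start, end):
    """Append every row in view, like an unconditional addAll."""
    loaded.extend(row for row in source if intersects(row, start, end))


def load_region_once(loaded, seen, source, start, end):
    """Append rows in view, skipping any that were loaded earlier."""
    for row in source:
        if intersects(row, start, end) and row not in seen:
            seen.add(row)
            loaded.append(row)


# Hypothetical graph intervals: (chromosome, start, end, value).
SOURCE = [
    ("Chr1", 100, 200, 2.0),
    ("Chr1", 200, 300, 5.0),
    ("Chr1", 300, 400, 6.0),
]

naive = []
load_region(naive, SOURCE, 50, 250)   # first region
load_region(naive, SOURCE, 150, 350)  # second region overlaps the first

guarded, seen = [], set()
load_region_once(guarded, seen, SOURCE, 50, 250)
load_region_once(guarded, seen, SOURCE, 150, 350)

if __name__ == "__main__":
    print(len(naive), len(guarded))
```

In this model the rows shared by both regions appear twice in `naive` and once in `guarded`. Whether a seen-set, or a sort and merge at save time, is the right fix in ExportFileAction is a separate design question.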