Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels: None
Description
- IGB live does produce duplicated lines (see * below).
- The lines are not (strictly) ordered (see ** below).
Chr1 8227506 8227522 2.0
Chr1 8227506 8227522 2.0 *
Chr1 8227534 8227542 5.0
Chr1 8227534 8227542 5.0
Chr1 8227542 8227553 6.0
Chr1 8227534 8227542 5.0
Chr1 20998229 20998235 5.0
Chr1 20998235 20998333 0.0
Chr1 8227506 8227522 2.0 **
Some downstream use cases require, or at least prefer, intervals that are sorted and non-overlapping.
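The two problems reported here, plus the overlap concern, can be checked mechanically. The following is a minimal sketch, not part of IGB: the function name `check_bedgraph` and its return shape are hypothetical, and it assumes whitespace-separated columns in the order chromosome, start, end, value, as in the lines quoted in this issue. It only compares each line with the line immediately before it on the same chromosome, so it does not verify that chromosomes are grouped together.

```python
def check_bedgraph(lines):
    """Return (duplicates, out_of_order, overlaps) as lists of 1-based line numbers.

    Hypothetical helper for illustration; assumes columns are
    chromosome, start, end, value, separated by whitespace.
    """
    seen = set()
    duplicates, out_of_order, overlaps = [], [], []
    prev = None  # (chrom, start, end) of the previous data line

    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 4 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue

        # Only the first four columns are read, so the trailing * and **
        # markers used in this issue are ignored.
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        key = (chrom, start, end, float(fields[3]))

        if key in seen:
            duplicates.append(number)
        seen.add(key)

        if prev is not None and prev[0] == chrom:
            if start < prev[1]:
                # Starts before the previous interval: not sorted.
                out_of_order.append(number)
            elif start < prev[2]:
                # Starts inside the previous interval: overlap
                # (an exact repeat of the previous line also lands here).
                overlaps.append(number)
        prev = (chrom, start, end)

    return duplicates, out_of_order, overlaps


# The lines quoted in this issue, markers included.
SAMPLE = [
    "Chr1 8227506 8227522 2.0",
    "Chr1 8227506 8227522 2.0 *",
    "Chr1 8227534 8227542 5.0",
    "Chr1 8227534 8227542 5.0",
    "Chr1 8227542 8227553 6.0",
    "Chr1 8227534 8227542 5.0",
    "Chr1 20998229 20998235 5.0",
    "Chr1 20998235 20998333 0.0",
    "Chr1 8227506 8227522 2.0 **",
]

dups, unordered, overlapping = check_bedgraph(SAMPLE)
print("duplicates:", dups)
print("out of order:", unordered)
print("overlaps:", overlapping)
```

To check an exported file instead of the sample, pass an open file object, for example `check_bedgraph(open("test1.bedgraph"))`.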
Activity
Rank | Ranked higher |
Assignee | Ann Loraine [ aloraine ] | Jennifer Daly [ jdaly ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Sprint | Spring 2017 [ 47 ] |
Rank | Ranked lower |
Comment |
[ I made this script to find duplicates so I don't have to look through the file by hand:

#!/usr/bin/env python
from __future__ import print_function


def find_duplicates(file):
    # Write every line that repeats an earlier line (ignoring case) to output.txt.
    with open(file) as f, open("output.txt", "w") as log:
        seen = set()
        for line in f:
            line_lower = line.lower()
            if line_lower in seen:
                # The line already ends with a newline, so do not add another.
                print(line, end="", file=log)
            else:
                seen.add(line_lower)


find_duplicates("test1.bedgraph")

I tested it by creating a duplicate on purpose, and it works. I tested 4 files from the master branch and 4 files from IGB live; none of them produced duplicates.

@Ivory, is this issue a result of IGB saving multiple tiers, the issue we just fixed by re-writing exportfileaction? I know you said you were able to reproduce it in IGB live; can you test it again? ] |
Attachment | find_duplicates.py [ 14033 ] |
Assignee | Jennifer Daly [ jdaly ] |
Sprint | Early Fall 2017 [ 47 ] |
Rank | Ranked higher |
Workflow | Loraine Lab Workflow [ 17788 ] | Fall 2019 Workflow Update [ 19065 ] |
Workflow | Fall 2019 Workflow Update [ 19065 ] | Revised Fall 2019 Workflow Update [ 21183 ] |