  1. IGB
  2. IGBF-1090

Implement saving of all chromosomes when exporting as bedgraph

    Details

    • Story Points:
      3
    • Sprint:
      Fall 2017

      Description

      A user requested that we add the ability to export data of all chromosomes in IGB when exporting a graph track (as a bedgraph file).

      We currently export data of all loaded chromosomes for annotations, but not graphs.

        Attachments

          Issue Links

            Activity

            mason Mason Meyer (Inactive) created issue -
            mason Mason Meyer (Inactive) made changes -
            Field Original Value New Value
            Summary Implementing saving of all chromosomes when exporting as bedgraph Implement saving of all chromosomes when exporting as bedgraph
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked higher
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked higher
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked lower
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Story Points 2
            ann.loraine Ann Loraine made changes -
            Story Points 2 3
            Sprint Sprint 39 [ 47 ]
            Assignee Jennifer Daly [ jdaly ]
            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            sneha Sneha Ramesh Watharkar (Inactive) added a comment -

            Jennifer - finished creating header for .bedgraph files. Moving to 'needs testing'.

            sneha Sneha Ramesh Watharkar (Inactive) made changes -
            Status In Progress [ 3 ] Needs Testing [ 10002 ]
            mason Mason Meyer (Inactive) made changes -
            Flagged Impediment [ 10000 ]
            mason Mason Meyer (Inactive) made changes -
            Flagged Impediment [ 10000 ]
            mason Mason Meyer (Inactive) made changes -
            Flagged Impediment [ 10000 ]
            mason Mason Meyer (Inactive) made changes -
            Flagged Impediment [ 10000 ]
            mason Mason Meyer (Inactive) made changes -
            Status Needs Testing [ 10002 ] Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Status Testing In Progress [ 10003 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Jennifer Daly [ jdaly ] Mason Meyer [ mason ]
            mason Mason Meyer (Inactive) made changes -
            Status Needs Testing [ 10002 ] Testing In Progress [ 10003 ]
            mason Mason Meyer (Inactive) made changes -
            Status Testing In Progress [ 10003 ] Needs Testing [ 10002 ]
            mason Mason Meyer (Inactive) made changes -
            Status Needs Testing [ 10002 ] Testing In Progress [ 10003 ]
            mason Mason Meyer (Inactive) added a comment - - edited

            I do see the header when exporting a bedgraph file, but I am not seeing all chromosomes saved when exporting a bedgraph. It seems that the bedgraph file that is exported only contains data for the chromosome the user is on when they export the bedgraph. So for example, if a user is on Chromosome 2, the exported bedgraph will only contain data for chromosome 2 but it should contain data for all chromosomes. If I am misunderstanding this story in some way, please let me know. For now, I am re-assigning the story to Jennifer and moving it to the To-Do column.

            Note: This had not been merged into main repo master - it was moved to Needs Testing incorrectly. However, Ivory (see below) has noted some problems. She is now taking over this issue.

            mason Mason Meyer (Inactive) made changes -
            Assignee Mason Meyer [ mason ] Jennifer Daly [ jdaly ]
            mason Mason Meyer (Inactive) made changes -
            Status Testing In Progress [ 10003 ] Open [ 1 ]
            mason Mason Meyer (Inactive) made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment -

            Assigning to Ivory as Jenny won't be able to get back to this before Friday.

            ann.loraine Ann Loraine made changes -
            Assignee Jennifer Daly [ jdaly ] Ivory Clabaugh [ ieclabau ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            I pulled branch IGBF-1090. I'm assessing the current state of the functionality.

            In my first effort to test the current state, IGB froze when I tried to save a graph track.
            The second time, it saved a track (named Chr1PartiallyLoaded.bedgraph.*.bedgraph) but it only saved a portion of the data that I had loaded and in view, not sure what determined the stop point.
            I also loaded data on Chr2, and saved the track. Data is saved from BOTH Chr1 and Chr2. All loaded data from Chr2 was saved, but the Chr1 data still stops at 8,227,586, even though there is data loaded all the way to 8,251,939.
            Loaded nearly all of Chr1, saved. File has Chr1 data up to 8227586, and has Chr2 data.

            Sometimes (not always) the default file type in the save track dialogue is .gr instead of .bedgraph.

            Loaded whole genome, saved. Chr1 data still stops at 8227586. Chr2 data stops at 8233721 (the same place that the data was loaded to when I saved Chr2 for the first time).

            Restarted IGB.
            Loaded data here: Chr4:2,122,072-2,127,580. Zoomed out to here, loaded data: Chr4:2,110,482-2,139,204. Zoomed out to here (no load): Chr4:1,984,514-2,265,538. Saved. Saved data includes: Chr4:2,077,256-2,128,082. Starts just ahead of the leftmost loaded data (good) but stops about 500 bp to the right of the first loaded right edge. That location would have been appropriate based on the first region loaded. But it seems the right boundary was not updated when more data was loaded.
            Loaded: Chr4:1,677,694-2,863,442. Loaded Chr5:22,391,063-22,401,357. Saved.
            Saved regions include: Chr4:1,672,970-2,128,082 and Chr5:22,390,711-22,401,935.
            Loaded: Chr5:22,338,488-22,453,028.
            Loaded: (center most) Chr2:14,515,589-14,522,725; Chr2:14,493,243-14,538,794; (outermost) Chr2:14,472,606-14,559,482
            Loaded: (center) Chr1:19,251,864-19,268,401; (left) Chr1:17,845,180-17,881,741; (right) Chr1:25,918,164-26,094,571
            Loaded: (all) ChrM:0-366,924
            Saved region includes:
            Chr1: Has all data from all three loaded regions.
            Chr2: Has all loaded data.
            Chr4: Has all loaded data.
            Chr5: Has all loaded data.
            ChrM: Has all loaded data.
            So... it saved perfectly that time. Curious.

            ieclabau Ivory Blakley (Inactive) added a comment -

            New test. I removed the graph file and started again (same file: BA1_2.sm.bedgraph.gz).

            Loaded Chr1:24,484,001-24,488,309. saved. Whole region saved.
            Loaded Chr1:24,475,962-24,496,348. saved. Whole region saved.
            Loaded Chr2:12,528,756-12,542,799, then Chr2:12,410,980-12,638,925. saved. Whole region on both chromosomes saved.

            New test (new file: BA2.sm.bedgraph.gz)

            Loaded Chr1:24,434,194-24,553,049. saved. Saved it all.
            Loaded Chr1:24,039,165-24,878,915. saved. saved it all.
            Loaded Chr1:24,039,165-24,878,915; Chr2:8,097,811-8,340,320. saved. saved it all (both chromosomes).

            New Test (restarted IGB, now using BA3.sm.bedgraph.gz)

            Loaded Chr1:7,650,931-7,659,240; Chr1:7,648,212-7,667,333. saved. loaded file.
            IGB is in a weird state where it will not respond to mouse clicks. (you get the keys and the text box)
            Restart IGB. Looks like it didn't actually save that file.
            Saving a new file.... spinning pinwheel of death.

            New Test (restarted IGB, now using C1.sm.bedgraph.gz)

            Loaded Chr1:4,664,013-4,683,134. loaded Chr1:4,594,501-4,763,591. saved.
            Loaded some on chr3, then loaded more, saved. both chromosomes saved. All is well.

            New Test. Reset preferences.
            loaded part of chr1, then more, saved, all saved. loaded more, saved, all saved.
            Changed the load mode to Genome (did not change view). Saved.
            Chr1 is saved up to the right boundary of the last save.
            Chr2 is all saved. Chr3, nothing saved. Chr4 all saved. Chr5 nothing saved. ChrC and ChrM, all saved.
            Saved again. all data from every chr is saved.

            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            Ok, something somewhere doesn't work BUT it only doesn't work some of the time.
            It may have to do with refreshing IGB's notion of what is currently loaded. (may or may not be specific to the right-end boundary.)

            good: Multiple chromosomes of data are saved in a single file (which is the main point of the issue!).
            good: The header correctly specifies the genome, not just the sequence (chromosome) name.

            bad: Sometimes, an area of loaded data (a whole chromosome, the right side of a chromosome) is just not saved. ----> ACTUALLY... I think it is. Initially, I was testing this by looking at the file itself, and I assumed the last line for each chromosome represented the right-most saved region. But after I started dropping files into IGB to see what they had, I never found incomplete files. I think I must have had another track that was only loaded up to Chr1 8,277,465 and it was the last one written to the file. So there isn't any loaded data that is getting left out, it's just that there is a lot of extra data included. ---> this is just as bad, but it is a different problem.

            bad: IGB froze on me a couple times when I hit Save Track As .... that may be part of this issue, or it may be its own issue. --> this is difficult to reproduce.
            bad: The default type is inconsistent (sometimes .bedgraph, sometimes .gr) -> that's another issue.
            bad: The file extension has an extra ".*.bedgraph" on the end. -> that's another issue.

            poor: the header has extra spaces (sometimes there is a " " after the "=" and sometimes not.) THIS part of the header could be separated into its own issue.
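
Because the exported lines are not sorted, the last line for a chromosome is not necessarily its right-most saved position. A rough way to inspect a file without loading it into IGB is to compute the covered span and the number of repeated lines per chromosome. The sketch below is only an illustration: `BedGraphSpanSummary`, `Span`, and `summarize` are hypothetical names, not part of IGB, and it assumes four whitespace-separated columns per data line.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker (not IGB code): per-chromosome span and duplicate count. */
final class BedGraphSpanSummary {

    /** Running totals for one chromosome. */
    static final class Span {
        int minStart = Integer.MAX_VALUE;
        int maxEnd = Integer.MIN_VALUE;
        int lines;
        int duplicateLines;
    }

    /** Summarizes data lines; header lines ("#..." and "track ...") are skipped. */
    static Map<String, Span> summarize(List<String> fileLines) {
        Map<String, Span> spans = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        for (String raw : fileLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("track")) {
                continue;
            }
            // Assumed layout: chrom, start, end, value.
            String[] f = line.split("\\s+");
            Span span = spans.computeIfAbsent(f[0], chrom -> new Span());
            span.minStart = Math.min(span.minStart, Integer.parseInt(f[1]));
            span.maxEnd = Math.max(span.maxEnd, Integer.parseInt(f[2]));
            span.lines++;
            // Set.add returns false when the same line was already seen.
            if (!seen.add(String.join("\t", f[0], f[1], f[2], f[3]))) {
                span.duplicateLines++;
            }
        }
        return spans;
    }
}
```

Comparing `maxEnd` with the right edge of the loaded region answers the "was everything saved?" question, while `duplicateLines` flags the extra data separately.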

            ann.loraine Ann Loraine added a comment -

            This sounds like a memory usage issue.
            Check memory upon fail to write and freeze-up.
            Also, use debugger to track what happens upon save.
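
One lightweight way to check memory around a save is to log the JVM heap before and after the write. This is a minimal sketch using only the standard `Runtime` API; `HeapSnapshotSketch` and its methods are hypothetical helpers, and where they would be called from in IGB is an assumption.

```java
/** Hypothetical helper (not IGB code) for logging heap usage around a save. */
final class HeapSnapshotSketch {

    /** Bytes of heap currently in use by the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line description suitable for a log message. */
    static String describe(String label) {
        long mb = 1024L * 1024L;
        long maxMb = Runtime.getRuntime().maxMemory() / mb;
        return String.format("%s: used=%d MB, max=%d MB", label, usedHeapBytes() / mb, maxMb);
    }
}
```

Calling `HeapSnapshotSketch.describe("before save")` and `describe("after save")` around the export would show whether used heap approaches the maximum when the freeze occurs.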

            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] Reviewing [ 10301 ]
            ann.loraine Ann Loraine added a comment -

            Jenny issued a pull request and I made comments on it - see https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/533/igbf-1090/diff
            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            The save track feature is taking the header information from the wrong track.
            In my last test it used the BA1 graph track settings to make the header, even when I had selected a different track and it wrote out the data based on the track I selected. When I deleted the BA1 track, the program froze.

            I repeated this. BA1 was the first track loaded, and that was always the track used to make the header, regardless of which graph I selected to save the file.
            When I deleted BA1, the header was based on BA2 (which was the second file loaded), when C1 was the track selected.
            I hid the BA2 track, and tried to save a track from the C1 graph. IGB froze.

            In one test, IGB never froze. It consistently made the header for the bedgraph file from the first graph track in "the list", where "the list" is some structure that tracks are added to when data is first loaded (not when the track is added).
            I added 6 graphs. I loaded data for each one individually (in a different order). I saved a series of bedgraphs, and each time I deleted the track the header was made from. All tracks were removed in the order in which data was loaded.

            Obviously, IGB is SUPPOSED to be making the header from the track that I selected.

            ieclabau Ivory Blakley (Inactive) made changes -
            Attachment SavedDataFromTwoSeparateTracks.png [ 13974 ]
            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            It looks like IGB is actually saving ALL data from ALL graph tracks.

            I added track BA1, loaded a section of data.
            I added track BA2, loaded a different section of data for only that track.
            I selected BA2 and saved. The saved file has data for both regions!!!! (the region that was only loaded for BA2 AND the region that was only loaded for BA1). The header is based on BA1.
            See image SavedDataFromTwoSeparateTracks.png.

            I was even able to save a single track with data from three separate tracks.
            Further,
            I found that where multiple tracks had data loaded, IGB did save the information from ALL of them to the output file.
            So the bedgraph file ("T13.3") had a bunch of lines representing BA1, then a bunch of lines representing the SAME REGION with data from BA2.
            I copied each of these into separate files and loaded them to confirm this.

            In fact, when the graphs are in "line" mode, the track for file T13.3 clearly has overlapping data.
            See image WhatsReallySaved.png

            This represents a fundamental flaw in this feature.

            ieclabau Ivory Blakley (Inactive) made changes -
            Attachment WhatsReallySaved.png [ 13975 ]
            ieclabau Ivory Blakley (Inactive) added a comment -

            Regardless of which track is selected, the output is identical.
            The header is based on the first track that was loaded.
            The data saved includes all data loaded from all tracks.
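
The intended behavior is the opposite on the track axis: every chromosome, but only the selected track. A minimal sketch of that selection follows. It assumes a made-up in-memory layout (track name, then chromosome, then data lines); `SelectedTrackExportSketch` and `linesForTrack` are hypothetical and do not reflect IGB's actual data model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not IGB code): export all chromosomes of one track only. */
final class SelectedTrackExportSketch {

    /**
     * Returns the loaded data lines of the selected track from every chromosome.
     * Data belonging to other tracks is never touched.
     */
    static List<String> linesForTrack(Map<String, Map<String, List<String>>> loadedByTrack,
                                      String selectedTrack) {
        Map<String, List<String>> byChromosome =
                loadedByTrack.getOrDefault(selectedTrack, Collections.emptyMap());
        List<String> out = new ArrayList<>();
        for (List<String> chromosomeLines : byChromosome.values()) {
            out.addAll(chromosomeLines);
        }
        return out;
    }
}
```

Whatever builds the header would need to take the same `selectedTrack` as its input, so that header and data always come from the same track.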

            ieclabau Ivory Blakley (Inactive) added a comment -

            Example output file:

            # genome_version = A_thaliana_Jun_2009
            track type=bedgraph name= " BA1_2 5uM 10 day old seedlings Rep1, coverage" description= " BA1_2 5uM 10 day old seedlings Rep1, coverage" visibility=full color=204,0,204 viewLimits=0.0:1.0
            Chr1 8227365 8227465 1.0
            Chr1 8227365 8227465 1.0
            Chr1 8227365 8227465 1.0
            Chr1 8227465 8227503 0.0
            Chr1 8227503 8227509 1.0
            Chr1 8227509 8227553 2.0
            Chr1 8227509 8227553 2.0
            Chr1 8227503 8227509 1.0

            Notice that "Chr1 8227509 8227553 2.0" (near the bottom) appears twice. The first time I loaded data, the edge fell in this range. After I zoomed out, the edge of the "new" range to download also fell in this range (this will be the case EVERY time that you load, zoom out, load more).

            Notice that "8227365 8227465" is in there (at the top) three times. This range was at the edge of the view when I loaded data, then I zoomed out just a bit, so the edge was still in this range. Every time there is a new range (any new range on the coordinate axis), data is loaded for that range. So the same line of data from the graph file is added every time that it is at the edge of the new range.
            bedgraph files should not have this duplication.

            I need to check bedgraph criteria to see if the format requires non-overlap, but I'm pretty sure programs that use it will either mess up or spit out a file that has duplication like this.

            Whatever function handles writing this file should make sure that there are no duplicated lines.

            In the outputs that contain data from multiple files, it is clear that the output is not sorted.
            Being sorted may not be a requirement of the file format, but it is a requirement for several programs that take in a bedgraph, and it makes life a LOT easier on anyone who has to look at it manually. This is also something that should be handled by whatever method writes this file type.
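
As a rough illustration of the two clean-up steps asked for above (no duplicated lines, sorted output), the writer could pass the data lines through something like the following before writing. This is a sketch, not IGB's implementation: `BedGraphLineCleaner` and `dedupeAndSort` are hypothetical names, it assumes four whitespace-separated columns, and it orders chromosomes by plain string comparison (so Chr10 sorts before Chr2).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper (not IGB code): removes duplicate bedgraph lines and sorts them. */
final class BedGraphLineCleaner {

    /** Returns tab-separated data lines, unique and ordered by chrom, start, end, value. */
    static List<String> dedupeAndSort(List<String> dataLines) {
        Comparator<String[]> order = Comparator
                .comparing((String[] f) -> f[0])
                .thenComparingInt(f -> Integer.parseInt(f[1]))
                .thenComparingInt(f -> Integer.parseInt(f[2]))
                .thenComparingDouble(f -> Double.parseDouble(f[3]));
        // A TreeSet keeps elements sorted and drops any that compare as equal.
        TreeSet<String[]> sorted = new TreeSet<>(order);
        for (String line : dataLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sorted.add(trimmed.split("\\s+"));
            }
        }
        List<String> out = new ArrayList<>();
        for (String[] fields : sorted) {
            out.add(String.join("\t", fields));
        }
        return out;
    }
}
```

Applied to the eight data lines in the example file above, this would leave four lines, one per distinct interval. It only removes exact repeats; intervals that overlap with different values (such as data mixed in from another track) would need to be handled separately.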

            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            Testing IGB live:
            v. 9.0.0

            I set up to reproduce the line duplication. The first time I clicked Save Track As... IGB froze.
            The second time it was fine.
            (froze once more another time in this testing, when I clicked Save Track As... but I still don't know what causes it)

            -IGB live does produce the duplicated lines. (see * below) --> IGBF-1147 better bedgraph formatting
            -The lines are not (strictly) ordered. (See **) --> IGBF-1147 better bedgraph formatting
            -The header does not have extra spaces after '='.
            -The header is respected: when the saved track is loaded it shows the name from the header, not the file name. IGB also displays the file with the color and y-axis range of the original track. In testing the branch, only the y-axis range was respected from the header. I think this is because of the header formatting because when I dropped a file made by IGB live into the window for IGB 1090, the header was fully respected.
            -The header is made from the correct tier.
            -The default file type (.gr or .bedgraph) is inconsistent.
            -Only data from the selected track is saved.

            IGB 1090 bedgraph header:

            # genome_version = A_thaliana_Jun_2009
            track type=bedgraph name= " BA1_2 5uM 10 day old seedlings Rep1, coverage" description= " BA1_2 5uM 10 day old seedlings Rep1, coverage" visibility=full color=204,0,204 viewLimits=0.0:1.0

            IGB live bedgraph header:

            # genome_version = Chr1
            track type=bedgraph name=" C2 DMSO only 10 day old seedling Rep2, coverage" description=" C2 DMSO only 10 day old seedling Rep2, coverage" visibility=full color=0,128,0 viewLimits=0.0:3.0

            Chr1 8227506 8227522 2.0
            Chr1 8227506 8227522 2.0 *
            Chr1 8227534 8227542 5.0
            Chr1 8227534 8227542 5.0
            Chr1 8227542 8227553 6.0
            Chr1 8227534 8227542 5.0
            Chr1 20998229 20998235 5.0
            Chr1 20998235 20998333 0.0
            Chr1 8227506 8227522 2.0 **

            ieclabau Ivory Blakley (Inactive) added a comment -

            Quick check on the annotation saving in IGB live:

            bed files are saved as name.bed.*.bed. ---so that is shared
            Multiple chromosomes of data are saved, and only data from the selected track is saved, so the annotation saving code is probably a good model to copy from.
            In bed detail, each line ends with N/A N/A.

            ieclabau Ivory Blakley (Inactive) made changes -
            Link This issue relates to IGBF-1147 [ IGBF-1147 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Link This issue relates to IGBF-1128 [ IGBF-1128 ]
            ieclabau Ivory Blakley (Inactive) added a comment -

            The header is not being respected because of the extra spaces after "name=" and "description=".
            See Wiggle.java:

            513 bw.write("track type=" + getTrackType() + " name= \"" + gname + "\"");
            514 bw.write(" description= \"" + human_name + "\"");

            The spaces were added in this branch and need to be taken back out.
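
As a minimal sketch of the fix described above, the track line can be assembled with no space between each "=" and its quoted value. The class and method names here are hypothetical and are not taken from Wiggle.java; only the key names (track type, name, description) come from the headers quoted in this issue.

```java
// Hypothetical helper for illustration; not IGB's actual Wiggle.java code.
class TrackLineSketch {

    // Builds a track line with no whitespace between '=' and the quoted value.
    static String trackLine(String trackType, String name, String description) {
        StringBuilder sb = new StringBuilder();
        sb.append("track type=").append(trackType);
        sb.append(" name=\"").append(name).append("\"");
        sb.append(" description=\"").append(description).append("\"");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example values are made up for the demonstration.
        System.out.println(trackLine("bedgraph", "Rep1, coverage", "Rep1, coverage"));
    }
}
```

A quick check that the output contains no `= "` sequence would catch a regression of this kind.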

            ieclabau Ivory Blakley (Inactive) added a comment -

            good: Multiple chromosomes of data are saved in a single file (which is the main point of the issue!).
            good: The header correctly specifies the genome, not just the sequence (chromosome) name.

            bad: The header has extra spaces (sometimes there is a " " after the "=" ). This was introduced on this branch. (See last comment)
            bad: All loaded data (from all graph tracks) is being exported.
            bad: The information for the header is taken from the wrong tier.

            The above problems are all introduced in this branch.
            We need to really re-think how we are getting it to save multiple chromosomes of data.
            Somehow annotation data is saved from all chromosomes while still only taking data from the desired track.
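
One way to picture the goal stated above (all chromosomes, but only the selected track) is to walk every sequence and keep only the graphs that belong to the selected track. This is a sketch under assumed, simplified types: `Graph`, `trackId`, and the map of sequence name to graphs are hypothetical stand-ins, not IGB classes, and this is not necessarily how the annotation export does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not IGB classes.
class MultiSeqExportSketch {

    static final class Graph {
        final String trackId;  // identifies the track this graph belongs to
        final String seqName;  // chromosome the graph data sits on

        Graph(String trackId, String seqName) {
            this.trackId = trackId;
            this.seqName = seqName;
        }
    }

    // Visits every sequence, but keeps only graphs from the selected track.
    static List<Graph> selectForExport(Map<String, List<Graph>> graphsBySeq,
                                       String selectedTrackId) {
        List<Graph> selected = new ArrayList<>();
        for (List<Graph> graphs : graphsBySeq.values()) {
            for (Graph g : graphs) {
                if (g.trackId.equals(selectedTrackId)) {
                    selected.add(g);
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, List<Graph>> loaded = new LinkedHashMap<>();
        loaded.put("Chr1", Arrays.asList(new Graph("trackA", "Chr1"), new Graph("trackB", "Chr1")));
        loaded.put("Chr3", Arrays.asList(new Graph("trackA", "Chr3")));
        for (Graph g : selectForExport(loaded, "trackA")) {
            System.out.println(g.trackId + " on " + g.seqName);
        }
    }
}
```

Filtering by track identity rather than by type (for example, "is a graph") is what keeps data from other loaded graph tracks out of the export.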

            ieclabau Ivory Blakley (Inactive) made changes -
            Link This issue relates to IGBF-1148 [ IGBF-1148 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Link This issue relates to IGBF-1149 [ IGBF-1149 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Status Reviewing [ 10301 ] Open [ 1 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Assignee Ivory Clabaugh [ ieclabau ] Jennifer Daly [ jdaly ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            sneha Sneha Ramesh Watharkar (Inactive) added a comment -

            Started a new branch and copied the changes from the original onto this new branch in one clean commit. Link to new branch:

            https://bitbucket.org/jdaly8/integrated-genome-browser/branch/IGBF-1090-b

            ieclabau Ivory Blakley (Inactive) made changes -
            Comment [ change we were working on ...
                            aseq.getGenomeVersion().getSeqList().forEach(seq -> {
                                //myList.get(0).addAll(seq.getAnnotations(Pattern.compile(".*")).stream().filter(s -> s instanceof GraphSym).collect(Collectors.toList()));
                                myList.get(0).addAll(seq.getAnnotations(Pattern.compile(".*")).stream().filter(s -> s == atier.getInfo()).collect(Collectors.toList()));
                            }); ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Assignee Jennifer Daly [ jdaly ] Ivory Clabaugh [ ieclabau ]
            ieclabau Ivory Blakley (Inactive) added a comment - - edited

            The latest iteration of this branch is on Ivory's remote branch:

            https://bitbucket.org/IvoryBlak/integrated-genome-browser/branch/IGBF-1090-b

            Code review tip:
            --the code was simplified and commented in the most recent commit.

            Functionality review tips:
            --Load data from multiple data sets, and on multiple chromosomes. Make sure the saved file reflects ONLY the selected track and has data for ALL chromosomes where data was loaded.
            --When you save a file, you can drag/drop the new file onto IGB to quickly see which regions were saved. Files saved by IGB should load correctly in IGB.
            --Test the above for both annotation tracks and graph tracks. (For everything else, you can just check graph tracks.)
            --Graph tracks saved in IGB should have the same name and color scheme as the original file (only after loading data from the saved file).
            --Look at the saved graph file in a text editor. The genome in the header should match the GENOME not the chromosome.
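
For reviewers who prefer a scripted check over a text editor, the following sketch reads the text of a saved file and reports the genome named in the header and the chromosomes present in the data lines. It assumes the layout quoted in this issue (a `genome_version = ...` line, a `track ...` line, then whitespace-separated data lines); the optional leading `#` on the header line and all class and method names are assumptions for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical review aid; not part of IGB.
class SavedBedgraphCheckSketch {

    // Returns the chromosome names found in data lines, in first-seen order.
    static Set<String> chromosomes(String fileText) {
        Set<String> names = new LinkedHashSet<>();
        for (String raw : fileText.split("\\R")) {
            String line = raw.trim();
            // Skip blank lines and the header lines.
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track ") || line.startsWith("genome_version")) {
                continue;
            }
            names.add(line.split("\\s+")[0]);
        }
        return names;
    }

    // Returns the value after '=' on the genome_version line, or null if absent.
    static String genomeVersion(String fileText) {
        for (String raw : fileText.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("#")) {
                line = line.substring(1).trim();
            }
            if (line.startsWith("genome_version")) {
                int eq = line.indexOf('=');
                if (eq >= 0) {
                    return line.substring(eq + 1).trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "genome_version = A_thaliana_Jun_2009\n"
                + "track type=bedgraph name=\"example\"\n"
                + "Chr1 8227506 8227522 2.0\n"
                + "Chr3 100 200 1.0\n";
        System.out.println(genomeVersion(text) + " " + chromosomes(text));
    }
}
```

If the reported genome is a chromosome name such as Chr1, or a chromosome with loaded data is missing from the set, the saved file does not meet the criteria listed above.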

            This issue is ready for review.

            ieclabau Ivory Blakley (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ieclabau Ivory Blakley (Inactive) made changes -
            Assignee Ivory Clabaugh [ ieclabau ]
            sneha Sneha Ramesh Watharkar (Inactive) added a comment - - edited

            #1 1st level review

            • Color, name, description match selected track
            • Only selected track saves, for all loaded chromosomes
            • Annotations working same as before, doesn't have a header and still saves multiple chromosomes.
            ieclabau Ivory Blakley (Inactive) made changes -
            Link This issue relates to IGBF-144 [ IGBF-144 ]
            mason Mason Meyer (Inactive) added a comment - - edited

            #2 1st-Level Review:

            • Only the selected track saves for any chromosome that has been loaded.
            • Annotation tracks are being saved as expected with no header
            • Graph tracks have the same track name and color scheme as the original file

            Since this issue has been reviewed twice and passed the reviews, I am moving the issue to the Needs Pull Request Column and assigning the issue to Ivory.

            mason Mason Meyer (Inactive) made changes -
            Status Needs 1st Level Review [ 10005 ] Ready for Pull Request [ 10304 ]
            mason Mason Meyer (Inactive) made changes -
            Assignee Ivory Clabaugh [ ieclabau ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ivory Clabaugh [ ieclabau ] Mason Meyer [ mason ]
            mason Mason Meyer (Inactive) made changes -
            Status Needs Testing [ 10002 ] Testing In Progress [ 10003 ]
            mason Mason Meyer (Inactive) added a comment -

            While testing this story on the development branch I noticed that it was not functioning as expected because data was not being saved for ALL chromosomes in which data had been loaded. Is it possible that the code that made it to eos.transvar.org comes from Ivory's story IGBF-1090 instead of her working branch called IGBF-1090-b?

            To reproduce:

            1) Download and run IGB from http://eos.transvar.org/igb/
            2) Open A. thaliana genome and load some bedgraph data on Chr1 and Chr3
            3) Now Save the bedgraph track as a new file (do this from Chr3)
            4) Open the saved track in IGB and load data for Chr1 and Chr3.

            *Observe: There is no data on Chr 1 for the new track, but there should be.

            I am reassigning this to Ivory and moving it back to the Reviewing Pull-Request column.

            mason Mason Meyer (Inactive) made changes -
            Assignee Mason Meyer [ mason ] Ivory Clabaugh [ ieclabau ]
            mason Mason Meyer (Inactive) made changes -
            Status Testing In Progress [ 10003 ] Reviewing Pull Request [ 10303 ]
            mason Mason Meyer (Inactive) made changes -
            Status Reviewing Pull Request [ 10303 ] Open [ 1 ]
            ieclabau Ivory Blakley (Inactive) added a comment -

            I just updated my master branch and tested. Like Mason said, data from a graph track is saved for ONLY the current chromosome, not for all loaded chromosomes.

            The most recent changes on branch IGBF-1090-b are not reflected in the current master branch.
            I searched in the commits for main master and I do not see any commit where branch IGBF-1090 or IGBF-1090-b was ever merged in.

            If you can give me the commit id or pull request number so I can see exactly what was merged in I can look into this further.
            My current best guess is that this branch never was merged into master.

            ieclabau Ivory Blakley (Inactive) made changes -
            Status Open [ 1 ] Pull Request Submitted [ 10101 ]
            ieclabau Ivory Blakley (Inactive) added a comment -

            I rebased branch IGBF-1090-b from master, tested (still good) and submitted a pull request.

            ieclabau Ivory Blakley (Inactive) made changes -
            Assignee Ivory Clabaugh [ ieclabau ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Mason Meyer [ mason ]
            mason Mason Meyer (Inactive) made changes -
            Status Needs Testing [ 10002 ] Testing In Progress [ 10003 ]
            mason Mason Meyer (Inactive) added a comment -

            After testing, it seems the issue is resolved.

            • Only the selected track saves for any chromosome that has been loaded.
            • Annotation tracks are being saved as expected with no header.
            • Graph tracks have the same track name and color scheme as the original file.
            • I could not find any other side effects or issues related to this change.

            Since this issue is resolved, it will now be closed.

            mason Mason Meyer (Inactive) made changes -
            Resolution Done [ 10000 ]
            Status Testing In Progress [ 10003 ] Closed [ 6 ]
            mason Mason Meyer (Inactive) made changes -
            Fix Version/s 9.0.1 Minor Release [ 10500 ]
            ann.loraine Ann Loraine made changes -
            Workflow Loraine Lab Workflow [ 16971 ] Fall 2019 Workflow Update [ 19760 ]
            ann.loraine Ann Loraine made changes -
            Workflow Fall 2019 Workflow Update [ 19760 ] Revised Fall 2019 Workflow Update [ 21879 ]

              People

              • Assignee:
                mason Mason Meyer (Inactive)
                Reporter:
                mason Mason Meyer (Inactive)