IGB / IGBF-1933

Investigate: Data set name for bw (bigwig) files

    Details

    • Story Points:
      2
    • Sprint:
      Summer 2019 Sprint 12, Fall 2019 Sprint 1, Fall 2019 Sprint 2

      Description

      I am creating a Quickload site for a colleague in which I'm distributing some bigwig (.bw) files.

      I am able to open these files into IGB in the usual way. When I do that, the file appears in the Data Management table with the expected data set name. Everything works great!

      However, once I click "Load Data", the data set name changes to the name of the file, minus the ".bw" file extension.

      This is a very strange bug! Please investigate.

            Activity

            noor91zahara Noor Zahara (Inactive) added a comment - - edited

            The BigWigSymLoader class has a method named parse, where the gid (graph name) is set.
            It is set to the file name without the extension, which is why the extension disappears.

            Note: Before the data is loaded, the style is set using the file name with the extension; once the data is loaded, it is overridden by the file name without the extension.

            noor91zahara Noor Zahara (Inactive) added a comment -

            Initially, when the file is loaded, the data set name is used as the track name.
            Once the genome data is loaded, the graph id field of the GraphSym class is set to the last part of the name attribute in annots.xml.

            Example - Details in annots.xml
            name = "https://krizek-lab.s3.amazonaws.com/ANT-GR_rnaseq/ANT-GR-rnaseq/ANT-induced-2h.rnaseq.bw"
            title = "ANT-GR RNA-Seq/Graph - Scaled/ANT induced 2 hours scaled coverage"

            Initially, track name = "ANT induced 2 hours scaled coverage"
            After the data is loaded, the graph id is set to ANT-induced-2h.rnaseq and the track name is overridden by the graph id.

            Solution -

            When creating the graph symloader object, set the graph id to the title.
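
The two values in the example above can be reproduced with a short, self-contained sketch. This is not IGB code: the class QuickloadNameSketch and the helpers fileStem and titleLeaf are hypothetical names, and the assumption that the track name is the last "/"-separated part of the title is taken only from the example in this comment.

```java
import java.net.URI;

// Illustrative only: these helpers are hypothetical and are not part of IGB.
class QuickloadNameSketch {

    // Last path segment of the URL with the final file extension removed.
    static String fileStem(String url) {
        String path = URI.create(url).getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Last "/"-separated part of the annots.xml title attribute (assumed rule).
    static String titleLeaf(String title) {
        return title.substring(title.lastIndexOf('/') + 1).trim();
    }

    public static void main(String[] args) {
        String name = "https://krizek-lab.s3.amazonaws.com/ANT-GR_rnaseq/ANT-GR-rnaseq/ANT-induced-2h.rnaseq.bw";
        String title = "ANT-GR RNA-Seq/Graph - Scaled/ANT induced 2 hours scaled coverage";
        System.out.println("track name before Load Data: " + titleLeaf(title));
        System.out.println("graph id after Load Data:    " + fileStem(name));
    }
}
```

Running main prints the title leaf first and the extension-less file name second, which matches the before/after names reported in this comment.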

            ann.loraine Ann Loraine added a comment -

            I'm concerned the proposed solution will break other parts of the code that expect the "id" field to be:

            • unique
            • equal to the data set's URL

            For the next steps:

            • Compare behavior for other file types to understand the proper behavior. For example, you can see what happens for bedgraph.
            • After Loading Data, click a track label and then view Selection Info tab to inspect properties. What happens for different file formats?
            • Look at Data Management Table code to understand how it determines what to show.

            Also, recall that viewing a data file in IGB has two steps:

            • Opening the file - add a new track to IGB by selecting a data set in the Available Data Sets menu or selecting File > Open operations
            • Loading the data - clicking Load Data button, which adds new data to a track

            In your explanations, please use the most precise terms possible. You can also provide links to code. Google "linking to source code on bitbucket" to see how it's done.

            noor91zahara Noor Zahara (Inactive) added a comment - - edited

            1. The id/gid field is not set to the data set's URL; instead, it is set to the name (i.e., ANT-induced-2h.rnaseq when the URL is "https://krizek-lab.s3.amazonaws.com/ANT-GR_rnaseq/ANT-GR-rnaseq/ANT-induced-2h.rnaseq.bw").

            The function where this happens is private List<? extends SeqSymmetry> parse(BioSeq seq, BigWigIterator wigIterator) { ... } of the BigWigSymLoader class.
            The line where the id is set is GraphIntervalSym graphIntervalSym = new GraphIntervalSym(xList, wList, yList, featureName, seq); of the BigWigSymLoader class.

            Solution: Change featureName to uri.toString() and everything works as expected.

            Note: The above code is triggered by clicking the Load Data button.

            2. I checked the flow for bam files. While fetching the list of symmetries for the given chromosome range, the meth argument in the line below from the BAM class
            BAMSym bamSym = (BAMSym) convertSAMRecordToSymWithProps(sr, seq, uri.toString());
            is set to the URI and not the data set name. Hence, it works fine.

            Note: The function that contains the above line of code is public synchronized List<SeqSymmetry> parse(SeqSpan span) throws Exception { ... }

            3. In the Data Management Table code, the track name is taken from the style object if present; otherwise the data set name is used.
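
One reason the full URL may be the safer id relates to the earlier concern that the id should be unique. The sketch below is not IGB code: GraphIdUniquenessSketch, fileStem, and distinctIds are hypothetical names, and the example.org URLs are made up. It only shows that two files with the same file name on different sites collapse to one id when the id is the extension-less file name, but stay distinct when the id is the URL.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: hypothetical helper names and made-up URLs.
class GraphIdUniquenessSketch {

    // File name (text after the last '/') with the final extension removed.
    static String fileStem(String url) {
        String fileName = url.substring(url.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Counts how many distinct ids the given URLs produce under each scheme.
    static int distinctIds(List<String> urls, boolean useFullUrl) {
        Set<String> ids = new HashSet<>();
        for (String url : urls) {
            ids.add(useFullUrl ? url : fileStem(url));
        }
        return ids.size();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "https://example.org/siteA/coverage.bw",
                "https://example.org/siteB/coverage.bw");
        System.out.println("ids from file stem: " + distinctIds(urls, false));
        System.out.println("ids from full URL:  " + distinctIds(urls, true));
    }
}
```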

            ann.loraine Ann Loraine added a comment -

            Two follow-up questions:

            • Can you clarify #2 above? I'm not sure what "type set is uri" means.
            • Can you check the flow for ".wig" files? What is featureName getting set to there? (".wig" is a plain text version of bigwig)
            noor91zahara Noor Zahara (Inactive) added a comment -

            I analyzed the flow for wig files.

            1. If the data set name is specified in the file, the track name is set to that name once Load Data is clicked; otherwise the graph id is set to uri.toString().
            Below is the code in the Wiggle class where the graph id is set.
            a. The name of the function - private static List<GraphSym> createGraphSyms(Map<String, String> track_hash, GenomeVersion seq_group, Map<String, WiggleData> current_datamap, String stream_name, String extension) { ... }

            b. The line of code -
            String graph_id = track_hash.get(TrackLineParser.NAME);
            if (graph_id == null) { graph_id = stream_name; }

            Note: stream_name = uri.toString(), as passed by the calling function:
            private List<GraphSym> parse(Iterator<String> it, BioSeq seq, int min, int max)
            { ... grafs.addAll(createGraphSyms(track_line_parser.getTrackLineContent(), genomeVersion, current_datamap, uri.toString(), extension)); ... }
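
The fallback described above can be tried in isolation with the following sketch. It is a stand-alone imitation, not the Wiggle class itself: GraphIdFallbackSketch and chooseGraphId are hypothetical names, the NAME constant stands in for TrackLineParser.NAME (its real value is assumed to be "name"), and the example.org URL is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: mimics the graph id fallback quoted from the Wiggle class.
class GraphIdFallbackSketch {

    // Stand-in for TrackLineParser.NAME; the real constant's value is an assumption.
    static final String NAME = "name";

    // Use the name from the track line if present; otherwise fall back to the stream name.
    static String chooseGraphId(Map<String, String> trackHash, String streamName) {
        String graphId = trackHash.get(NAME);
        if (graphId == null) {
            graphId = streamName;
        }
        return graphId;
    }

    public static void main(String[] args) {
        String streamName = "https://example.org/data/sample.wig"; // plays the role of uri.toString()

        Map<String, String> withName = new HashMap<>();
        withName.put(NAME, "My coverage graph");
        Map<String, String> withoutName = new HashMap<>();

        System.out.println(chooseGraphId(withName, streamName));
        System.out.println(chooseGraphId(withoutName, streamName));
    }
}
```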
            ann.loraine Ann Loraine added a comment - - edited

            Next:

            • Now, please describe the flow when a user opens a bedgraph file from a Quickload site.

            1) First, describe how the "name" and "title" attributes specified in the file's "file" tag (annots.xml) get parsed and associated with that file prior to loading. Which instance variables or Property objects belonging to which objects get populated using these values?

            2) Second, describe whether and how these instance variables or Property objects get modified or over-written when the user clicks "Load Data."

            Please use super-clear and super-precise terminology!

            For example, use the term "instance variable" to indicate a named variable accessible to all methods in an object. Use the term "Property" (upper-case because it's a class) to refer to a java Property object associated with another object.

            Tip: Refresh your knowledge of java Property objects by reading documentation, e.g.,

            https://www.geeksforgeeks.org/java-util-properties-class-java/

            noor91zahara Noor Zahara (Inactive) added a comment - - edited

            1. When the user selects a genome version by clicking an image on the IGB home screen:

            a. The initial step is loading all the annotations and initializing the data containers.
            Below is the method in the Quickload Utils class where the annots.xml file is parsed and the data sets are set.

            public static Optional<Set<QuickloadFile>> getGenomeVersionData(String quickloadUrl, String genomeVersionName, Map<String, Optional<String>> supportedGenomeVersionInfo, GenomeVersionSynonymLookup genomeVersionSynonymLookup) { ... }

            Note: The 'properties' field of the DataSet class contains details such as title, name, etc., and the 'name' field of the DataSet class is set to the value of the 'title' attribute.

            b. All the symmetry details are loaded.

            2. The flow for bedgraph files is the same as that for wig files.
            The files are getting parsed using the same function as mentioned above that is present in Wiggle class.
            The local variable graph_id is set to the value of the 'name' attribute if present in the bedgraph file's header; otherwise it is set to uri.toString().

            3. When the file is loaded, the unique_name parameter of the following method of the IGBStateProvider class is set to uri.toString():

            public ITrackStyleExtended getAnnotStyle(String unique_name, String track_name, String file_type, java.util.Map<String, String> props) { ... }

            After the data is loaded, it gets over-written by graph_id.

            ann.loraine Ann Loraine added a comment - - edited

            A few more questions:

            • Can you change "When we click on any of the species" to use more precise language? I think it means: "when the user selects a genome version by clicking an image on the IGB start screen" (It's also correct to call the "start screen" a "home screen" if you think that's better.)
            • I'm kind of confused about step (b) above. Is this intending to say that (b) happens after (a) and is also triggered when the user selects a genome version (since it's under heading number 1)? The wording ("All of the symmetry details are loaded") suggests that maybe it's talking about what happens when the user clicks the "Load Data" button?
            • regarding 2: I'm kind of confused also about this sentence: "The files are getting parsed using the same function as mentioned above that is present in Wiggle class." Is the code actually using a Wiggle class function or is it identical to a function also present in the Wiggle class. (Maybe it was copied and pasted?)
            • regarding 2: Which "name attribute" does this mean – is it "name attribute" from annots.xml ?
            • regarding 3: Which class over-writes unique_name and what method within IGBStateProvider is doing it?
            noor91zahara Noor Zahara (Inactive) added a comment -

            1. Loading of symmetries is triggered both when the user selects a genome version and when the Load Data button is clicked.
            2. The bedgraph files are parsed using the functions written within the Wiggle class.
            3. The following function of SymLoader sets the 'method' attribute of the DataSet class to the graph id:
            public static Map<String, List<? extends SeqSymmetry>> splitFilterAndAddAnnotation(final SeqSpan span, List<? extends SeqSymmetry> results, DataSet feature) { ... }

            The setMethod(String method) function of the DataSet class sets the track name by calling the following function of the IGBStateProvider class:
            public ITrackStyleExtended getAnnotStyle(String unique_name, String track_name, String file_type, java.util.Map<String, String> props) { ... }

            This is how the unique_name gets over-written by the graph id value.
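
The effect described above can be pictured with a toy model. Everything in it is an assumption made for illustration: StyleLookupSketch, its Style class, and the two-argument getAnnotStyle are hypothetical simplifications, and the idea that styles are stored in a map keyed by unique_name and reused when the key already exists is a guess about the mechanism, not a description of IGBStateProvider. Under that assumption, a load-time id that differs from the open-time id produces a second style whose track name is the graph id, while an id equal to the URL finds the original style again.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: assumes styles are looked up by unique name and reused if present.
class StyleLookupSketch {

    static final String URL = "https://example.org/quickload/sample.bw"; // made-up URL

    static final class Style {
        final String trackName;
        Style(String trackName) { this.trackName = trackName; }
    }

    private final Map<String, Style> stylesByUniqueName = new HashMap<>();

    // Returns the existing style for uniqueName, or creates one named trackName.
    Style getAnnotStyle(String uniqueName, String trackName) {
        return stylesByUniqueName.computeIfAbsent(uniqueName, key -> new Style(trackName));
    }

    // Opens a data set under its URL, then "loads" it under idUsedAtLoad.
    static String trackNameAfterLoad(String idUsedAtLoad) {
        StyleLookupSketch styles = new StyleLookupSketch();
        styles.getAnnotStyle(URL, "Sample coverage");                      // opening the file
        return styles.getAnnotStyle(idUsedAtLoad, idUsedAtLoad).trackName; // clicking Load Data
    }

    public static void main(String[] args) {
        System.out.println("id = file stem: " + trackNameAfterLoad("sample"));
        System.out.println("id = URL:       " + trackNameAfterLoad(URL));
    }
}
```

In this model the first call shows the renamed track and the second keeps the name assigned when the file was opened, which is the behavior the one-line change is reported to restore.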

            noor91zahara Noor Zahara (Inactive) added a comment -

            Code change can be found below:
            https://bitbucket.org/noorzahara/integrated-genome-browser-local1/branch/IGBF-1933#diff

            ann.loraine Ann Loraine added a comment -

            I love it. One line, big effect!

            Nowlan Freese could you do functional review on this?

            If no functional issues are detected, Noor Zahara please rebase on the latest master and submit PR. I will try to merge it ASAP.

            nfreese Nowlan Freese added a comment -

            Working correctly: the data set name stays the same (and does not change to the file name) once the data is loaded.

            prutha Prutha Kulkarni (Inactive) added a comment -

            Noor Zahara, I have tested the functionality and it is working as expected.
            Now the track name does not change after "Load Data" is clicked, because the feature name was changed to "uri.toString()" in the private List<? extends SeqSymmetry> parse(BioSeq seq, BigWigIterator wigIterator) function of the "BigWigSymLoader.java" file.
            Moving the ticket to DONE.

            noor91zahara Noor Zahara (Inactive) added a comment -

            Testing Steps

            Open a bigWig file (File > Open File) -> click the Load Data button (in the top right corner of the IGB screen) -> observe the track name in the Data Management Table. It should be displayed with the extension, i.e., ".bw".

            ann.loraine Ann Loraine added a comment -

            For your reference, here is a link to the commit after PR above was merged:

            https://bitbucket.org/lorainelab/integrated-genome-browser/commits/847ef52ffef045ca68b056261ff8858918c16df5

              People

              • Assignee:
                noor91zahara Noor Zahara (Inactive)
                Reporter:
                ann.loraine Ann Loraine