Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 3
Epic Link:
Sprint: Spring 8 : 24 Apr to 8 May, Spring 8 : 11 May to 25 May, Spring 9 : 25 May to 8 Jun, Summer 1: 8 Jun - 19 Jun, Summer 2: 22 Jun - 3 Jul, Summer 3: 6 Jul - 17 Jul
Description
Situation: Currently, the index hacking data structure is a StringBuilder object (called <output>). As the file is looped through, each new row of data (chromosome, start, stop, value) is appended to the StringBuilder object.
Problem: Each data point in the fourth column (value) needs to be divided by the average value across the entire file. Because the average must be known before any row can be written, a second loop through the entire file is needed before building the StringBuilder object. This doubles the time needed to display the file to the user.
Task: Refactor the code so that it no longer uses a StringBuilder object to construct the data row by row. Instead, create an Array/ArrayList for each column of data (chromosome, start, stop, value). Loop through the file once, appending to each Array. Then compute the average of the value Array and divide every element of the value Array by that average. Finally, use the four Arrays to create the Sym object.
Note: This is an example of how this task could be accomplished. Other data structures may be more efficient. The main objective is to loop through the file only once.
For example:
//How file is currently constructed row by row using StringBuilder
chr1 0 16384 14567
chr1 16384 32768 10576
//How file should be constructed using Array/ArrayList for each column (chromosome, start, stop, value)
[chr1, chr1, ...]
[0, 16384, ...]
[16384, 32768, ...]
[14567, 10576, ...] length = 2, average = (14567 + 10576)/length
//Divide the 4th Array values by the average
[14567/average, 10576/average, ...]
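The column-based approach above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name BedGraphColumns and the method fromRows are hypothetical and are not part of IGB, rows are assumed to be whitespace-separated (chromosome, start, stop, value), and the creation of the Sym object from the four lists is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the four columns; not an existing IGB class.
class BedGraphColumns {
    final List<String> chromosomes = new ArrayList<>();
    final List<Integer> starts = new ArrayList<>();
    final List<Integer> stops = new ArrayList<>();
    final List<Double> values = new ArrayList<>();

    // Reads the rows exactly once, appending each field to its column
    // and keeping a running sum of the value column.
    static BedGraphColumns fromRows(Iterable<String> rows) {
        BedGraphColumns cols = new BedGraphColumns();
        double sum = 0.0;
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            if (fields.length < 4) {
                continue; // assumption: skip blank or malformed rows
            }
            cols.chromosomes.add(fields[0]);
            cols.starts.add(Integer.parseInt(fields[1]));
            cols.stops.add(Integer.parseInt(fields[2]));
            double value = Double.parseDouble(fields[3]);
            cols.values.add(value);
            sum += value;
        }
        // The division runs over the in-memory list, not over the file again.
        if (!cols.values.isEmpty() && sum != 0.0) {
            final double average = sum / cols.values.size();
            cols.values.replaceAll(x -> x / average);
        }
        return cols;
    }
}
```

Keeping a running sum during the read means the average is available as soon as the loop ends, so the only second pass is over the value list in memory.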
Please review my branch: https://bitbucket.org/pnikhare/integrated-genome-browser/branch/IGBF-2354#diff
It includes the changes from IGBF-2329.
Overview:
In the loop, store the intermediate data (id, chromosome, start, stop, value) in an array of class objects.
Store the id-to-mean map.
Iterate over each class object, recalculate the chunk length using the id-to-mean map, and build the StringBuilder object.
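For reviewers who have not opened the branch, the three steps in the overview might look roughly like the Java sketch below. It is not the code from the branch. The names IdMeanSketch, Row, and build are hypothetical, the input is assumed to be whitespace-separated (id, chromosome, start, stop, value), and because the overview does not spell out how the chunk length is recalculated, the sketch substitutes a simpler step: dividing each value by the mean for its id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the IGB code base.
class IdMeanSketch {
    // Intermediate record for one row of the file.
    static final class Row {
        final String id;
        final String chromosome;
        final int start;
        final int stop;
        final double value;

        Row(String id, String chromosome, int start, int stop, double value) {
            this.id = id;
            this.chromosome = chromosome;
            this.start = start;
            this.stop = stop;
            this.value = value;
        }
    }

    static String build(Iterable<String> lines) {
        List<Row> rows = new ArrayList<>();
        // id -> {sum of values, row count}; the mean is sum / count.
        Map<String, double[]> sumAndCount = new HashMap<>();

        // Step 1 and 2: single pass over the input, storing rows and per-id totals.
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 5) {
                continue; // assumption: skip blank or malformed rows
            }
            Row row = new Row(f[0], f[1], Integer.parseInt(f[2]),
                    Integer.parseInt(f[3]), Double.parseDouble(f[4]));
            rows.add(row);
            double[] acc = sumAndCount.computeIfAbsent(row.id, k -> new double[2]);
            acc[0] += row.value;
            acc[1] += 1;
        }

        // Step 3: iterate over the stored rows and build the output.
        StringBuilder output = new StringBuilder();
        for (Row row : rows) {
            double[] acc = sumAndCount.get(row.id);
            double mean = acc[0] / acc[1];
            // Assumed stand-in for the branch's recalculation step.
            double scaled = (mean == 0.0) ? row.value : row.value / mean;
            output.append(row.chromosome).append('\t')
                    .append(row.start).append('\t')
                    .append(row.stop).append('\t')
                    .append(scaled).append('\n');
        }
        return output.toString();
    }
}
```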