  1. IGB
  2. IGBF-2354

Refactor Index Hacking Data Structure

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      Spring 8 : 24 Apr to 8 May, Spring 8 : 11 May to 25 May, Spring 9 : 25 May to 8 Jun, Summer 1: 8 Jun - 19 Jun, Summer 2: 22 Jun - 3 Jul, Summer 3: 6 Jul - 17 Jul

      Description

      Situation: Currently, the index hacking data structure is a StringBuilder object (called <output>). As the file is looped through, each new row of data (chromosome, start, stop, value) is appended to the StringBuilder object.

      Problem: Each data point in the fourth column (value) needs to be divided by the average value across the entire file. Because the rows are written as they are read, the average must be known first, so a second loop through the entire file is needed before building the StringBuilder object. This roughly doubles the time it takes to display the file to the user.

      Task: Refactor the code to avoid using a StringBuilder object to construct the data row by row. Instead, create an Array/ArrayList for each column of data (chromosome, start, stop, value). Loop through the file appending to each Array. Then determine the average of the value Array, and divide the entire value Array contents by the average value. Then use the four Arrays to create the Sym object.

      Note: This is one example of how the task could be accomplished. Other data structures may be more efficient. The main objective is to loop through the file only once.

      For example:

      //How file is currently constructed row by row using StringBuilder
      chr1 0 16384 14567
      chr1 16384 32768 10576

      //How file should be constructed using Array/ArrayList for each column (chromosome, start, stop, value)
      [chr1, chr1, ...]
      [0, 16384, ...]
      [16384, 32768, ...]
      [14567, 10576, ...] length = 2, average = (14567 + 10576)/length

      //Divide the 4th Array values by the average
      [14567/average, 10576/average, ...]
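
      The column-based approach above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name ColumnarTrack, its fields, and the parse method are hypothetical and are not existing IGB code; rows are assumed to be whitespace-separated (chromosome, start, stop, value); and the final step of building the Sym object from the four lists is left out. The sum is accumulated while reading, so the average is available after a single pass over the file, and the division then runs over the in-memory list rather than the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the four parsed columns; not an actual IGB class. */
final class ColumnarTrack {
    final List<String> chromosomes = new ArrayList<>();
    final List<Integer> starts = new ArrayList<>();
    final List<Integer> stops = new ArrayList<>();
    final List<Double> values = new ArrayList<>();

    /**
     * Reads rows of the assumed form "chromosome start stop value" in one pass,
     * then divides every value by the average of the value column.
     */
    static ColumnarTrack parse(Iterable<String> lines) {
        ColumnarTrack track = new ColumnarTrack();
        double sum = 0.0;

        // Single pass over the input: append to each column and keep a running sum.
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            double value = Double.parseDouble(fields[3]);
            track.chromosomes.add(fields[0]);
            track.starts.add(Integer.parseInt(fields[1]));
            track.stops.add(Integer.parseInt(fields[2]));
            track.values.add(value);
            sum += value;
        }

        // Normalize in memory; the file itself is not read again.
        // Skipped when there is no data or the sum is zero, to avoid dividing by zero.
        if (!track.values.isEmpty() && sum != 0.0) {
            double average = sum / track.values.size();
            for (int i = 0; i < track.values.size(); i++) {
                track.values.set(i, track.values.get(i) / average);
            }
        }
        return track;
    }

    public static void main(String[] args) {
        // The two example rows from this ticket.
        ColumnarTrack track = parse(Arrays.asList(
                "chr1 0 16384 14567",
                "chr1 16384 32768 10576"));
        for (int i = 0; i < track.values.size(); i++) {
            System.out.println(track.chromosomes.get(i) + " " + track.starts.get(i)
                    + " " + track.stops.get(i) + " " + track.values.get(i));
        }
    }
}
```

      For large files, primitive arrays (int[] and double[]) that grow as needed would avoid the boxing overhead of ArrayList<Integer> and ArrayList<Double>; the lists are used here only to keep the sketch short.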

              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                3