IGBF-3782

Investigate optimizing the loadSupportedGenomeVersions method

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Optimize the loadSupportedGenomeVersions() method, which adds species to the species list for a given provider. The method loops over every genome version the provider offers. For Ensembl that is 33694 genome versions, so the method takes a very long time to run. Optimize it by introducing multi-threading or any other optimization technique.
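
One way to approach the multi-threading idea is to split the genome-version list into slices and process them on a fixed thread pool. The following is only a minimal sketch under stated assumptions: `SpeciesLoaderSketch`, `speciesNameOf`, `loadSpecies`, and the `Genus_species_assembly` naming scheme are hypothetical and are not taken from the IGB code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the IGB code base.
class SpeciesLoaderSketch {

    // Stand-in for the per-genome-version work done inside the loop.
    // Assumes names shaped like "Genus_species_assembly"; the real logic may differ.
    static String speciesNameOf(String genomeVersion) {
        String[] parts = genomeVersion.split("_");
        return parts.length >= 2 ? parts[0] + " " + parts[1] : genomeVersion;
    }

    // Splits the list into roughly equal slices and processes each slice on its own task.
    static Set<String> loadSpecies(List<String> genomeVersions, int threads) {
        Set<String> species = new ConcurrentSkipListSet<>(); // thread-safe, sorted, de-duplicated
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (genomeVersions.size() + threads - 1) / threads);
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < genomeVersions.size(); start += chunk) {
                int end = Math.min(start + chunk, genomeVersions.size());
                List<String> slice = genomeVersions.subList(start, end);
                pending.add(pool.submit(() -> {
                    for (String gv : slice) {
                        species.add(speciesNameOf(gv));
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the slice and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return species;
    }
}
```

A thread pool only helps if the per-item work is independent and safe to call from several threads at once; shared mutable state or locks inside that work can cancel out the gain.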

            Activity

            jsirigin Jaya Sravani Sirigineedi added a comment -

            Changed the for loop to use multi-threading; this currently works for only two Ensembl divisions. Now working on adding the Ensembl species to the species.txt file. The logic added as part of IGBF-3780 only removes genome versions of a species that are already present, so if Ensembl has another version of that species, it still gets added.

            jsirigin Jaya Sravani Sirigineedi added a comment -

            Fixed the issues found during testing; the user can now see all the species from Ensembl that aren't loaded by another provider. Changing the for loop in the loadSupportedGenomeVersions method to a parallel stream did not reduce the load time. I think the underlying methods don't support multi-threading, which makes the parallel stream behave like a normal loop. Four of the six divisions load without any issue and without any optimization. Below is the screenshot of the species that are loaded from Ensembl.

            As discussed with Nowlan Freese, I will dig deeper into the code to see whether any part of it can be optimized.
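
One possible explanation for a parallel stream behaving like a plain loop, which the ticket does not confirm, is that every item passes through a shared lock. The sketch below illustrates that effect in isolation: `SerializedParallelStreamSketch`, `lockedHelper`, and `maxConcurrentCalls` are hypothetical names, and the `synchronized` helper merely stands in for whatever the underlying IGB methods actually do.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical sketch: illustrates lock contention, not the actual IGB code.
class SerializedParallelStreamSketch {
    private static final AtomicInteger inside = new AtomicInteger();
    private static final AtomicInteger maxInside = new AtomicInteger();

    // Stand-in for a helper guarded by a shared lock (here, the class monitor).
    static synchronized void lockedHelper(int item) {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max); // record peak concurrency
        // ... per-item work would go here ...
        inside.decrementAndGet();
    }

    // Runs the helper from a parallel stream and reports how many calls
    // were ever inside the helper at the same time.
    static int maxConcurrentCalls(int items) {
        inside.set(0);
        maxInside.set(0);
        IntStream.range(0, items)
                 .parallel()
                 .forEach(SerializedParallelStreamSketch::lockedHelper);
        return maxInside.get();
    }
}
```

Because the helper is `synchronized`, only one thread can be inside it at a time no matter how many worker threads the stream uses, so the work is effectively serialized. Profiling the real call chain would be needed to tell whether this is what happened here.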

            jsirigin Jaya Sravani Sirigineedi added a comment -

            Tried different optimizations and changed the way the code is written as part of IGBF-3780; species from all the Ensembl divisions now load within 30 seconds. However, the list of species is very long. As discussed with Nowlan Freese, each division will be treated as a separate data provider: EnsemblVertebrates, EnsemblPlants, and EnsemblMetazoa will be enabled by default, and the remaining three will be disabled. Users can enable the other divisions if required. Will create tickets for this.

            Next task - Review all the optimization changes made so far to see which ones are useful and whether they affect any other part of IGB.
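
The per-division split described in this comment could be expressed as a simple table of providers with default enabled flags. This is a rough sketch only: `DivisionProvidersSketch` and its methods are hypothetical, and the three disabled division names (EnsemblFungi, EnsemblProtists, EnsemblBacteria) are assumed from Ensembl's public division names; the ticket does not name them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: not how IGB actually registers data providers.
class DivisionProvidersSketch {

    // Division name -> enabled by default. Insertion order is preserved.
    static Map<String, Boolean> defaultProviders() {
        Map<String, Boolean> providers = new LinkedHashMap<>();
        providers.put("EnsemblVertebrates", true);
        providers.put("EnsemblPlants", true);
        providers.put("EnsemblMetazoa", true);
        // Assumed names for the three divisions the ticket leaves unnamed.
        providers.put("EnsemblFungi", false);
        providers.put("EnsemblProtists", false);
        providers.put("EnsemblBacteria", false);
        return providers;
    }

    // Names of the providers a user would see enabled on first start.
    static List<String> enabledByDefault() {
        return defaultProviders().entrySet().stream()
                .filter(Map.Entry::getValue)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Keeping the flag per division means the species list stays short by default while users can still switch on the remaining divisions.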

            jsirigin Jaya Sravani Sirigineedi added a comment - edited

            The optimization changes made in the Ensembl code work correctly and effectively reduce the loading time. The multi-threading changes also reduced the load time, but they affected other parts of the code, so I reverted them. Will push the changes as part of the IGBF-3780 ticket. Closing this ticket.

            nfreese Nowlan Freese added a comment -

            This ticket investigated optimizing the IGB load time, but we decided not to include any of the changes as they had other effects.


              People

              • Assignee:
                jsirigin Jaya Sravani Sirigineedi
                Reporter:
                jsirigin Jaya Sravani Sirigineedi
              • Votes:
                0
                Watchers:
                2