IGBF-3853

IGB not loading the genome after reenabling the DataProvider

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 10.1.0
    • Labels:
      None

      Description

      IGB does not load the genome after the DataProvider is re-enabled. In addition, when the UCSC DataProvider is re-enabled, an exception appears in the logs and the user cannot edit the DataSources configuration window.

        Attachments

          Activity

          jsirigin Jaya Sravani Sirigineedi created issue -
          jsirigin Jaya Sravani Sirigineedi made changes -
          Field Original Value New Value
          Status To-Do [ 10305 ] In Progress [ 3 ]
          jsirigin Jaya Sravani Sirigineedi added a comment -

          Investigated the issue. It is caused by the changes included in this commit: https://bitbucket.org/lorainelab/integrated-genome-browser/commits/a2c9ebd52cb1ccd4105d056b14ce86d89fe40114. With this commit reverted, IGB works properly. Working on finding the exact cause and fixing it.

          nfreese Nowlan Freese made changes -
          Epic Link IGBF-3555 [ 22774 ]
          nfreese Nowlan Freese added a comment - - edited

          I'm also seeing an error when I try to load genomes provided by UCSC REST.

          To reproduce:

          1. Select Ovis aries in the Species dropdown
          2. Select O_aries_Nov_2015 in the Genome Version dropdown
          3. I see the following error in the log and no data is available under Available Data: ERROR - Couldn't find species for version O_aries_Nov_2015

          Maybe this is due to some of the species name logic added for Ensembl? I will try some older commits and see where the error first starts and will update this comment.

          UPDATE - The error above also seems to be due to the IGBF-3780 commit. If I go back to the commit before IGBF-3780 was merged into main, I do not see the error.

          jsirigin Jaya Sravani Sirigineedi added a comment - - edited

          Fixed the issues; the updated code is available at: https://bitbucket.org/jaya-sravani/integrated-genome-browser/branch/IGBF-3853. Please review and let me know if there are any issues. I also found one more issue while testing. It is not caused by this commit (https://jira.bioviz.org/browse/IGBF-3780), so I will create a new ticket for it after discussing it with Nowlan Freese.
          Steps to reproduce:

          • Start IGB.
          • Select Homo sapiens in the Species dropdown.
          • Select any of the available genome versions.
          • Try to disable the UCSC Rest provider. For most genome versions, the warning below appears, after which no further action is possible and the error below is thrown.

          warning

          [714.027s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 2048k, guardsize: 16k, detached.
          [714.027s][warning][os,thread] Failed to start the native thread for java.lang.Thread "pool-1973-thread-1"

          error

          Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
          at java.base/java.lang.Thread.start0(Native Method)
          at java.base/java.lang.Thread.start(Thread.java:1553)
          at java.base/java.lang.System$2.start(System.java:2577)
          at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
          at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
          at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
          at java.base/java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:754)
          at com.affymetrix.genometry.thread.CThreadHolder.execute(CThreadHolder.java:49)
          at com.affymetrix.igb.view.load.GeneralLoadUtils.iterateSeqList(GeneralLoadUtils.java:694)

          jsirigin Jaya Sravani Sirigineedi made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Assignee Jaya Sravani Sirigineedi [ jsirigin ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Assignee Jaya Sravani Sirigineedi [ jsirigin ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          ann.loraine Ann Loraine made changes -
          Sprint Summer 6 [ 200 ] Summer 6, Summer 7 [ 200, 201 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          jsirigin Jaya Sravani Sirigineedi made changes -
          Story Points 1 3
          jsirigin Jaya Sravani Sirigineedi added a comment -

          After reverting the Ensembl code (branch: https://bitbucket.org/jaya-sravani/integrated-genome-browser/branch/IGBF-3853-testing-without-ensembl), the issue described in the ticket and the issue in Nowlan Freese's comment no longer occur. The out-of-memory issue with the Homo sapiens species is still there (see the above comment). It happens because of the large number of datasets available for Homo sapiens from the UCSC Rest provider (around 9,500 for the H_sapiens_Dec_2013 genome version). I checked this by manually limiting the number of datasets to 500, and with that limit it works without errors. Right now, I am going through the code responsible for removing the datasets and trying to optimize it.

          jsirigin Jaya Sravani Sirigineedi added a comment - - edited

          Found what is causing the out-of-memory issue. The code below, from the ThreadUtils.java class, creates a new single-thread executor, and so a new thread, for each dataset. This ensures that tasks for the same object are executed one after another, but it uses more resources. Because the threads are not reused, thread creation eventually fails when there are a lot of datasets.

          public void execute(Object obj, CThreadWorker<?, ?> worker) {
              if (obj == null || worker == null) {
                  throw new IllegalArgumentException("None of parameters can be null");
              }
              // Each distinct obj gets its own executor (see below).
              ThreadUtils.getPrimaryExecutor(obj).execute(worker);
          }
          
          public synchronized static Executor getPrimaryExecutor(Object key) {
              Executor exec = obj2exec.get(key);
              if (exec == null) {
                  // A new single-thread executor, and so a new thread, per key.
                  exec = Executors.newSingleThreadExecutor();
                  obj2exec.put(key, exec);
              }
              return exec;
          }
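
          The growth in thread count can be seen in a small standalone sketch. It is not IGB code: the class PerKeyExecutorDemo and the "dataset-N" keys are hypothetical, and it only imitates the obj2exec pattern above to count how many distinct threads end up running the tasks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of IGB.
final class PerKeyExecutorDemo {
    // Imitates obj2exec: one single-thread executor per key object.
    private final Map<Object, ExecutorService> obj2exec = new HashMap<>();

    synchronized ExecutorService getPrimaryExecutor(Object key) {
        return obj2exec.computeIfAbsent(key, k -> Executors.newSingleThreadExecutor());
    }

    synchronized void shutdownAll() {
        obj2exec.values().forEach(ExecutorService::shutdown);
        obj2exec.clear();
    }

    // Submits one task per key and returns the number of distinct threads used.
    static int countThreadsUsed(int keyCount) {
        PerKeyExecutorDemo demo = new PerKeyExecutorDemo();
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(keyCount);
        try {
            for (int i = 0; i < keyCount; i++) {
                demo.getPrimaryExecutor("dataset-" + i).execute(() -> {
                    threads.add(Thread.currentThread());
                    done.countDown();
                });
            }
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            demo.shutdownAll();
        }
        return threads.size();
    }

    public static void main(String[] args) {
        // One thread per key: the count grows linearly with the number of keys.
        System.out.println("threads used for 50 keys: " + countThreadsUsed(50));
    }
}
```

          With around 9,500 keys the same pattern asks the operating system for around 9,500 native threads, which is consistent with the pthread_create failure shown in the earlier comment.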
          

          I changed this to the code below, which uses an ExecutorService to manage the threads and their lifecycle. It does not produce any error, but removing all the UCSC datasets (around 9,500) takes about 4 minutes 30 seconds, because a few parts of the code require sequential execution. Optimizing that would likely take a lot of effort, investigation, and code changes.

          public class ExecutorServiceManager {
              private static final int THREAD_POOL_SIZE = Runtime.getRuntime().availableProcessors();
              private static final ExecutorService executorService = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
          
              public static ExecutorService getExecutorService() {
                  return executorService;
              }
          }

          public void executeForRemoveDatasets(Object obj, CThreadWorker<?, ?> worker) {
              if (obj == null || worker == null) {
                  throw new IllegalArgumentException("None of parameters can be null");
              }
              ExecutorServiceManager.getExecutorService().execute(worker);
          }
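
          For reference, one known way to keep per-object ordering while still using a bounded pool is a keyed variant of the serial-executor pattern. The sketch below is an assumption-laden illustration, not what the branch does: KeyedSerialExecutor and demoKeepsOrder are hypothetical names, and whether this fits the parts of IGB that need sequential execution has not been checked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of IGB: tasks with the same key run one at a
// time and in submission order, but all keys share one bounded pool.
final class KeyedSerialExecutor {
    private final Executor pool;
    // A key is present while one of its tasks is running or scheduled;
    // the queue holds the tasks waiting behind it.
    private final Map<Object, Queue<Runnable>> pending = new HashMap<>();

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    synchronized void execute(Object key, Runnable task) {
        Queue<Runnable> queue = pending.get(key);
        if (queue != null) {
            queue.add(task); // wait behind the task already in flight
            return;
        }
        pending.put(key, new ArrayDeque<>());
        pool.execute(() -> runThenNext(key, task));
    }

    private void runThenNext(Object key, Runnable task) {
        try {
            task.run();
        } finally {
            scheduleNext(key);
        }
    }

    private synchronized void scheduleNext(Object key) {
        Runnable next = pending.get(key).poll();
        if (next == null) {
            pending.remove(key); // nothing waiting: forget the key
        } else {
            pool.execute(() -> runThenNext(key, next));
        }
    }

    // Demo: 3 keys x 100 tasks on a 4-thread pool; true if each key saw 0..99 in order.
    static boolean demoKeepsOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        KeyedSerialExecutor serial = new KeyedSerialExecutor(pool);
        int keys = 3;
        int perKey = 100;
        Map<Integer, List<Integer>> seen = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(keys * perKey);
        for (int i = 0; i < perKey; i++) {
            for (int k = 0; k < keys; k++) {
                final int key = k;
                final int seq = i;
                serial.execute(key, () -> {
                    seen.computeIfAbsent(key, x -> new ArrayList<>()).add(seq);
                    done.countDown();
                });
            }
        }
        try {
            if (!done.await(30, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
        for (int k = 0; k < keys; k++) {
            List<Integer> list = seen.get(k);
            if (list == null || list.size() != perKey) {
                return false;
            }
            for (int i = 0; i < perKey; i++) {
                if (list.get(i) != i) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

          The thread count stays at the pool size no matter how many keys are used, at the cost of one small queue per active key.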
          

          As discussed with Nowlan Freese, the best option at this point is to remove the narrowPeak and bigWig file types, as they account for a lot of the datasets. With them removed, the remaining datasets are removed much faster and without any issues. Here is the updated branch: https://bitbucket.org/jaya-sravani/integrated-genome-browser/branch/IGBF-3853. I added this as a separate commit to make it easier to understand.
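
          A minimal sketch of that kind of filtering, for illustration only. The names TrackTypeFilter and Track are hypothetical and do not come from the IGB code base, and the sketch assumes each track carries a type string whose first whitespace-separated token names the file type (for example "bigWig 0 1000"). The actual commit may do this differently.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not the code from the IGBF-3853 branch.
final class TrackTypeFilter {
    // Stand-in for one dataset entry returned by the data provider.
    static final class Track {
        final String name;
        final String type;

        Track(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // File types to drop, compared case-insensitively.
    private static final Set<String> EXCLUDED_TYPES = Set.of("bigwig", "narrowpeak");

    // Assumption: the first token of the type string is the file type.
    static boolean isExcluded(Track track) {
        if (track.type == null) {
            return false;
        }
        String firstToken = track.type.trim().split("\\s+")[0];
        return EXCLUDED_TYPES.contains(firstToken.toLowerCase(Locale.ROOT));
    }

    // Returns a new list without the excluded file types; the input is not modified.
    static List<Track> keepSupported(List<Track> tracks) {
        return tracks.stream()
                .filter(track -> !isExcluded(track))
                .collect(Collectors.toList());
    }
}
```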

          jsirigin Jaya Sravani Sirigineedi made changes -
          Assignee Jaya Sravani Sirigineedi [ jsirigin ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese added a comment - - edited

          Testing Sravani's branch on Mac.

          • IGB built with tests.
          • The bigWig and narrowPeak folders are no longer available under UCSC REST available data.
          • Other UCSC REST data (RefSeq All) load correctly.
          • Turning the UCSC REST Data Source off and back on works quickly.
          • No issues in the log.

          Ready for pull request.

          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Jaya Sravani Sirigineedi [ jsirigin ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          jsirigin Jaya Sravani Sirigineedi added a comment - Raised the Pull request: https://bitbucket.org/lorainelab/integrated-genome-browser/pull-requests/1032
          jsirigin Jaya Sravani Sirigineedi made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          jsirigin Jaya Sravani Sirigineedi made changes -
          Assignee Jaya Sravani Sirigineedi [ jsirigin ]
          ann.loraine Ann Loraine added a comment -

          PR merged. New installers built and deployed to bioviz.org early access IGB section.

          nfreese Nowlan Freese made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          nfreese Nowlan Freese made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          nfreese Nowlan Freese made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          nfreese Nowlan Freese added a comment - - edited

          Tested using main branch installer on Mac.

          • The bigWig and narrowPeak folders are no longer available under UCSC REST available data.
          • Other UCSC REST data (RefSeq All) load correctly.
          • Turning the UCSC REST Data Source off and back on works quickly.
          • Able to load genomes provided by UCSC REST.
          • No issues in the log.

          Closing ticket.

          nfreese Nowlan Freese made changes -
          Assignee Jaya Sravani Sirigineedi [ jsirigin ]
          nfreese Nowlan Freese made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
          nfreese Nowlan Freese made changes -
          Fix Version/s 10.1.0 [ 11000 ]

            People

            • Assignee:
              jsirigin Jaya Sravani Sirigineedi
              Reporter:
              jsirigin Jaya Sravani Sirigineedi
            • Votes:
              0
              Watchers:
              3
