Found what is causing the out-of-memory issue. The code below, in the ThreadUtils.java class, creates a new thread for each dataset, which guarantees that tasks for the same object are executed one after another but can use far more resources. Because threads are never reused, the JVM eventually cannot create a new thread when there are a large number of datasets, which leads to the out-of-memory error.
public void execute(Object obj, CThreadWorker<?, ?> worker) {
    if (obj == null || worker == null) {
        throw new IllegalArgumentException("None of parameters can be null");
    }
    // Each object gets its own dedicated executor, so every dataset can end up with its own thread.
    ThreadUtils.getPrimaryExecutor(obj).execute(worker);
}
public static synchronized Executor getPrimaryExecutor(Object key) {
    Executor exec = obj2exec.get(key);
    if (exec == null) {
        // Lazily create a single-thread executor per key; its thread is never shared with other keys.
        exec = Executors.newSingleThreadExecutor();
        obj2exec.put(key, exec);
    }
    return exec;
}
Changed this code to the code below, which uses a shared ExecutorService to manage the threads and their lifecycle. This no longer produces any error, but the time taken to remove all the UCSC datasets (around 9,500) is about 4 minutes 30 seconds. That is because a few parts of the code require sequential execution; optimizing them would likely take significant effort, investigation, and extensive code changes.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorServiceManager {
    // Cap the pool at the number of available processors so the thread count stays fixed.
    private static final int THREAD_POOL_SIZE = Runtime.getRuntime().availableProcessors();
    private static final ExecutorService executorService = Executors.newFixedThreadPool(THREAD_POOL_SIZE);

    public static ExecutorService getExecutorService() {
        return executorService;
    }
}
public void executeForRemoveDatasets(Object obj, CThreadWorker<?, ?> worker) {
    if (obj == null || worker == null) {
        throw new IllegalArgumentException("None of parameters can be null");
    }
    // Submit the worker to the shared fixed-size pool instead of a per-object executor.
    ExecutorServiceManager.getExecutorService().execute(worker);
}
As discussed with Nowlan Freese, the best option to resolve the issue at this point is to remove the narrowPeak and bigWig file types. They account for a large share of the datasets, so removing them lets the remaining datasets be removed much faster and without any issues. Here is the updated branch: https://bitbucket.org/jaya-sravani/integrated-genome-browser/branch/IGBF-3853. Added it as a separate commit to make it easier to follow.
Investigated the issue; it is caused by the changes included in this commit: https://bitbucket.org/lorainelab/integrated-genome-browser/commits/a2c9ebd52cb1ccd4105d056b14ce86d89fe40114. When that commit is reverted, everything works properly. Working on finding the exact root cause and fixing it.