[IGBF-2909] Determine if and how we can use Nextflow on UNCC HPC cluster - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
3
Epic Link:
Design crispr sites in IGB
Sprint:
Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Sep 10, Fall 3 2021 Sep 13 - Sep 24

Description

Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.

For this task, we need to:

Get familiar with the UNC Charlotte compute cluster environment and job scheduler
Understand how various software packages are configured and installed, and how we can get Nextflow onto the cluster
Communicate with support personnel to ask for new software to be installed, as required

Attachments

Issue Links

relates to

IGBF-2945 Run trimmomatic on HPC system using nextflow and Singularity

Closed

Activity

Nowlan Freese added a comment - 08/Sep/21 10:12 AM - edited

Following instructions here for installing on Mac: https://www.nextflow.io/blog/2021/nextflow-developer-environment.html (1)

Following instructions here for writing first scripts: https://www.nextflow.io/docs/latest/getstarted.html#your-first-script (2)

Ann Loraine added a comment - 09/Sep/21 6:48 PM - edited

To run nextflow on UNCC HPC, you'll need to install it, and then run it as a job, either in an interactive session or a batch session. The following instructions explain how to run it in an interactive session.

Step-by-step instructions:

1) Configure your account to use the slurm scheduler with nextflow by adding this line to your .bash_profile:

export NXF_EXECUTOR=slurm
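If you rerun the setup, a small guard avoids adding the same line twice. The following is a minimal sketch, not part of the original instructions: `add_line_once` is a hypothetical helper name, and it assumes your login shell reads `~/.bash_profile`.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
add_line_once() {
    line="$1"
    file="$2"
    # Create the file if it does not exist yet.
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet (exit status only).
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

To apply it, you could run `add_line_once 'export NXF_EXECUTOR=slurm' "$HOME/.bash_profile"` and then log in again (or source the file) so the variable is set.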

2) Install nextflow into your HPC account, following "Step1" from URL (1) above.

3) Create a directory where you want to run nextflow and change into that directory.

Before continuing, launch tmux, a terminal multiplexer, so that nextflow keeps running even if you lose your connection. For instructions on using tmux, see https://www.howtogeek.com/671422/how-to-use-tmux-on-linux-and-why-its-better-than-screen/

Start a tmux session, called "nextflow":

tmux new -s nextflow
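If you lose your connection and log back in, the session created above should still exist, so you want to reattach rather than create a second one. A minimal sketch, assuming tmux is on your PATH; `nf_session` is a hypothetical helper name:

```shell
# Hypothetical helper: reattach to a named tmux session if it exists,
# otherwise create it. The session name defaults to "nextflow".
nf_session() {
    name="${1:-nextflow}"
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach -t "$name"
    else
        tmux new -s "$name"
    fi
}
```

Running `nf_session` after reconnecting brings you back to the running session. To leave a session without stopping it, detach with Ctrl-b followed by d.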

4) Copy a starter script from link (2) above into that directory.

5) Launch an interactive session on a cluster node, requesting 1 CPU, 1500 MB of memory per CPU, and a 1-hour time limit:

srun --partition Orion --job-name "nextflow" --cpus-per-task 1 --mem-per-cpu 1500 --time 1:00:00 --pty bash

You will likely observe a message like the following, but your job number will be different:

srun: job 1088586 queued and waiting for resources

Wait for the "job" to start. When it does, you'll see a new prompt, different from your login session. At this point, you're "in" a new interactive session, running on a newly allocated node, in the same location in the file system as before.

Note: The "partition" option specifies which group of machines your session will run in. Different partitions have different usage levels. Besides Orion, you can also use partition "Andromeda", which has relatively lower usage. If it takes too long for the interactive job to start, type Ctrl-C to cancel the request, then re-enter the command, requesting the Andromeda partition instead.

6) Edit the starter script to include "time" directives by adding this line to the start of each "process" section:

time '5m'

A nextflow script can run in diverse execution environments, but some environments require specialized parameters, such as job time limits in the case of HPC environments. With the slurm executor, nextflow submits each process execution as a separate cluster job, and each job needs a time limit.
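Instead of editing every process, a default time limit can be set once in a `nextflow.config` file in the run directory. The sketch below is an assumption about how you might lay this out, not the approach taken in this ticket: the `queue` value in particular is a guess based on the partition named above, and the command overwrites any existing `nextflow.config` in the current directory.

```shell
# Write a nextflow.config next to the script (overwrites an existing one).
cat > nextflow.config <<'EOF'
// Assumed defaults applied to every process; adjust to your own jobs.
process {
    executor = 'slurm'   // submit each process execution as a Slurm job
    queue    = 'Orion'   // Slurm partition (assumption: same one as above)
    time     = '5m'      // default time limit per job
}
EOF
```

A `time` directive written inside an individual process still applies to that process, so the config value acts as a fallback.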

7) Run the script and observe the output:

[aloraine@str-i1 nextflow]$ nextflow run tutorial.nf 
N E X T F L O W  ~  version 21.04.3
Launching `tutorial.nf` [adoring_swirles] - revision: 295ae680a6
executor >  slurm (3)
[84/c56487] process > splitLetters       [100%] 1 of 1
[e8/c39201] process > convertToUpper (2) [100%] 2 of 2
HELLO
WORLD!

Note: Your output may differ depending on which process was executed first. The output lines may appear in a different order.
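The instructions above cover the interactive case only. For the batch alternative mentioned at the start of this comment, the same resource request can be written as `#SBATCH` directives. This is a minimal, untested sketch: the file name `run_nextflow.slurm` is hypothetical, and it assumes nextflow is on your PATH inside batch jobs, that NXF_EXECUTOR is set in the environment you submit from, and that `tutorial.nf` is in the directory you submit from.

```shell
# Write a batch script that mirrors the interactive srun request.
# The file name run_nextflow.slurm is hypothetical.
cat > run_nextflow.slurm <<'EOF'
#!/bin/bash
#SBATCH --partition=Orion
#SBATCH --job-name=nextflow
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1500
#SBATCH --time=1:00:00

# Assumes nextflow is on PATH and tutorial.nf is in the submission directory.
nextflow run tutorial.nf
EOF
```

Submit it with `sbatch run_nextflow.slurm`. The tmux session is not needed in this case, because the job does not depend on your login connection.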

Ann Loraine added a comment - 14/Sep/21 3:47 PM - edited

The starter script, with time directives added:

#!/usr/bin/env nextflow

params.str = 'Hello world!'

process splitLetters {
    time '5m'
    output:
    file 'chunk_*' into letters

    """
    printf '${params.str}' | split -b 6 - chunk_
    """
}


process convertToUpper {
    time '5m'
    
    input:
    file x from letters.flatten()

    output:
    stdout result

    """
    cat $x | tr '[a-z]' '[A-Z]'
    """
}

result.view { it.trim() }

Ann Loraine added a comment - 14/Sep/21 3:49 PM

Discussed installing Nextflow with the HPC maintainers. The Nextflow installation process works well at the level of individual users, so for now we will not install it system-wide and will let individual users install it themselves.

Ann Loraine added a comment - 14/Sep/21 5:00 PM - edited

README and Tips:

Don't install on your terminal machine: installing as advised above requires running code downloaded over HTTPS from a server outside your network.
Read the installation script to understand how running it affects the user environment for future logins (e.g., JAVA_HOME or PATH).
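One way to follow the second tip is to save the installer to a file first and list the lines that touch login files or environment variables before running anything. A minimal sketch: `show_env_changes` is a hypothetical helper, and the search pattern is only a starting point, not a complete audit of what an installer does.

```shell
# Hypothetical helper: print, with line numbers, the lines of a saved
# installer script that mention common environment variables or login files.
show_env_changes() {
    grep -nE 'JAVA_HOME|PATH|\.bash_profile|\.bashrc|\.profile' "$1"
}
```

After saving the installer as, say, `installer.sh` (a hypothetical name), `show_env_changes installer.sh` points you at the lines to read most carefully. Reading the whole script is still the safer choice.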

Nowlan Freese added a comment - 15/Sep/21 4:19 PM - edited

I made a Google Doc located here with the UNCC HPC tutorial above. I added a few additional steps and fixed an issue with the script (result.view had been omitted). I was then able to get the Hello World example to work on the cluster.

Ann's note: Added that final line to the script above.

People

Assignee:

Ann Loraine

Reporter:

Ann Loraine

Votes:

0

Watchers:

2

Dates

Created:

11/Aug/21 9:36 AM

Updated:

16/Sep/21 11:14 AM

Resolved:

16/Sep/21 11:14 AM