  1. IGB
  2. IGBF-2909

Determine if and how we can use Nextflow on UNCC HPC cluster

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Sep 10, Fall 3 2021 Sep 13 - Sep 24

      Description

      Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.

      For this task, we need to:

      • Get familiar with the UNC Charlotte compute cluster environment and job scheduler
      • Understand how software packages are configured and installed, and how we can get Nextflow onto the cluster
      • Communicate with support personnel to ask for new software to be installed, as required

            Activity

            nfreese Nowlan Freese added a comment (edited)

            Following instructions here for installing on Mac: https://www.nextflow.io/blog/2021/nextflow-developer-environment.html (1)

            Following instructions here for writing first scripts: https://www.nextflow.io/docs/latest/getstarted.html#your-first-script (2)

            ann.loraine Ann Loraine added a comment (edited)

            To run nextflow on UNCC HPC, you'll need to install it, and then run it as a job, either in an interactive session or a batch session. The following instructions explain how to run it in an interactive session.

            Step-by-step instructions:

            1) Configure your account to use the slurm scheduler with nextflow by adding this line to your .bash_profile:

            export NXF_EXECUTOR=slurm
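            If you repeat this setup, the line could end up in your profile more than once. A minimal sketch of one way to avoid that is shown here. The helper name add_line_once is hypothetical (it is not part of Nextflow or Slurm), and the sketch assumes the profile file, if it exists, ends with a newline.

```shell
#!/usr/bin/env bash
# add_line_once FILE LINE (hypothetical helper)
# Append LINE to FILE unless an identical line is already present.
add_line_once() {
    local file="$1" line="$2"
    touch "$file"    # create the file if it does not exist yet
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (commented out because it edits your real profile):
# add_line_once "$HOME/.bash_profile" 'export NXF_EXECUTOR=slurm'
```

            The setting takes effect the next time you log in, or right away in the current shell if you run "source ~/.bash_profile".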
            

            2) Install nextflow into your HPC account, following "Step1" from URL (1) above.

            3) Create a directory where you want to run nextflow and change into that directory.

            4) For the next steps, you need to launch tmux, a terminal multiplexer. This ensures that nextflow will continue running even if you lose your connection. Read these instructions on using tmux: https://www.howtogeek.com/671422/how-to-use-tmux-on-linux-and-why-its-better-than-screen/.

            Start a tmux session, called "nextflow":

            tmux new -s nextflow
            

            4b) Copy a starter script from link (2) above into the directory you created in step 3.

            5) Launch an interactive node on the cluster, requesting 1 CPU, 1500 MB of memory, and 1 hour:

            srun --partition Orion --job-name "nextflow" --cpus-per-task 1 --mem-per-cpu 1500 --time 1:00:00 --pty bash
            

            You will likely observe a message like the following, but your job number will be different:

            srun: job 1088586 queued and waiting for resources
            

            Wait for the "job" to start. When it does, you'll see a new prompt, different from your login session. At this point, you're "in" a new interactive session, running on a newly allocated node, in the same location in the file system as before.

            Note: The "partition" option specifies which group of machines your session will run in. Different partitions have different usage levels. Besides Orion, you can also use the "Andromeda" partition, which has relatively lower usage. If the interactive job takes too long to start, press Ctrl-C to cancel the request, then re-enter the command requesting the Andromeda partition.

            6) Edit the starter script to include "time" directives by adding this line to the start of each "process" section:

            time '5m'
            

            A nextflow script can run in diverse execution environments, but some environments require specialized parameters, such as job time limits in the case of HPC environments. With the slurm executor, nextflow submits each process task as its own job, and each job requires a time limit.
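            As an alternative to editing every process, a default time limit could be set once in a nextflow.config file in the launch directory, using the "process" scope of the Nextflow configuration. This is a sketch of that alternative, not the approach used in these instructions, and it has not been tested on this cluster. The helper name write_nextflow_config is hypothetical.

```shell
#!/usr/bin/env bash
# write_nextflow_config [DIR] (hypothetical helper)
# Write DIR/nextflow.config with a default time limit for all processes.
# Refuses to overwrite an existing config file.
write_nextflow_config() {
    local dir="${1:-.}"
    if [ -e "$dir/nextflow.config" ]; then
        echo "nextflow.config already exists in $dir; not overwriting" >&2
        return 1
    fi
    # A "time" directive inside a process still overrides this default.
    printf '%s\n' \
        "process {" \
        "    time = '5m'" \
        "}" > "$dir/nextflow.config"
}

# Usage, from the directory that holds tutorial.nf:
# write_nextflow_config .
```

            A config default keeps the script itself portable to environments that do not need time limits.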

            7) Run the script and observe the output:

            [aloraine@str-i1 nextflow]$ nextflow run tutorial.nf 
            N E X T F L O W  ~  version 21.04.3
            Launching `tutorial.nf` [adoring_swirles] - revision: 295ae680a6
            executor >  slurm (3)
            [84/c56487] process > splitLetters       [100%] 1 of 1
            [e8/c39201] process > convertToUpper (2) [100%] 2 of 2
            HELLO
            WORLD!
            

            Note: Your output may differ depending on which process was executed first. The output lines may appear in a different order.
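            The instructions above cover the interactive route only. For the batch route mentioned at the start of this comment, a rough sketch is to put the same resource request into a Slurm batch script and submit it with sbatch. The #SBATCH values below simply mirror the srun command in step 5. The helper name write_batch_script and the file name nextflow_head.sh are hypothetical, and this has not been tested on the UNCC cluster; in particular, it assumes nextflow is on your PATH in batch jobs and that compute nodes are allowed to submit jobs.

```shell
#!/usr/bin/env bash
# write_batch_script FILE (hypothetical helper)
# Write a Slurm batch script whose resource request mirrors the
# interactive srun command: Orion partition, 1 CPU, 1500 MB, 1 hour.
write_batch_script() {
    printf '%s\n' \
        '#!/bin/bash' \
        '#SBATCH --partition=Orion' \
        '#SBATCH --job-name=nextflow' \
        '#SBATCH --cpus-per-task=1' \
        '#SBATCH --mem-per-cpu=1500' \
        '#SBATCH --time=1:00:00' \
        'nextflow run tutorial.nf' > "$1"
}

# Usage, from the directory that holds tutorial.nf:
# write_batch_script nextflow_head.sh
# sbatch nextflow_head.sh
```

            The time limit here applies to the job running nextflow itself, so it needs to be long enough to cover all of the process jobs that nextflow submits.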

            ann.loraine Ann Loraine added a comment (edited)

            The starter script, with time directives added:

            #!/usr/bin/env nextflow
            
            params.str = 'Hello world!'
            
            process splitLetters {
                time '5m'
                output:
                file 'chunk_*' into letters
            
                """
                printf '${params.str}' | split -b 6 - chunk_
                """
            }
            
            
            process convertToUpper {
                time '5m'
                
                input:
                file x from letters.flatten()
            
                output:
                stdout result
            
                """
                cat $x | tr '[a-z]' '[A-Z]'
                """
            }
            
            result.view { it.trim() }
            
            
            ann.loraine Ann Loraine added a comment

            Discussed installing Nextflow with the HPC maintainers. The Nextflow installation process works well at the level of individual users, so for now we will not install it system-wide and will let individual users handle their own installations.

            ann.loraine Ann Loraine added a comment (edited)

            README and Tips:

            1. Don't install on your terminal machine: installing as advised above runs code downloaded over HTTPS from a server outside your network.
            2. Read the installation script to understand how running it affects the user environment for future logins (e.g., JAVA_HOME or PATH).
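            One way to follow tip 2 is to save the installation script to a file and list the lines that touch the environment before running anything. The sketch below only scans a local file. The helper name scan_installer and the file name nextflow-installer.sh are hypothetical, and a keyword scan is a starting point for reading the script, not a substitute for it.

```shell
#!/usr/bin/env bash
# scan_installer FILE (hypothetical helper)
# Print, with line numbers, every line of FILE that mentions
# JAVA_HOME or PATH. Prints nothing if there are no such lines.
scan_installer() {
    grep -nE 'JAVA_HOME|PATH' "$1"
}

# Usage, assuming the installer is the script served at
# https://get.nextflow.io (download it first instead of piping it to bash):
# curl -fsSL https://get.nextflow.io -o nextflow-installer.sh
# scan_installer nextflow-installer.sh
```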
            nfreese Nowlan Freese added a comment (edited)

            I made a Google Doc located here with the UNCC HPC tutorial above. I added a few additional steps and fixed an issue with the script (result.view had been omitted). I was then able to get the Hello World example to work on the cluster.

            Ann's note: Added that final line to the script above.


              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine