  1. IGB
  2. IGBF-2909

Determine if and how we can use Nextflow on UNCC HPC cluster

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Sep10, Fall 3 2021 Sep 13 - Sep 24

      Description

      Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.

      For this task, we need to:

      • Get familiar with the UNC Charlotte compute cluster environment and job scheduler
      • Understand how various software packages are configured and installed - how can we get Nextflow onto the cluster
      • Communicate with support personnel to ask for new software to be installed, as required

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2883 [ 21320 ]
            ann.loraine Ann Loraine made changes -
            Sprint Fall 1 2021 Aug 16 - Aug 27 [ 127 ] Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Aug 10 [ 127, 128 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Summary Deploy rice salt stress and strain data to RNA-Seq quickload Deploy rice salt stress and strain data
            ann.loraine Ann Loraine made changes -
            Summary Deploy rice salt stress and strain data Re-align rice salt stress RNA-Seq data using hisat
            ann.loraine Ann Loraine made changes -
            Summary Re-align rice salt stress RNA-Seq data using hisat Determine if we can use Nextflow to align RNA-Seq data on cluster
            ann.loraine Ann Loraine made changes -
            Description Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            ann.loraine Ann Loraine made changes -
            Description Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schelular
            * Understand how various software packages are configured and installed
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for alignment RNA-seq data onto a reference genome
            ann.loraine Ann Loraine made changes -
            Description Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schelular
            * Understand how various software packages are configured and installed
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for alignment RNA-seq data onto a reference genome
            Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for alignment RNA-seq data onto a reference genome
            ann.loraine Ann Loraine made changes -
            Summary Determine if we can use Nextflow to align RNA-Seq data on cluster Determine if and how we can use Nextflow to align RNA-Seq data on cluster
            ann.loraine Ann Loraine made changes -
            Description Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for alignment RNA-seq data onto a reference genome
            Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed - how can we get Nextflow onto the cluster
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for alignment RNA-seq data onto a reference genome
            ann.loraine Ann Loraine made changes -
            Description Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed - how can we get Nextflow onto the cluster
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for alignment RNA-seq data onto a reference genome
            Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed - how can we get Nextflow onto the cluster
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for aligning RNA-seq data onto a reference genome
            nfreese Nowlan Freese made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese added a comment - - edited

            Following instructions here for installing on Mac: https://www.nextflow.io/blog/2021/nextflow-developer-environment.html (1)

            Following instructions here for writing first scripts: https://www.nextflow.io/docs/latest/getstarted.html#your-first-script (2)

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2935 [ IGBF-2935 ]
            ann.loraine Ann Loraine made changes -
            Comment [ Nextflow scripts can run in lots of different environments, using diverse schedulers. I ran a script on the UNCC HPC queue "Orion" by creating a file named "nextflow.config" in the same directory as my nextflow script.

            nextflow.config:
            {code}
            process {
              executor='slurm'
              queue = 'Orion'
              time = '5m'
            }
            {code}
            I first tried it without the time parameter, but I got an error message. I think you can specify the time variable in the Nextflow script itself. ]
            ann.loraine Ann Loraine added a comment - - edited

            To run Nextflow on the UNCC HPC cluster, you'll need to install it and then run it as a job, in either an interactive session or a batch session. The following instructions explain how to run it in an interactive session.

            Step-by-step instructions:

            1) Configure your account to use the slurm scheduler with nextflow by adding this line to your .bash_profile:

            export NXF_EXECUTOR=slurm
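
A minimal shell sketch of step 1, assuming a Bash login shell that reads ~/.bash_profile. The function name `add_nxf_executor` is hypothetical; it appends the line only if it is not already present, so running it twice is harmless.

```shell
# Append the NXF_EXECUTOR setting to a profile file, but only once.
# The function name is illustrative; it is not part of Nextflow or Slurm.
add_nxf_executor() {
  profile="${1:-$HOME/.bash_profile}"
  line='export NXF_EXECUTOR=slurm'
  # Create the file if it does not exist yet
  touch "$profile"
  # -x matches the whole line, -F treats the pattern as a fixed string
  if ! grep -qxF "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}
```

Calling `add_nxf_executor` with no arguments edits ~/.bash_profile; log in again or run `source ~/.bash_profile` for the setting to take effect.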
            

            2) Install nextflow into your HPC account, following "Step1" from URL (1) above.

            3) Create a directory where you want to run nextflow and change into that directory.

            4) For the next steps, you need to launch tmux, a terminal multiplexer. This ensures that nextflow will continue running even if you lose your connection. Read these instructions on using tmux: https://www.howtogeek.com/671422/how-to-use-tmux-on-linux-and-why-its-better-than-screen/.

            Start a tmux session, called "nextflow":

            tmux new -s nextflow
            

            Then copy a starter script from link (2) above into the directory you created in step 3.

            5) Launch an interactive node on the cluster, requesting 1 CPU, 1500 MB of memory, and 1 hour:

            srun --partition Orion --job-name "nextflow" --cpus-per-task 1 --mem-per-cpu 1500 --time 1:00:00 --pty bash
            

            You will likely observe a message like the following, but your job number will be different:

            srun: job 1088586 queued and waiting for resources
            

            Wait for the "job" to start. When it does, you'll see a new prompt, different from your login session. At this point, you're "in" a new interactive session, running on a newly allocated node, in the same location in the file system as before.

            Note: The "partition" option specifies which group of machines your session will run on. Different partitions have different usage levels. Besides Orion, you can also use the "Andromeda" partition, which has lower usage. If the interactive job takes too long to start, press Ctrl-C to cancel the request and re-enter the command, requesting the Andromeda partition instead.
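
The retry described in the note can be wrapped in a small helper. This is a sketch, not a cluster-provided tool: `start_interactive` and the `DRY_RUN` variable are hypothetical names, and the resource values are copied from the srun command in step 5.

```shell
# Request an interactive session on the given partition (default: Orion).
# With DRY_RUN set, only print the command instead of running it.
start_interactive() {
  partition="${1:-Orion}"
  # Build the command as positional parameters so quoting stays intact
  set -- srun --partition "$partition" --job-name nextflow \
         --cpus-per-task 1 --mem-per-cpu 1500 --time 1:00:00 --pty bash
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, if `start_interactive` sits in the queue for too long, cancel it and run `start_interactive Andromeda`.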

            6) Edit the starter script to include "time" directives by adding this line to the start of each "process" section:

            time '5m'
            

            A Nextflow script can run in diverse execution environments, but some environments require additional parameters, such as job time limits on HPC clusters. Here, each process becomes a Slurm job, and each job requires a time limit.
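
As an alternative to editing each process, an earlier note in this issue's history reports running a script on Orion with a `nextflow.config` file placed in the same directory as the script. The sketch below writes that file; the helper name `write_nextflow_config` is hypothetical, and the settings are the ones quoted in that note.

```shell
# Write a nextflow.config that applies the Slurm executor, the Orion queue,
# and a 5-minute time limit to every process.
# $1: directory containing the Nextflow script (default: current directory)
write_nextflow_config() {
  dir="${1:-.}"
  cat > "$dir/nextflow.config" <<'EOF'
process {
  executor = 'slurm'
  queue = 'Orion'
  time = '5m'
}
EOF
}
```

Note that the function overwrites any existing `nextflow.config` in the target directory.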

            7) Run the script and observe the output:

            [aloraine@str-i1 nextflow]$ nextflow run tutorial.nf 
            N E X T F L O W  ~  version 21.04.3
            Launching `tutorial.nf` [adoring_swirles] - revision: 295ae680a6
            executor >  slurm (3)
            [84/c56487] process > splitLetters       [100%] 1 of 1
            [e8/c39201] process > convertToUpper (2) [100%] 2 of 2
            HELLO
            WORLD!
            

            Note: Your output may differ depending on which process was executed first. The output lines may appear in a different order.
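
The instructions above cover the interactive case only. For a batch session, a sketch along these lines might work; it has not been run on the UNCC cluster. `write_batch_script` and the file name `run_nextflow.slurm` are hypothetical, the resource values mirror the interactive request, and `nextflow` is assumed to be on the PATH.

```shell
# Write a Slurm batch script that launches the tutorial workflow.
# $1: output file name (default: run_nextflow.slurm)
write_batch_script() {
  out="${1:-run_nextflow.slurm}"
  cat > "$out" <<'EOF'
#!/bin/bash
#SBATCH --partition=Orion
#SBATCH --job-name=nextflow
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1500
#SBATCH --time=1:00:00

# Nextflow runs inside this job and submits each process to Slurm.
nextflow run tutorial.nf
EOF
}
```

After running `write_batch_script` in the directory that holds `tutorial.nf`, submit the job with `sbatch run_nextflow.slurm`. A batch job does not depend on your login session, so tmux is not needed for this variant.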

            ann.loraine Ann Loraine made changes -
            Sprint Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Sep10 [ 127, 128 ] Fall 1 2021 Aug 16 - Aug 27, Fall 2 2021 Aug 30 - Sep10, Fall 3 2021Sep 13 - Sep 24 [ 127, 128, 129 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Assignee Nowlan Freese [ nfreese ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine added a comment - - edited

            The starter script, with time directives added:

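            #!/usr/bin/env nextflow

            // Input string; override on the command line with --str
            params.str = 'Hello world!'

            // Split the string into 6-byte files named chunk_aa, chunk_ab, ...
            process splitLetters {
                // Time limit for the job submitted for this process
                time '5m'
                output:
                file 'chunk_*' into letters

                """
                printf '${params.str}' | split -b 6 - chunk_
                """
            }


            // Convert the contents of each chunk file to upper case
            process convertToUpper {
                time '5m'

                input:
                file x from letters.flatten()

                output:
                stdout result

                """
                cat $x | tr '[a-z]' '[A-Z]'
                """
            }

            // Print each result, trimmed of surrounding whitespace
            result.view { it.trim() }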
            
            
            ann.loraine Ann Loraine made changes -
            Description Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.
            If the environment does not support it for some reason, we can use AWS instead.
            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed - how can we get Nextflow onto the cluster
            * Communicate with support personnnel to ask for new software to be installed, as required
            * Investigate Nextflow workflows others have published for aligning RNA-seq data onto a reference genome
            Determine if we can use Nextflow workflow management software on the UNC Charlotte compute cluster.

            For this task, we need to:

            * Get familiar with the UNC Charlotte compute cluster environment and job schedular
            * Understand how various software packages are configured and installed - how can we get Nextflow onto the cluster
            * Communicate with support personnnel to ask for new software to be installed, as required
            ann.loraine Ann Loraine made changes -
            Summary Determine if and how we can use Nextflow to align RNA-Seq data on cluster Determine if and how we can use Nextflow on UNCC HPC cluster
            ann.loraine Ann Loraine added a comment -

            Discussed installing Nextflow with the HPC maintainers. The Nextflow installation process works well at the level of individual users, so for now we are not going to install it system-wide; individual users will install it themselves.

            ann.loraine Ann Loraine added a comment - - edited

            README and Tips:

            1. Don't install on your terminal machine: installing as advised above requires running code downloaded via HTTPS from a server outside your network.
            2. Read the installation script to understand how running it affects the user environment for future logins (e.g., JAVA_HOME or PATH).
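
One way to follow both tips is to download the installer to a file, read it, and only then run it. A sketch: the helper names `fetch_installer` and `env_lines` are hypothetical, and the URL is whatever installer address your instructions give (for example `https://get.nextflow.io`).

```shell
# Save an installer script to a file instead of piping it straight into bash.
# $1: installer URL, $2: destination file
fetch_installer() {
  curl -fsSL "$1" -o "$2"
}

# Show the numbered lines of a script that mention JAVA_HOME or PATH.
# $1: path to the downloaded script
env_lines() {
  grep -nE 'JAVA_HOME|PATH' "$1"
}
```

For example, run `fetch_installer https://get.nextflow.io installer.sh`, review the file with `env_lines installer.sh` and `less installer.sh`, and run `bash installer.sh` only once you are satisfied with what it does.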
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese added a comment - - edited

            I made a Google Doc located here with the UNCC HPC tutorial above. I added a few additional steps and fixed an issue with the script (result.view had been omitted). I was then able to get the Hello World example to work on the cluster.

            Ann's note: Added that final line to the script above.

            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-2945 [ IGBF-2945 ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            nfreese Nowlan Freese made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            nfreese Nowlan Freese made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            nfreese Nowlan Freese made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Assignee Ann Loraine [ aloraine ]
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: