  1. IGB
  2. IGBF-3808

Explore nf-core / Nextflow for a newer version and a pipeline appropriate for de novo assembly

    Details

    • Type: Task
    • Status: Closed
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      GOAL:
      Explore the latest developments in nf-core / Nextflow to determine:

      1. What the latest version is.
      2. Whether a different pipeline is more suitable for aligning reads to de novo contigs.

          Activity

          robofjoy Robert Reid added a comment -

          Ann has mostly done this in ticket

          https://jira.bioviz.org/browse/IGBF-3790

          For nf-core, I do not see any version newer than v3.14, which is already in place on the cluster.
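
A check like this can be repeated against a list of release tags copied from the pipeline's releases page. The following is a minimal sketch, not the method used above: the tag list is hypothetical example data, and the helper names `parse_version` and `newer_releases` are made up for illustration.

```python
import re

def parse_version(tag):
    """Turn a tag such as 'V 3.14' or '3.14.0' into a 3-tuple of ints, or None."""
    match = re.fullmatch(r"[vV]?\s*(\d+)(?:\.(\d+))?(?:\.(\d+))?", tag.strip())
    if match is None:
        return None  # non-numeric tags such as 'dev' are ignored
    # Missing components are padded with 0 so '3.14' equals '3.14.0'.
    return tuple(int(part) if part else 0 for part in match.groups())

def newer_releases(tags, installed):
    """Return the tags that are strictly newer than the installed version, oldest first."""
    base = parse_version(installed)
    if base is None:
        raise ValueError(f"cannot parse installed version: {installed!r}")
    parsed = [(parse_version(tag), tag) for tag in tags]
    newer = [(version, tag) for version, tag in parsed
             if version is not None and version > base]
    return [tag for _, tag in sorted(newer)]

if __name__ == "__main__":
    # Hypothetical tag list; replace with the tags actually published.
    example_tags = ["3.12.0", "3.13.2", "3.14.0", "dev"]
    print(newer_releases(example_tags, "V 3.14"))
```

An empty result means that nothing in the supplied list is newer than the installed version.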

          ann.loraine Ann Loraine added a comment -

          I checked the latest list of nf-core pipelines here: https://nf-co.re/pipelines/

          Using the search text entry box at the top of the page, I searched the term "transcriptome".

          There is currently a "dev" (under development?) pipeline called "denovotranscript". Its description (as of today) is:

          nf-core/denovotranscript is a bioinformatics pipeline for de novo transcriptome assembly of paired-end short reads from bulk RNA-seq. It takes a samplesheet and FASTQ files as input, performs quality control (QC), trimming, assembly, redundancy reduction, pseudoalignment, and quantification. It outputs a transcriptome assembly FASTA file, a transcript abundance TSV file, and a MultiQC report with assembly quality and read QC metrics.

          The pipeline can perform transcript assembly using Trinity (not good) or rnaSPAdes (better, as per the latest results from Brandon Bendickson and Robert Reid).

          I would say that this pipeline could be a great option for us.

          I advise the following next steps:

          1) Join the "denovotranscript" Slack channel - see https://github.com/nf-core/denovotranscript/tree/dev for details
          2) Read posts there to get familiar with the channel.
          3) Ask the channel if the pipeline is ready for us to try it, considering that we haven't used it before and are "newbies." Is this "dev" pipeline ready for use by the likes of us?
          4) Put together the various parameter files needed to run the pipeline. Note that we need to make sure that any intron-size-related parameters, especially the maximum intron size, are set properly. (I'm not sure whether this would be an issue for this application, because de novo assembly does not need to align transcripts onto a genome assembly.)
          5) Run the pipeline.
          6) Assess results by aligning the assembled transcript sequences against a suitable reference genome - SL4 or SL5 if assembling tomato data.
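
For step 4, the main input file is the samplesheet named in the pipeline description. The following is a minimal sketch of generating one. It assumes the usual nf-core column layout `sample,fastq_1,fastq_2` (not confirmed here for denovotranscript; check the pipeline's usage documentation) and a hypothetical file naming scheme of `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`. The function names are made up for illustration.

```python
import csv
import io
import re

# Hypothetical naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")

def build_samplesheet_rows(fastq_paths):
    """Group FASTQ paths into (sample, fastq_1, fastq_2) rows, sorted by sample."""
    pairs = {}
    for path in fastq_paths:
        name = path.rsplit("/", 1)[-1]
        match = PAIR_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        pairs.setdefault(match["sample"], {})[match["read"]] = path
    rows = []
    for sample in sorted(pairs):
        reads = pairs[sample]
        if "1" in reads and "2" in reads:  # keep complete pairs only
            rows.append((sample, reads["1"], reads["2"]))
    return rows

def samplesheet_csv(fastq_paths):
    """Return CSV text with the assumed header sample,fastq_1,fastq_2."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    writer.writerows(build_samplesheet_rows(fastq_paths))
    return buffer.getvalue()

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(samplesheet_csv([
        "/data/tomatoA_R1.fastq.gz",
        "/data/tomatoA_R2.fastq.gz",
    ]), end="")
```

In other nf-core pipelines, a file like this is passed with `--input` alongside `--outdir` on the `nextflow run` command line. Whether the "dev" branch of denovotranscript follows the same convention should be confirmed in its documentation or Slack channel before step 5.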

          ann.loraine Ann Loraine added a comment -

          Request for Robert Reid:

          Please read the comments and close this ticket, if you see fit!

          robofjoy Robert Reid added a comment -

          This might be worthwhile to explore.
          The Slack channel step is definitely worth doing.

          I would call this low priority, as it mostly repeats this summer's pipeline, which is nearing completion.
          One huge advantage of setting up a Nextflow de novo pipeline is the ease of repeating experiments or running new datasets, so it is worthwhile.


            People

            • Assignee:
              robofjoy Robert Reid
              Reporter:
              robofjoy Robert Reid