IGBF-3143

Run RNA-Seq data processing pipeline on positive splicing control and experimental samples

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      Summer 4 2022 July 4, Summer 5 2022 July 18, Summer 6 2022 Aug 1, Fall 1 2022 Aug 15

      Description

      Data sets to process:

      Positive control: SRP328042 Data are published in this article.
      Experimental: SRP252265

      To-Do:

      • Obtain data in fastq format from Sequence Read Archive using fasterq-dump options for paired end data - DONE
      • Place data into directories named for the SRP number, e.g., SRP328042 and SRP252265, within a directory named "alt_splicing" under "nobackup" - DONE
      • Make a note of the particular commands used to perform the data retrieval (see comment below)
      • Create "samples" text file listing the SRR fastq files for running nf-core/rnaseq with nextflow
      • Run nf-core/rnaseq with the proper maximum intron size parameter using "tomato.config"

      Notes:

      Methods used to create positive control RNA-Seq data from SRP328042, according to the paper:

      2.5.2. Preparation of RNA-Seq Library and Sequencing

      Total RNA was extracted utilizing Trizol reagent (Invitrogen, Waltham, MA, USA). RNA quantity and quality were determined by NanoDrop 1000 spectrophotometer (Thermo Scientific Inc., Waltham, MA, USA), 1% agarose gel electrophoresis and Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). Following the protocol described by [28], strand-specific RNA-Seq libraries from 3 biological replicates for each group from WW and DS anthers were prepared using 1 ng/µL of total RNA sample and sequenced by Novogene Biotech (Beijing, China) on Illumina HiSeq 4000 system (Illumina, Inc., San Diego, CA, USA) according to the manufacturer’s instructions. The raw sequence reads were deposited into NCBI Sequence Read Archive under the accession number PRJNA746070.

      A PDF copy of the protocol paper (reference 28) for RNA-Seq library synthesis is attached.

            Activity

            Ann Loraine added a comment:

            Example data retrieval command sequence:

            /Users/mollydavis333/Desktop/sratoolkit.3.0.0-mac64/bin/fasterq-dump -S SRR15111745
            rsync --progress /Users/mollydavis333/Desktop/SRR15111737_1.fastq  mdavi258@hpc.uncc.edu:/nobackup/tomato_genome/alt_splicing
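
For more than one run, the two commands above could be generated in a loop. The following is a minimal sketch, not the commands that were actually run: it assumes fasterq-dump is on the PATH, and the helper name print_fetch_commands and the DEST variable are made up for illustration. It only prints the commands so they can be reviewed before anything is downloaded or copied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the retrieval commands for each SRR accession.
# DEST mirrors the rsync target used in the example above.
DEST="mdavi258@hpc.uncc.edu:/nobackup/tomato_genome/alt_splicing"

print_fetch_commands() {
    local srr
    for srr in "$@"; do
        # -S (--split-files) writes paired-end reads to <SRR>_1.fastq and <SRR>_2.fastq
        echo "fasterq-dump -S ${srr}"
        echo "rsync --progress ${srr}_1.fastq ${srr}_2.fastq ${DEST}"
    done
}

print_fetch_commands SRR15111745
```

Piping the output to a file gives a record of the retrieval commands, which is what the to-do list asks for.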
            
            Ann Loraine added a comment (edited):

            Update:

            • SRR files are downloaded into /nobackup/tomato_genome/alt_splicing
            • SRR files are in two directories: SRP328042 (positive control) and SRP252265 (experimental pollen tube samples)

            To-do:

            • Create and review samples file (used by nextflow to run the pipeline) - in progress
            • Get tomato configuration file needed for running nf-core/rnaseq version 3.4 with nextflow - DONE (see https://bitbucket.org/hotpollen/flavonoid-rnaseq and Ann's fork of same)
            • Get (deploy) new tomato genome file from IGB Quickload - DONE
            • Explain how to run nextflow - in progress
            • Create GTF file for new genome models - DONE
            Ann Loraine added a comment (edited):

            Forked flavonoid-rnaseq into Ann's bitbucket account and cloned the fork onto cluster to: /nobackup/tomato_genome/scripts/flavonoid-rnaseq

            Created a branch IGBF-3143 for this alt splicing work.

            Modified "doIt.sh" script to use star/salmon alignment option.

            Added documentation in the comments.

            Started long-lived interactive session and loaded nf-core with module load command.

            Downloaded nf-core/rnaseq v. 3.4 to: /nobackup/tomato_genome/scripts/nf-core-rnaseq-3.4 and saved singularity images to: /nobackup/tomato_genome/scripts/nxf_singularity_cachedir

            Made gene model GTF format file using bed2gtf.py from https://bitbucket.org/lorainelab/genomesource/src/master/, using option "-g" to indicate that the gene name is in field 13 of input bed-detail file from http://lorainelab-quickload.scidas.org/quickload/S_lycopersicum_Jun_2022/S_lycopersicum_Jun_2022.bed.gz, revision 164 of the file, from repository browsable at https://svn.bioviz.org/viewvc/genomes/quickload/S_lycopersicum_Jun_2022/.

            Ann Loraine added a comment:

            First run showed problem with sample sheet:

            Error executing process > 'NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SRP328042.csv)'

            Caused by:
            Process `NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SRP328042.csv)` terminated with an error exit status (127)

            Command executed:

            check_samplesheet.py \
            SRP328042.csv \
            samplesheet.valid.csv

            cat <<-END_VERSIONS > versions.yml
            SAMPLESHEET_CHECK:
            python: $(python --version | sed 's/Python //g')
            END_VERSIONS

            Ann Loraine added a comment:

            The sample sheet "strandedness" column contains an invalid value: "stranded". The documentation allows only three values:

            Must be one of unstranded, forward or reverse.

            Also, it appears that our sample files need to be "gzipped":

            File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".

            [~molly] and [~RobertReid] - could you "gzip" the fastq files? Also, kindly modify the "strandedness" field to one of the three options listed.

            Ann Loraine added a comment:

            Looks like the positive control is indeed stranded, but I'm not sure whether we should use the "reverse" or "forward" option. I am looking into it.
            attn: [~RobertReid] and [~molly]

            Ann Loraine added a comment:

            I think we need to use "reverse" in the "strandedness" column because we actually end up sequencing the reverse (antisense) strand of the cDNA that gets made from the RNA. With this protocol, you start by synthesizing cDNA from mRNA. The first strand synthesis step creates an antisense copy of the RNA, complementary to the original mRNA, using regular nucleotides. The second strand synthesis step reproduces the original mRNA sequence, using the first strand product as a template, but incorporates dUTP in place of dTTP. The dUTP-containing second strand ultimately gets degraded after all the primer ligation steps happen. So, when you sequence the library, the only DNA available to sequence came from the first strand synthesis, which was the "reverse" strand of the original RNA. I think this is what the pipeline authors intended by asking for "reverse" or "forward" in the sample sheet.

            However, if this is wrong, the pipeline will let us know. It includes a step that looks at strandedness. So I think we can just go ahead and insert "reverse" in the sample sheet and see what happens.
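
Both sample sheet fixes discussed in this thread (file names ending in ".fastq.gz" and a valid "strandedness" value) can be applied in one pass. The following is a minimal sketch, assuming the sample sheet has four comma-separated columns in the order sample, fastq_1, fastq_2, strandedness. The function name fix_samplesheet and the example row are made up for illustration. It only edits the sheet; the fastq files themselves still have to be compressed with gzip.

```shell
#!/usr/bin/env bash
# Sketch: update a sample sheet read from stdin and write it to stdout.
# Assumes four comma-separated columns: sample,fastq_1,fastq_2,strandedness.
fix_samplesheet() {
    # $1: strandedness value to write (unstranded, forward, or reverse)
    awk -F',' -v OFS=',' -v strand="$1" '
        NR == 1 { print; next }            # keep the header row unchanged
        {
            for (i = 2; i <= 3; i++)       # fastq_1 and fastq_2 columns
                if ($i ~ /\.fastq$/) $i = $i ".gz"
            $4 = strand
            print
        }'
}

# Example with a made-up row.
printf 'sample,fastq_1,fastq_2,strandedness\nWW_1,SRR_1.fastq,SRR_2.fastq,stranded\n' \
    | fix_samplesheet reverse
```

Passing the strandedness value as an argument keeps it easy to switch to "forward" if the pipeline's strandedness check disagrees.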

            attn: [~RobertReid] and [~molly]

            Ann Loraine added a comment (edited):

            Everything has been properly updated and I ran the pipeline again, making out.3.txt and err.3.txt.

            However, a new error happened:

            Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_TXIMPORT'

            Caused by:
            Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_TXIMPORT` terminated with an error exit status (127)

            Command executed:

            salmon_tximport.r \
            NULL \
            salmon \
            salmon.merged

            cat <<-END_VERSIONS > versions.yml
            SALMON_TXIMPORT:
            r-base: $(echo $(R --version 2>&1) | sed 's/^.*R version //; s/ .*$//')
            bioconductor-tximeta: $(Rscript -e "library(tximeta); cat(as.character(packageVersion('tximeta')))")
            END_VERSIONS

            Command exit status:
            127

            Command output:
            (empty)

            Command error:
            INFO: Converting SIF file to temporary sandbox...
            WARNING: Skipping mount /usr/local/singularity/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
            .command.sh: line 3: salmon_tximport.r: command not found
            INFO: Cleaning up image...

            Work dir:
            /nobackup/tomato_genome/alt_splicing/SRP328042/work/f1/9c6b783f8bafba25eb2aee017dfb65

            Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
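
Exit status 127 is what the shell returns when it cannot find a command, which matches the "salmon_tximport.r: command not found" line in the log. The following is a minimal sketch of that behaviour together with a simple PATH check. The helper name check_on_path is made up for illustration, and the sketch does not diagnose why the script was missing inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command can be found on PATH.
check_on_path() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 127   # same status the shell itself uses for "command not found"
    fi
}

# Running a nonexistent command in a subshell yields exit status 127.
status=0
( definitely_not_a_real_command_xyz ) 2> /dev/null || status=$?
echo "exit status: ${status}"
```

Running a check such as "check_on_path salmon_tximport.r" from the process work directory, in the same environment the task uses, would show whether the pipeline's helper scripts are visible there.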

            Ann Loraine added a comment:

            I'm confused about why the above worked at all...I didn't start the nf-core virtual environment before attempting to re-run the pipeline. I am trying again with

            ./doIt.sh SRP328042.csv S_lycopersicum_Jun_2022.fa S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config 1> out.4.txt 2> err.4.txt 
            
            Ann Loraine added a comment (edited):

            Same error occurred. Ann, Rob, and Molly met via Zoom to troubleshoot. Rob suggested trying to run the pipeline using the older configuration and older data, since that worked. Rob and Molly are proceeding with troubleshooting as I am out of ideas!

            Molly Davis added a comment (edited):

            Nextflow nf-core/rnaseq pipeline update:

            Directory Location: /nobackup/tomato_genome/alt_splicing/SRP252265

            Dr. Reid and I successfully ran the pipeline with SRP252265 for out.4.txt:

            [nf-core/rnaseq] Pipeline completed successfully
            Completed at: 02-Aug-2022 16:48:38
            Duration : 1m 38s
            CPU hours : 481.7 (100% cached)
            Succeeded : 2
            Cached : 828

            Solution: Added "module load singularity" to the steps, removed the symbolic link for nf-core-rnaseq-3.4, and downloaded the pipeline files directly into the directory while not in an interactive session:

            module load nf-core
            nf-core download rnaseq -r 3.4

            The only symbolic links present were for the file resources:

            S_lycopersicum_Jun_2022.bed
            S_lycopersicum_Jun_2022.fa
            S_lycopersicum_Jun_2022.gtf
            tomato.config
            doIt.sh

            Steps for successful run:

            1. Add .bash_profile configurations
            2. Start a tmux session (tmux new -s base)
            3. Start an interactive session (srun --partition Andromeda --cpus-per-task 16 --mem-per-cpu 12000 --time 60:00:00 --pty bash)
            4. Activate virtual environment (module load nf-core, module load singularity)
            5. Start Nextflow (./doIt.sh SRP252265.csv S_lycopersicum_Jun_2022.fa S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config 1> out.0.txt 2> err.0.txt )
            6. Check the output file and the results directory
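
Because several of the inputs in step 5 are symbolic links, it can help to confirm they all resolve before starting Nextflow. The following is a minimal sketch of such a pre-flight check. The helper name missing_inputs is made up for illustration and is not part of doIt.sh.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every argument that is not a readable file.
# A symbolic link whose target is missing is reported as well.
missing_inputs() {
    local f
    for f in "$@"; do
        [ -r "$f" ] || echo "$f"
    done
}

# Example: the five inputs passed to doIt.sh in step 5.
missing="$(missing_inputs SRP252265.csv S_lycopersicum_Jun_2022.fa \
    S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config)"
if [ -n "$missing" ]; then
    echo "Missing inputs:"
    echo "$missing"
fi
```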

            An error that may arise:
            Error executing process > 'NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS (1)'

            Caused by:
            Process `NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS (1)` terminated with an error exit status (1)

            Work dir:
            /nobackup/tomato_genome/alt_splicing/SRP252265/work/1e/a6b39f6fb5d57e905cdc43045523cd

            Line 52 in versions file:
            TRIMGALORE:
            trimgalore: perl: error while loading shared libraries: libperl.so: cannot open shared object file: No such file or directory

            Discussion: The error above concerns a file that records the versions of all the software used during the pipeline run. Line 52 of the versions file, the entry for trimgalore, held an error message instead of a version. Trimgalore itself ran successfully using the pipeline's resources, but the module was not available on the cluster. So the process worked and the results were still present after it finished; the error only reported that no version could be found to record in the file.
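
A quick way to spot entries like this is to search the versions file for error text. The following is a minimal sketch, assuming the file is plain text with one entry per line. The helper name find_version_errors is made up for illustration, and the example writes the two lines quoted above to a temporary file rather than reading the real versions file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines of a versions file that contain an error message.
find_version_errors() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as a failure by callers running under "set -e".
    grep -n 'error' "$1" || true
}

# Example using the two lines quoted above, written to a temporary file.
tmpfile="$(mktemp)"
printf '%s\n' 'TRIMGALORE:' \
    'trimgalore: perl: error while loading shared libraries: libperl.so: cannot open shared object file: No such file or directory' \
    > "$tmpfile"
find_version_errors "$tmpfile"
rm -f "$tmpfile"
```

Each match is printed with its line number, so the offending entry can be located the same way line 52 was identified here.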

            We also created a batch script to make runs easier (not yet confirmed to work):

            #!/bin/bash
            
            #SBATCH --time=296:30:00
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=24
            #SBATCH --mem=401gb
            #SBATCH --job-name=nnnnfcore-splice
            #SBATCH --partition=Draco
            #SBATCH --output=rrr-%x.%j.out
            #SBATCH --error=rrr-%x.%j.err
            #SBATCH --mail-type=END,FAIL
            #SBATCH --mail-user=mdavi258@uncc.edu
            
            umask 007
            set -eu
            
            #file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /nobackup/tomato_genome/rnaseq-phase1/trinity/names.txt)
            #dir=$(sed -n -e "${PBS_ARRAYID}p" /lustre/groups/lorainelab/data/illumina/sweet_potato/filtered/dir.txt)
            
            echo "Launching Nextflow NF-Core"
            
            #module load java8 
            module load singularity
            module load nf-core
            
            ##Always use slurm
            export NXF_EXECUTOR=slurm
            ##Save Singularity containers in home dir
            export NXF_SINGULARITY_CACHEDIR="$HOME/nxf"
            ## Tell nextflow not to use internet
            export NXF_OFFLINE='TRUE'
            ## Control Java Heap Size
            export NXF_OPTS="-Xms2g -Xmx8g"
            
            ###Change to correct directory with fastq and symbolic link files
            cd /nobackup/tomato_genome/alt_splicing/SRP328042-molly
            
            #makeblastdb -in  /projects/tomato_genome/db/reference-4.0/ITAG4.0_cDNA.fasta -dbtype nucl
            
            #nextflow run nf-core/rnaseq -profile ../../test,singularity
            
            #/nobackup/tomato_genome/nfcore_rnaseq/tomato.config
            
            
            
            ###Change to correct names in directory
            nextflow run nf-core-rnaseq-3.4/workflow \
                     -resume \
                     -profile singularity \
                     -c tomato.config \
                     --aligner star_salmon \
                     --save_trimmed \
                     --fasta S_lycopersicum_Jun_2022.fa \
                     --input SRP328042.csv \
                     --gtf S_lycopersicum_Jun_2022.gtf  \
                     --gene_bed S_lycopersicum_Jun_2022.bed \
                     --skip_biotype_qc \
                     --skip_markduplicates \
                     --skip_bigwig \
                     --skip_stringtie \
                     --skip_qualimap \
                     --skip_fastqc
            
            
            echo "Nextflow Pipeline Finished"
            

            Troubleshooting:

            • Need to figure out whether it would work better in the Andromeda partition than in Draco.
            • Using the batch script seems to produce more random, unexplained errors than using srun, which feels like a more organized and precise way to run the pipeline.
            Molly Davis added a comment:

            Nextflow nf-core/rnaseq pipeline update:

            SRP328042 successfully ran through the nf-core pipeline.
            Working Directory:
            /nobackup/tomato_genome/alt_splicing/SRP328042-molly

            Troubleshooting: I skipped custom_dumpsoftwareversions in the doIt.sh script:
            nextflow run nf-core-rnaseq-3.4/workflow \
            -resume \
            -profile singularity \
            -c $CONFIG \
            --aligner star_salmon \
            --save_trimmed \
            --fasta $GENOMEFASTA \
            --input $SAMPLESHEET \
            --gtf $GTF \
            --gene_bed $GENEBED \
            --skip_biotype_qc \
            --skip_markduplicates \
            --skip_bigwig \
            --skip_stringtie \
            --skip_qualimap \
            --skip_fastqc \
            --skip_custom_dumpsoftwareversions
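
The variables in the command above line up with the five positional arguments used when doIt.sh was invoked earlier in this ticket. The following is a minimal sketch of how a wrapper could map them, with the order inferred from those invocations. The actual contents of doIt.sh are not shown in this ticket, so the function name parse_doit_args and the usage message are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of argument handling for a doIt.sh-style wrapper (hypothetical).
parse_doit_args() {
    if [ "$#" -ne 5 ]; then
        echo "usage: doIt.sh SAMPLESHEET GENOMEFASTA GTF GENEBED CONFIG" >&2
        return 1
    fi
    SAMPLESHEET="$1"
    GENOMEFASTA="$2"
    GTF="$3"
    GENEBED="$4"
    CONFIG="$5"
}

parse_doit_args SRP328042.csv S_lycopersicum_Jun_2022.fa \
    S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config
echo "sample sheet: ${SAMPLESHEET}, config: ${CONFIG}"
```

Rejecting a wrong argument count up front avoids launching Nextflow with an empty value for one of the inputs.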

            Ann Loraine added a comment:

            During our meeting today, we looked over the files. Everything looks great, so moving this ticket to Done.

            Summary:

            Data processing output files are on the HPC cluster here:

            • /nobackup/tomato_genome/alt_splicing/SRP328042-molly
            • /nobackup/tomato_genome/alt_splicing/SRP252265
            Ann Loraine added a comment:

            Planning to align data from two new possible controls: SRP100604 and SRP268884


              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                ann.loraine Ann Loraine