[IGBF-3143] Run RNA-Seq data processing pipeline on positive splicing control and experimental samples - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
3
Epic Link:
Support NSF pollen grant
Sprint:
Summer 4 2022 July 4, Summer 5 2022 July 18, Summer 6 2022 Aug 1, Fall 1 2022 Aug 15

Description

Data sets to process:

Positive control: SRP328042 (data are published in the article quoted under Notes)
Experimental: SRP252265

To-Do:

Obtain data in fastq format from the Sequence Read Archive using fasterq-dump options for paired-end data - DONE
Place data into directories named for the SRP number, e.g., SRP328042 and SRP252265, within a directory named "alt_splicing" under "nobackup" - DONE
Make a note of the particular commands used to perform the data retrieval (see comment below)
Create "samples" text file listing the SRR fastq files for running nf-core/rnaseq with nextflow
Run nf-core/rnaseq using proper maximum intron size parameter using "tomato.config"
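A minimal sketch of how the "samples" file could be generated, assuming the four-column layout (sample, fastq_1, fastq_2, strandedness) described in the nf-core/rnaseq 3.4 usage documentation linked under Notes. The function name make_samplesheet and the file naming pattern <SRR>_1.fastq.gz / <SRR>_2.fastq.gz are assumptions for illustration, not something taken from the project scripts.

```shell
#!/usr/bin/env bash
# Sketch: build an nf-core/rnaseq samplesheet from paired-end FASTQ files.
# Assumes files are named <SRR>_1.fastq.gz and <SRR>_2.fastq.gz (hypothetical layout).
make_samplesheet() {
    local dir="$1" strandedness="$2"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # no matches: skip the literal glob
        sample="$(basename "$r1" _1.fastq.gz)"
        r2="$dir/${sample}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: missing mate for $r1" >&2
            continue                          # skip runs without a second read file
        fi
        echo "${sample},${r1},${r2},${strandedness}"
    done
}
```

It could be called as: make_samplesheet /nobackup/tomato_genome/alt_splicing/SRP328042 reverse > SRP328042.csv. Runs lacking a mate file are reported on stderr and left out rather than written as single-end rows.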

Notes:

Experimental datasets originally processed using code in https://bitbucket.org/hotpollen/rna-seq/src/master/ and https://bitbucket.org/hotpollen/flavonoid-rnaseq
All Pollen project datasets are now in the SRA under the same project number (SRP252265)!
Ann is using a fork of flavonoid-rnaseq for all new code she's writing, on branch IGBF-3143. To find her fork, go to https://bitbucket.org/hotpollen/flavonoid-rnaseq and select "forks"
Documentation for the pipeline we are using is here: https://nf-co.re/rnaseq/3.4/usage

Methods used to create positive control RNA-Seq data from SRP328042, according to the paper:

2.5.2. Preparation of RNA-Seq Library and Sequencing

Total RNA was extracted utilizing Trizol reagent (Invitrogen, Waltham, MA, USA). RNA quantity and quality were determined by NanoDrop 1000 spectrophotometer (Thermo Scientific Inc., Waltham, MA, USA), 1% agarose gel electrophoresis and Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). Following the protocol described by [28], strand-specific RNA-Seq libraries from 3 biological replicates for each group from WW and DS anthers were prepared using 1 ng/µL of total RNA sample and sequenced by Novogene Biotech (Beijing, China) on Illumina HiSeq 4000 system (Illumina, Inc., San Diego, CA, USA) according to the manufacturer’s instructions. The raw sequence reads were deposited into NCBI Sequence Read Archive under the accession number PRJNA746070.

A PDF copy of the protocol paper (reference 28) for RNA-Seq library synthesis is attached.

Attachments

10.1.1.1052.3871.pdf
393 kB
25/Jul/22 1:39 PM

Issue Links

blocks

IGBF-3144 Make annots.xml for RNA-Seq junction, alignment, and coverage graphs

Closed

IGBF-3165 Make junction files for experimental and positive control alignments data

Closed

is blocked by

IGBF-3127 Version control nf-core configuration files used for rna-seq analysis

Closed

relates to

IGBF-2970 Re-run nf-core/rnaseq using proper strand designation and better sample prefix

Closed

IGBF-3127 Version control nf-core configuration files used for rna-seq analysis

Closed

IGBF-3162 Generate scaled coverage graphs for RNA-Seq alignments

Closed

IGBF-3228 Download and process data for SRP100604 and SRP268884

Closed

IGBF-2947 Investigate using nf-core/rnaseq pipeline

Closed

IGBF-3135 Add new tomato genome and annotations to IGB Quickload repository

Closed


Activity

Ann Loraine added a comment - 07/Jul/22 12:32 PM

Example data retrieval command sequence:

/Users/mollydavis333/Desktop/sratoolkit.3.0.0-mac64/bin/fasterq-dump -S SRR15111745
rsync --progress /Users/mollydavis333/Desktop/SRR15111745_1.fastq  mdavi258@hpc.uncc.edu:/nobackup/tomato_genome/alt_splicing
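The same two steps could be looped over a list of run accessions. This is only a sketch: the accession list file, the fetch_runs and run helper names, and the DRY_RUN switch are hypothetical, and FASTERQ_DUMP defaults to whatever fasterq-dump is on the PATH. It relies on the --split-files and --outdir options of fasterq-dump and copies both read files of each pair.

```shell
#!/usr/bin/env bash
# Sketch: download paired-end reads for a list of SRR accessions, then copy them to the cluster.
# FASTERQ_DUMP, the list file, and DRY_RUN are assumptions; adjust to your installation.
FASTERQ_DUMP="${FASTERQ_DUMP:-fasterq-dump}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# fetch_runs LIST OUTDIR DEST: LIST holds one SRR accession per line.
fetch_runs() {
    local list="$1" outdir="$2" dest="$3" srr
    # Read the list on file descriptor 3 so the commands cannot consume it from stdin.
    while read -r srr <&3 || [ -n "$srr" ]; do
        [ -n "$srr" ] || continue             # skip blank lines
        run "$FASTERQ_DUMP" --split-files --outdir "$outdir" "$srr"
        run rsync --progress "$outdir/${srr}_1.fastq" "$outdir/${srr}_2.fastq" "$dest"
    done 3< "$list"
}
```

Setting DRY_RUN=1 prints the commands without running them, which is a cheap way to review the accession list before starting a long download.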

Ann Loraine added a comment - 20/Jul/22 11:07 AM - edited

Update:

SRR files are downloaded into /nobackup/tomato_genome/alt_splicing
SRR files are in two directories: SRP328042 (positive control) and SRP252265 (experimental pollen tube samples)

To-do:

Create and review samples file (used by nextflow to run the pipeline) - in progress
Get tomato configuration file needed to run nf-core/rnaseq version 3.4 with nextflow - DONE (see https://bitbucket.org/hotpollen/flavonoid-rnaseq and Ann's fork of same)
Get (deploy) new tomato genome file from IGB Quickload - DONE
Explain how to run nextflow - in progress
Create GTF file for new genome models - DONE

Ann Loraine added a comment - 20/Jul/22 8:14 PM - edited

Forked flavonoid-rnaseq into Ann's bitbucket account and cloned the fork onto cluster to: /nobackup/tomato_genome/scripts/flavonoid-rnaseq

Created a branch IGBF-3143 for this alt splicing work.

Modified "doIt.sh" script to use star/salmon alignment option.

Added documentation in the comments.

Started long-lived interactive session and loaded nf-core with module load command.

Downloaded nf-core/rnaseq v. 3.4 to: /nobackup/tomato_genome/scripts/nf-core-rnaseq-3.4 and saved singularity images to: /nobackup/tomato_genome/scripts/nxf_singularity_cachedir

Made a gene model GTF file using bed2gtf.py from https://bitbucket.org/lorainelab/genomesource/src/master/, with option "-g" to indicate that the gene name is in field 13 of the input bed-detail file. The input was revision 164 of http://lorainelab-quickload.scidas.org/quickload/S_lycopersicum_Jun_2022/S_lycopersicum_Jun_2022.bed.gz, from the repository browsable at https://svn.bioviz.org/viewvc/genomes/quickload/S_lycopersicum_Jun_2022/.

Ann Loraine added a comment - 25/Jul/22 1:09 PM

First run showed problem with sample sheet:

Error executing process > 'NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SRP328042.csv)'

Caused by:
Process `NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SRP328042.csv)` terminated with an error exit status (127)

Command executed:

check_samplesheet.py \
SRP328042.csv \
samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
SAMPLESHEET_CHECK:
python: $(python --version | sed 's/Python //g')
END_VERSIONS

Ann Loraine added a comment - 25/Jul/22 1:15 PM

The sample sheet's "strandedness" column contains an invalid value: "stranded". The documentation says only three values are allowed:

Must be one of unstranded, forward or reverse.

Also, it appears that our sample files need to be "gzipped":

File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".

[~molly] and [~RobertReid] - could you "gzip" the fastq files? Also, kindly modify the "strandedness" field to one of the three options listed.
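Both fixes to the sample sheet could be scripted. The sketch below assumes the four-column layout sample,fastq_1,fastq_2,strandedness; the function name fix_samplesheet is hypothetical. It only rewrites the sheet, so the fastq files themselves still have to be compressed separately (for example with gzip on each .fastq file).

```shell
#!/usr/bin/env bash
# Sketch: set the strandedness column and point the FASTQ columns at gzipped files.
# Assumes the layout sample,fastq_1,fastq_2,strandedness with a header row.
fix_samplesheet() {
    local sheet="$1" strandedness="$2"
    case "$strandedness" in
        unstranded|forward|reverse) ;;        # the three values the documentation allows
        *) echo "invalid strandedness: $strandedness" >&2; return 1 ;;
    esac
    awk -F, -v OFS=, -v s="$strandedness" '
        NR == 1 { print; next }               # keep the header row unchanged
        {
            for (i = 2; i <= 3; i++)          # append .gz where it is missing
                if ($i != "" && $i !~ /\.gz$/) $i = $i ".gz"
            $4 = s
            print
        }' "$sheet"
}
```

The corrected sheet goes to standard output, so the original file stays untouched until the result has been reviewed.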

Ann Loraine added a comment - 25/Jul/22 1:25 PM

Looks like the positive control is indeed stranded, but I'm not sure whether we should use the "reverse" or "forward" option. I am looking into it....
attn: [~RobertReid] and [~molly]

Ann Loraine added a comment - 25/Jul/22 1:48 PM

I think we need to use "reverse" in the "strandedness" column because the reads we sequence derive from the reverse (antisense) strand of the cDNA that gets made from the RNA. With this protocol, you start by synthesizing cDNA from mRNA. The first-strand synthesis step creates an antisense copy of the RNA, complementary to the original mRNA. That step uses regular nucleotides. The second-strand synthesis step, which reproduces the sequence of the original mRNA using the first-strand product as a template, uses dUTP in place of dTTP. The dUTP-containing second strand is degraded after the adapter ligation steps. So, when you sequence the library, the only DNA you can sequence came from first-strand synthesis, which is the "reverse" strand relative to the original RNA. I think this is what the pipeline authors intend by asking for "reverse" or "forward" in the sample sheet.

However, if this is wrong, the pipeline will let us know. It includes a step that looks at strandedness. So I think we can just go ahead and insert "reverse" in the sample sheet and see what happens.
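For reference, a sketch of how such a strandedness check can be read by hand. It assumes the check produces a report in the format written by RSeQC's infer_experiment.py for paired-end data, with one line for the read-pair pattern "1++,1--,2+-,2-+" (read 1 on the transcript strand) and one for "1+-,1-+,2++,2--" (read 1 on the opposite strand). The function name classify_strandedness and the 0.7 cutoff are assumptions for illustration, not the pipeline's own values.

```shell
#!/usr/bin/env bash
# Sketch: classify a library from an infer_experiment.py-style report.
# The default cutoff of 0.7 is a hypothetical choice, not the pipeline's setting.
classify_strandedness() {
    local report="$1"
    awk -v cutoff="${2:-0.7}" '
        /1\+\+,1--,2\+-,2-\+/ { sense = $NF + 0 }       # read 1 matches the transcript strand
        /1\+-,1-\+,2\+\+,2--/ { antisense = $NF + 0 }   # read 1 matches the opposite strand
        END {
            if (sense >= cutoff) print "forward"
            else if (antisense >= cutoff) print "reverse"
            else print "unstranded"
        }' "$report"
}
```

Under these assumptions, a dUTP library like the one described above would be expected to show most read pairs in the second pattern and so come out as "reverse".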

attn: [~RobertReid] and [~molly]

Ann Loraine added a comment - 26/Jul/22 11:33 AM - edited

Everything has been properly updated and I ran the pipeline again, making out.3.txt and err.3.txt.

However, a new error happened:

Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_TXIMPORT'

Caused by:
Process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_TXIMPORT` terminated with an error exit status (127)

Command executed:

salmon_tximport.r \
NULL \
salmon \
salmon.merged

cat <<-END_VERSIONS > versions.yml
SALMON_TXIMPORT:
r-base: $(echo $(R --version 2>&1) | sed 's/^.*R version //; s/ .*$//')
bioconductor-tximeta: $(Rscript -e "library(tximeta); cat(as.character(packageVersion('tximeta')))")
END_VERSIONS

Command exit status:
127

Command output:
(empty)

Command error:
INFO: Converting SIF file to temporary sandbox...
WARNING: Skipping mount /usr/local/singularity/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
.command.sh: line 3: salmon_tximport.r: command not found
INFO: Cleaning up image...

Work dir:
/nobackup/tomato_genome/alt_splicing/SRP328042/work/f1/9c6b783f8bafba25eb2aee017dfb65

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
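Exit status 127 is the shell's "command not found" code, and Nextflow makes helper scripts such as salmon_tximport.r available by putting the workflow's bin/ directory on the PATH. One host-side check, sketched below, is whether the workflow directory is a symbolic link and whether the script is present and executable. The function name check_workflow_bin is hypothetical, and this check cannot tell whether the directory is actually visible inside the Singularity container.

```shell
#!/usr/bin/env bash
# Sketch: report whether a workflow directory is a symlink and whether a
# helper script exists and is executable under its bin/ directory.
check_workflow_bin() {
    local workflow_dir="$1" script="$2"
    if [ -L "$workflow_dir" ]; then
        echo "symlink: $workflow_dir -> $(readlink "$workflow_dir")"
    fi
    if [ -x "$workflow_dir/bin/$script" ]; then
        echo "ok: $workflow_dir/bin/$script"
    else
        echo "missing or not executable: $workflow_dir/bin/$script"
        return 1
    fi
}
```

For this run it could be called as: check_workflow_bin nf-core-rnaseq-3.4/workflow salmon_tximport.r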

Ann Loraine added a comment - 26/Jul/22 11:44 AM

I'm confused about why the above worked at all...I didn't start the nf-core virtual environment before attempting to re-run the pipeline. I am trying again with

./doIt.sh SRP328042.csv S_lycopersicum_Jun_2022.fa S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config 1> out.4.txt 2> err.4.txt

Ann Loraine added a comment - 27/Jul/22 2:58 PM - edited

Same error occurred. Ann, Rob, and Molly met via Zoom to troubleshoot. Rob suggested trying to run the pipeline using the older configuration and older data, since that worked. Rob & Molly are proceeding with troubleshooting as I am out of ideas!

Molly Davis added a comment - 12/Aug/22 12:55 PM - edited

Nextflow nf-core/rnaseq pipeline update:

Directory Location: /nobackup/tomato_genome/alt_splicing/SRP252265

Dr. Reid and I successfully ran the pipeline with SRP252265 for out.4.txt:

[nf-core/rnaseq] Pipeline completed successfully
Completed at: 02-Aug-2022 16:48:38
Duration : 1m 38s
CPU hours : 481.7 (100% cached)
Succeeded : 2
Cached : 828

Solution: Added "module load singularity" to the steps, removed the symbolic link for nf-core-rnaseq-3.4, and downloaded the pipeline files directly into the directory while not in an interactive session:

module load nf-core
nf-core download rnaseq -r 3.4

The only symbolic links present were for the file resources:

S_lycopersicum_Jun_2022.bed
S_lycopersicum_Jun_2022.fa
S_lycopersicum_Jun_2022.gtf
tomato.config
doIt.sh

Steps for successful run:

Add .bash_profile configurations
Start a tmux session (tmux new -s base)
Start an interactive session (srun --partition Andromeda --cpus-per-task 16 --mem-per-cpu 12000 --time 60:00:00 --pty bash)
Activate virtual environment (module load nf-core, module load singularity)
Start Nextflow (./doIt.sh SRP252265.csv S_lycopersicum_Jun_2022.fa S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config 1> out.0.txt 2> err.0.txt )
Check the out file and the results directory
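Because the resource files in the run directory are symbolic links, a quick check that each one still resolves can save a failed launch. This is a minimal sketch; the function name preflight is hypothetical and is not part of doIt.sh.

```shell
#!/usr/bin/env bash
# Sketch: confirm every input resolves to a readable file before launching Nextflow.
preflight() {
    local f status=0
    for f in "$@"; do
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "dangling symlink: $f"       # link exists but its target does not
            status=1
        elif [ ! -r "$f" ]; then
            echo "not readable: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example: preflight SRP252265.csv S_lycopersicum_Jun_2022.fa S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config && ./doIt.sh SRP252265.csv S_lycopersicum_Jun_2022.fa S_lycopersicum_Jun_2022.gtf S_lycopersicum_Jun_2022.bed tomato.config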

Possible error:
Error executing process > 'NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS (1)'

Caused by:
Process `NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS (1)` terminated with an error exit status (1)

Work dir:
/nobackup/tomato_genome/alt_splicing/SRP252265/work/1e/a6b39f6fb5d57e905cdc43045523cd

Line 52 in versions file:
TRIMGALORE:
trimgalore: perl: error while loading shared libraries: libperl.so: cannot open shared object file: No such file or directory

Discussion: This error concerns the file that records the versions of all the modules used during the pipeline run. Line 52 of the versions file is the entry for trimgalore. Trimgalore itself ran successfully using the pipeline's resources, but the version lookup failed because the module does not exist on the cluster (perl could not load libperl.so). So the process worked and the results were still there after it finished; the error only tells us that the trimgalore version could not be found and recorded in the versions file.

We also created a batch script to make runs easier (not yet confirmed to work):

#!/bin/bash

#SBATCH --time=296:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --mem=401gb
#SBATCH --job-name=nnnnfcore-splice
#SBATCH --partition=Draco
#SBATCH --output=rrr-%x.%j.out
#SBATCH --error=rrr-%x.%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=mdavi258@uncc.edu

umask 007
set -eu

#file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p" /nobackup/tomato_genome/rnaseq-phase1/trinity/names.txt)
#dir=$(sed -n -e "${PBS_ARRAYID}p" /lustre/groups/lorainelab/data/illumina/sweet_potato/filtered/dir.txt)

echo "Launching Nextflow NF-Core"

#module load java8 
module load singularity
module load nf-core

## Always use slurm
export NXF_EXECUTOR=slurm
## Save Singularity containers in home dir (variable name corrected from NXF_SINGULARITY_CAHCEDIR)
export NXF_SINGULARITY_CACHEDIR="$HOME/nxf"
## Tell nextflow not to use internet
export NXF_OFFLINE='TRUE'
## Control Java Heap Size
export NXF_OPTS="-Xms2g -Xmx8g"

###Change to correct directory with fastq and symbolic link files
cd /nobackup/tomato_genome/alt_splicing/SRP328042-molly

#makeblastdb -in  /projects/tomato_genome/db/reference-4.0/ITAG4.0_cDNA.fasta -dbtype nucl

#nextflow run nf-core/rnaseq -profile ../../test,singularity

#/nobackup/tomato_genome/nfcore_rnaseq/tomato.config



###Change to correct names in directory
nextflow run nf-core-rnaseq-3.4/workflow \
         -resume \
         -profile singularity \
         -c tomato.config \
         --aligner star_salmon \
         --save_trimmed \
         --fasta S_lycopersicum_Jun_2022.fa \
         --input SRP328042.csv \
         --gtf S_lycopersicum_Jun_2022.gtf  \
         --gene_bed S_lycopersicum_Jun_2022.bed \
         --skip_biotype_qc \
         --skip_markduplicates \
         --skip_bigwig \
         --skip_stringtie \
         --skip_qualimap \
         --skip_fastqc


echo "Nextflow Pipeline Finished"

Troubleshooting:

Need to figure out whether it would work better on the Andromeda partition than on Draco.
Running via the script seems to produce more random errors that do not make sense than running via srun, which feels like a more organized and precise way to work.

Molly Davis added a comment - 15/Aug/22 11:22 AM

Nextflow nf-core/rnaseq pipeline update:

SRP328042 successfully ran through the nf-core pipeline.
Working Directory:
/nobackup/tomato_genome/alt_splicing/SRP328042-molly

Troubleshooting: I skipped custom_dumpsoftwareversions in the doIt.sh script:
nextflow run nf-core-rnaseq-3.4/workflow \
-resume \
-profile singularity \
-c $CONFIG \
--aligner star_salmon \
--save_trimmed \
--fasta $GENOMEFASTA \
--input $SAMPLESHEET \
--gtf $GTF \
--gene_bed $GENEBED \
--skip_biotype_qc \
--skip_markduplicates \
--skip_bigwig \
--skip_stringtie \
--skip_qualimap \
--skip_fastqc \
--skip_custom_dumpsoftwareversions
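For readers without access to the fork, here is a sketch of how doIt.sh might map its five positional arguments onto the variables used above. The argument order follows the invocations recorded earlier in this ticket (sample sheet, genome FASTA, GTF, gene BED, config). The function name build_command is hypothetical, the skip flags are left out for brevity, and the command is printed rather than run.

```shell
#!/usr/bin/env bash
# Sketch: map positional arguments to the variables used in the nextflow command.
# This prints the command for review; it is not the actual doIt.sh.
build_command() {
    if [ "$#" -ne 5 ]; then
        echo "usage: doIt.sh SAMPLESHEET GENOMEFASTA GTF GENEBED CONFIG" >&2
        return 2
    fi
    local SAMPLESHEET="$1" GENOMEFASTA="$2" GTF="$3" GENEBED="$4" CONFIG="$5"
    # printf repeats the '%s ' format for every argument, giving one space-separated line.
    printf '%s ' nextflow run nf-core-rnaseq-3.4/workflow \
        -resume -profile singularity -c "$CONFIG" \
        --aligner star_salmon --save_trimmed \
        --fasta "$GENOMEFASTA" --input "$SAMPLESHEET" \
        --gtf "$GTF" --gene_bed "$GENEBED"
    echo
}
```

Checking the argument count up front gives a usage message instead of a confusing pipeline failure when an argument is missing.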

Ann Loraine added a comment - 26/Aug/22 12:51 PM

During our meeting today, we looked over the files. Everything looks great, so moving this ticket to Done.

Summary:

Data processing output files are on the HPC cluster here:

/nobackup/tomato_genome/alt_splicing/SRP328042-molly
/nobackup/tomato_genome/alt_splicing/SRP252265

Ann Loraine added a comment - 17/Nov/22 12:46 PM

Planning to align data from two new possible controls: SRP100604 and SRP268884


People

Assignee:

Molly Davis

Reporter:

Ann Loraine

Votes:

0

Watchers:

2

Dates

Created:

07/Jul/22 12:16 PM

Updated:

17/Nov/22 12:49 PM

Resolved:

26/Aug/22 12:54 PM