Details
-
Type: Task
-
Status: Closed (View Workflow)
-
Priority: Major
-
Resolution: Done
-
Affects Version/s: None
-
Fix Version/s: None
-
Labels:None
-
Story Points:1
-
Sprint:Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28
Description
Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.
A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.
Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.
The first steps in doing this will be to:
- generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
- generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: PRJNA481973/)
The sample sheet data file columns will include the following fields:
- SRA run identifier (e.g, SRR7591232)
- fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
- fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
- strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack)
- genotype - A (Agami), M (M103)
- treatment - C (control), E (salt)
- 5-Azacytidine treatment - Y (treated), N (not treated)
- tissue - S (shoot), R (root)
- replicate - 1, 2, 3
- read length
Attachments
Issue Links
Activity
Field | Original Value | New Value |
---|---|---|
Epic Link | IGBF-3039 [ 21553 ] |
Rank | Ranked higher |
Attachment | sra_explorer_metadata.tsv [ 17043 ] |
Description |
Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline. Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/] The columns should include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended # fastq_2 - SRR name with _2 appended, or blank for single-end samples # strandedness (need to look up how Nextflow designates this) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline. Note that we can also use the sample sheet as inputs for statistical analyses. Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/] The columns should include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended # fastq_2 - SRR name with _2 appended, or blank for single-end samples # strandedness (need to look up how Nextflow designates this) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Description |
Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline. Note that we can also use the sample sheet as inputs for statistical analyses. Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/] The columns should include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended # fastq_2 - SRR name with _2 appended, or blank for single-end samples # strandedness (need to look up how Nextflow designates this) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.
A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths. Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline. The first steps in doing this will be to: * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses. * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]) The sample sheet data file columns will include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1) # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2) # strandedness (need to look up how Nextflow designates this) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Comment | [ The Bioconductor and probably other libraries as well in R can probably assemble this information automatically. ] |
Attachment | strand.png [ 17044 ] |
Description |
Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.
A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths. Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline. The first steps in doing this will be to: * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses. * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]) The sample sheet data file columns will include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1) # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2) # strandedness (need to look up how Nextflow designates this) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.
A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths. Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline. The first steps in doing this will be to: * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses. * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]) The sample sheet data file columns will include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1) # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2) # strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Sprint | Spring 1 2022 Jan 3 - Jan 14 [ 136 ] |
Rank | Ranked lower |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Assignee | Nowlan Freese [ nfreese ] |
Description |
Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.
A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths. Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline. The first steps in doing this will be to: * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses. * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]) The sample sheet data file columns will include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1) # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2) # strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.
A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths. Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline. The first steps in doing this will be to: * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses. * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]) The sample sheet data file columns will include the following fields: # SRA run identifier (e.g, SRR7591232) # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1) # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2) # strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack) # genotype - A (Agami), M (M103) # treatment - C (control), E (salt) # 5-Azacytidine treatment - Y (treated), N (not treated) # tissue - S (shoot), R (root) # replicate - 1, 2, 3 # read length |
Attachment | samplesheet_RNA-Seq.csv [ 17050 ] |
Assignee | Nowlan Freese [ nfreese ] | Ann Loraine [ aloraine ] |
Status | In Progress [ 3 ] | To-Do [ 10305 ] |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Sprint | Spring 1 2022 Jan 3 - Jan 14 [ 136 ] | Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28 [ 136, 137 ] |
Rank | Ranked higher |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
Resolution | Done [ 10000 ] | |
Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |
Assignee | Ann Loraine [ aloraine ] | Nowlan Freese [ nfreese ] |
Used https://sra-explorer.info/ (SRA Explorer) to get tsv file (attached) listing SRR numbers for RNA-Seq data files associated with the experiment.
Next, we can use fasterq-dump (installed on the Charlotte HPC system) to download the files from the SRA and simultaneously convert them to fastq format.
See: this comment from
IGBF-2984"Investigate tools for detecting strandedness in RNA-Seq" for the correct options to use.