[IGBF-3040] Create samplesheet with SRR run identifiers and experimental attributes - JIRA UNCC

Ann Loraine created issue - 22/Dec/21 3:41 PM

Ann Loraine made changes - 22/Dec/21 3:41 PM

Field	Original Value	New Value
Epic Link		IGBF-3039 [ 21553 ]

Ann Loraine made changes - 22/Dec/21 3:41 PM

Rank

Ranked higher

Ann Loraine made changes - 22/Dec/21 7:48 PM

Attachment

sra_explorer_metadata.tsv [ 17043 ]

Ann Loraine made changes - 22/Dec/21 8:09 PM

Description

Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline.

Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]

The columns should include the following fields:

# SRA run identifier (e.g., SRR7591232)
# fastq_1 - SRR name with _1 appended
# fastq_2 - SRR name with _2 appended, or blank for single-end samples
# strandedness (need to look up how Nextflow designates this)
# genotype - A (Agami), M (M103)
# treatment - C (control), E (salt)
# 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
# tissue - S (shoot), R (root)
# replicate - 1, 2, 3
# read length
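The single-letter codes above can be checked mechanically before rows are added to the sample sheet. Below is a minimal Python sketch that assumes the codes stay exactly as listed. The lookup-table names and the `decode_row` helper are hypothetical.

```python
# Hypothetical lookup tables for the single-letter attribute codes listed above.
GENOTYPE = {"A": "Agami", "M": "M103"}
TREATMENT = {"C": "control", "E": "salt"}
AZA = {"Y": "treated", "N": "not treated"}
TISSUE = {"S": "shoot", "R": "root"}
REPLICATES = {1, 2, 3}


def decode_row(genotype: str, treatment: str, aza: str, tissue: str, replicate: int) -> dict:
    """Translate coded attributes into readable labels; raise ValueError on unknown codes."""
    if replicate not in REPLICATES:
        raise ValueError(f"unexpected replicate: {replicate}")
    try:
        return {
            "genotype": GENOTYPE[genotype],
            "treatment": TREATMENT[treatment],
            "5-azaC": AZA[aza],
            "tissue": TISSUE[tissue],
            "replicate": replicate,
        }
    except KeyError as err:
        # err.args[0] is the code that was not found in its lookup table.
        raise ValueError(f"unknown attribute code: {err.args[0]}") from None
```

Rejecting unknown codes early keeps typos out of the sample sheet. This matters because the same file is meant to feed statistical analyses as well as the pipeline.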

Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline. Note that we can also use the sample sheet as input for statistical analyses.

Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]

The columns should include the following fields:

# SRA run identifier (e.g., SRR7591232)
# fastq_1 - SRR name with _1 appended
# fastq_2 - SRR name with _2 appended, or blank for single-end samples
# strandedness (need to look up how Nextflow designates this)
# genotype - A (Agami), M (M103)
# treatment - C (control), E (salt)
# 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
# tissue - S (shoot), R (root)
# replicate - 1, 2, 3
# read length

Ann Loraine made changes - 23/Dec/21 5:23 AM

Description

Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow: the nf-core/rnaseq pipeline.

The first steps in doing this will be to:

* generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running Nextflow. Note that we can also use the sample sheet as input for statistical analyses.
* generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])

The sample sheet data file columns will include the following fields:

# SRA run identifier (e.g., SRR7591232)
# fastq_1 - SRR name with _1 appended (e.g., SRR7591232_1)
# fastq_2 - SRR name with _2 appended, or blank for single-end samples (e.g., SRR7591232_2)
# strandedness (need to look up how Nextflow designates this)
# genotype - A (Agami), M (M103)
# treatment - C (control), E (salt)
# 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
# tissue - S (shoot), R (root)
# replicate - 1, 2, 3
# read length

Ann Loraine made changes - 23/Dec/21 5:23 AM

Comment

[ Bioconductor, and probably other R libraries as well, can likely assemble this information automatically. ]

Ann Loraine made changes - 23/Dec/21 6:40 AM

Attachment

strand.png [ 17044 ]

Ann Loraine made changes - 23/Dec/21 6:41 AM

Description

Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow: the nf-core/rnaseq pipeline.

The first steps in doing this will be to:

* generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running Nextflow. Note that we can also use the sample sheet as input for statistical analyses.
* generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])

The sample sheet data file columns will include the following fields:

# SRA run identifier (e.g., SRR7591232)
# fastq_1 - SRR name with _1 appended (e.g., SRR7591232_1)
# fastq_2 - SRR name with _2 appended, or blank for single-end samples (e.g., SRR7591232_2)
# strandedness - should be "reverse" for the Illumina TruSeq protocol (see attached image from nf-core/rnaseq Slack)
# genotype - A (Agami), M (M103)
# treatment - C (control), E (salt)
# 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
# tissue - S (shoot), R (root)
# replicate - 1, 2, 3
# read length

Ann Loraine made changes - 25/Dec/21 11:53 AM

Sprint

Spring 1 2022 Jan 3 - Jan 14 [ 136 ]

Ann Loraine made changes - 25/Dec/21 11:53 AM

Rank

Ranked lower

Nowlan Freese made changes - 04/Jan/22 9:46 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Nowlan Freese made changes - 04/Jan/22 9:47 AM

Assignee

Nowlan Freese [ nfreese ]

Nowlan Freese made changes - 05/Jan/22 9:07 AM

Description

Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow: the nf-core/rnaseq pipeline.

The first steps in doing this will be to:

* generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running Nextflow. Note that we can also use the sample sheet as input for statistical analyses.
* generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])
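Below is a minimal sketch of the download step. It is written in Python and emits a bash script. It assumes that the SRA Toolkit's `prefetch` and `fasterq-dump` are installed and on the PATH. The `download_script` helper, the `fastq` output directory and the thread count are illustrative choices, not settled project conventions.

```python
import re
import shlex

# SRA run accessions: SRR (NCBI), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ID = re.compile(r"[SED]RR\d+")


def download_script(run_ids, outdir="fastq", threads=4):
    """Return a bash script that fetches each run and converts it to gzipped FASTQ.

    Assumes the SRA Toolkit (prefetch, fasterq-dump) is installed and on PATH.
    """
    out = shlex.quote(outdir)
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", f"mkdir -p {out}"]
    for run in run_ids:
        # Validating the accession also keeps arbitrary text out of the shell script.
        if not RUN_ID.fullmatch(run):
            raise ValueError(f"not an SRA run accession: {run}")
        lines.append(f"prefetch {run}")
        # --split-3 typically writes mates to <run>_1.fastq / <run>_2.fastq for
        # paired runs and puts single-end reads in <run>.fastq.
        lines.append(f"fasterq-dump --split-3 --threads {threads} --outdir {out} {run}")
        # nf-core/rnaseq expects gzipped FASTQ, so compress whatever was written.
        lines.append(f"gzip {out}/{run}*.fastq")
    return "\n".join(lines) + "\n"
```

Output file names depend on the split mode and on whether a run is paired. Check the output directory before filling in the `fastq_1`/`fastq_2` columns.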

The sample sheet data file columns will include the following fields:

# SRA run identifier (e.g., SRR7591232)
# fastq_1 - SRR name with _1 appended (e.g., SRR7591232_1)
# fastq_2 - SRR name with _2 appended, or blank for single-end samples (e.g., SRR7591232_2)
# strandedness - should be "reverse" for the Illumina TruSeq protocol (see attached image from nf-core/rnaseq Slack)
# genotype - A (Agami), M (M103)
# treatment - C (control), E (salt)
# 5-Azacytidine treatment - Y (treated), N (not treated)
# tissue - S (shoot), R (root)
# replicate - 1, 2, 3
# read length
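Below is a minimal sketch of writing the sample sheet with Python's `csv` module. nf-core/rnaseq reads the `sample`, `fastq_1`, `fastq_2` and `strandedness` columns and expects gzipped FASTQ paths. The remaining columns carry the experimental attributes for later statistical analyses. Several details are assumptions: the `Run` record, the `aza` column name, the `fastq` directory and the `_1`/`_2` file naming. Whether a given pipeline release tolerates the extra attribute columns should be checked against its documentation.

```python
import csv
import io
from dataclasses import dataclass

# Strandedness values understood by nf-core/rnaseq; newer releases also accept "auto".
STRANDEDNESS = {"unstranded", "forward", "reverse"}

# Pipeline columns first, then the experimental attributes (column names are assumptions).
FIELDS = ["sample", "fastq_1", "fastq_2", "strandedness",
          "genotype", "treatment", "aza", "tissue", "replicate", "read_length"]


@dataclass
class Run:
    run: str          # SRA run identifier, e.g. SRR7591232
    paired: bool      # False for single-end libraries
    genotype: str     # A (Agami) or M (M103)
    treatment: str    # C (control) or E (salt)
    aza: str          # 5-Azacytidine: Y (treated) or N (not treated)
    tissue: str       # S (shoot) or R (root)
    replicate: int    # 1, 2 or 3
    read_length: int


def samplesheet_csv(runs, fastq_dir="fastq", strandedness="reverse"):
    """Return the sample sheet as CSV text, one row per SRA run."""
    if strandedness not in STRANDEDNESS:
        raise ValueError(f"unsupported strandedness: {strandedness}")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for r in runs:
        writer.writerow({
            "sample": r.run,
            # Assumed naming: <run>_1.fastq.gz / <run>_2.fastq.gz under fastq_dir.
            "fastq_1": f"{fastq_dir}/{r.run}_1.fastq.gz",
            # Blank fastq_2 marks a single-end library.
            "fastq_2": f"{fastq_dir}/{r.run}_2.fastq.gz" if r.paired else "",
            "strandedness": strandedness,
            "genotype": r.genotype,
            "treatment": r.treatment,
            "aza": r.aza,
            "tissue": r.tissue,
            "replicate": r.replicate,
            "read_length": r.read_length,
        })
    return buf.getvalue()
```

Using the run identifier as the `sample` value keeps every run separate. nf-core/rnaseq treats rows that share a `sample` value as technical replicates of one sample and concatenates their reads.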

Nowlan Freese made changes - 05/Jan/22 3:10 PM

Attachment

samplesheet_RNA-Seq.csv [ 17050 ]

Nowlan Freese made changes - 05/Jan/22 3:54 PM

Assignee

Nowlan Freese [ nfreese ]

Ann Loraine [ aloraine ]

Nowlan Freese made changes - 05/Jan/22 3:55 PM

Status

In Progress [ 3 ]

To-Do [ 10305 ]

Ann Loraine made changes - 06/Jan/22 10:06 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Ann Loraine made changes - 06/Jan/22 10:06 AM

Status

In Progress [ 3 ]

Needs 1st Level Review [ 10005 ]

Ann Loraine made changes - 06/Jan/22 10:21 AM

Link

This issue relates to ~~IGBF-3042~~ [ ~~IGBF-3042~~ ]

Ann Loraine made changes - 17/Jan/22 11:09 PM

Sprint

Spring 1 2022 Jan 3 - Jan 14 [ 136 ]

Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28 [ 136, 137 ]

Ann Loraine made changes - 17/Jan/22 11:09 PM

Rank

Ranked higher