Uploaded image for project: 'IGB'
  1. IGB
  2. IGBF-3040

Create samplesheet with SRR run identifiers and experimental attributes

    Details

    • Type: Task
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

      A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

      Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.

      The first steps in doing this will be to:

      • generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
      • generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: PRJNA481973/)

      The sample sheet data file columns will include the following fields:

      1. SRA run identifier (e.g, SRR7591232)
      2. fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
      3. fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
      4. strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack)
      5. genotype - A (Agami), M (M103)
      6. treatment - C (control), E (salt)
      7. 5-Azacytidine treatment - Y (treated), N (not treated)
      8. tissue - S (shoot), R (root)
      9. replicate - 1, 2, 3
      10. read length

        Attachments

          Issue Links

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-3039 [ 21553 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Attachment sra_explorer_metadata.tsv [ 17043 ]
            ann.loraine Ann Loraine made changes -
            Description Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
            Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline.

            Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]

            The columns should include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples
            # strandedness (need to look up how Nextflow designates this)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
            Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline. Note that we can also use the sample sheet as inputs for statistical analyses.

            Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]

            The columns should include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples
            # strandedness (need to look up how Nextflow designates this)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            ann.loraine Ann Loraine made changes -
            Description Create a comma-separated data file that relates SRA run identifiers to experimental attributes.
            Ensure that the sample sheet can be used to run the Nextflow rna-seq pipeline. Note that we can also use the sample sheet as inputs for statistical analyses.

            Project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/]

            The columns should include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples
            # strandedness (need to look up how Nextflow designates this)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

            A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

            Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.

            The first steps in doing this will be to:

            * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
            * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])


            The sample sheet data file columns will include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
            # strandedness (need to look up how Nextflow designates this)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            ann.loraine Ann Loraine made changes -
            Comment [ The Bioconductor and probably other libraries as well in R can probably assemble this information automatically. ]
            ann.loraine Ann Loraine made changes -
            Attachment strand.png [ 17044 ]
            ann.loraine Ann Loraine made changes -
            Description Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

            A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

            Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.

            The first steps in doing this will be to:

            * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
            * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])


            The sample sheet data file columns will include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
            # strandedness (need to look up how Nextflow designates this)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

            A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

            Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.

            The first steps in doing this will be to:

            * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
            * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])


            The sample sheet data file columns will include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
            # strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1 2022 Jan 3 - Jan 14 [ 136 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked lower
            nfreese Nowlan Freese made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Description Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

            A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

            Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.

            The first steps in doing this will be to:

            * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
            * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])


            The sample sheet data file columns will include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
            # strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-aza-2’-deoxycytidine (5-azaC) treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not been processed as yet.

            A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

            Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow - the nc-core/rnaseq pipeline.

            The first steps in doing this will be to:

            * generate a comma-separated sample sheet data file that relates SRA run identifiers to experimental attributes, required for running nextflow. Note that we can also use the sample sheet as inputs for statistical analyses.
            * generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is: [PRJNA481973/|https://www.ncbi.nlm.nih.gov/bioproject/PRJNA481973/])


            The sample sheet data file columns will include the following fields:

            # SRA run identifier (e.g, SRR7591232)
            # fastq_1 - SRR name with _1 appended (e.g, SRR7591232_1)
            # fastq_2 - SRR name with _2 appended, or blank for single-end samples ( (e.g, SRR7591232_2)
            # strandedness - should be "reverse" for Truseq Illumina protocol (see attached image from nf-core/rnaseq slack)
            # genotype - A (Agami), M (M103)
            # treatment - C (control), E (salt)
            # 5-Azacytidine treatment - Y (treated), N (not treated)
            # tissue - S (shoot), R (root)
            # replicate - 1, 2, 3
            # read length
            nfreese Nowlan Freese made changes -
            Attachment samplesheet_RNA-Seq.csv [ 17050 ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Ann Loraine [ aloraine ]
            nfreese Nowlan Freese made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3042 [ IGBF-3042 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 1 2022 Jan 3 - Jan 14 [ 136 ] Spring 1 2022 Jan 3 - Jan 14, Spring 2 2022 Jan 18 - Jan 28 [ 136, 137 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine made changes -
            Link This issue blocks IGBF-3144 [ IGBF-3144 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3238 [ IGBF-3238 ]
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3238 [ IGBF-3238 ]

              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: