  1. IGB
  2. IGBF-3040

Create samplesheet with SRR run identifiers and experimental attributes

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Previous analysis of the RNA-Seq data found that the treatments triggered changes in expression and splicing. Since our first processing of the data, we added a new dataset in which a methylation inhibitor was applied, but this dataset has not yet been processed.

      A new RNA-Seq data analysis pipeline has been developed that uses the Nextflow workflow system, and we've been using this workflow system to process data from the pollen heat stress project. Also, this workflow management system is better equipped to accommodate diverse samples, e.g., sample libraries sequenced using different strategies (single- versus paired-end) and read lengths.

      Since we need to process a new methylation-inhibitor RNA-Seq dataset and incorporate it into our analysis, let's reprocess the data using a more up-to-date workflow: the nf-core/rnaseq pipeline.

      The first steps in doing this will be to:

      • generate a comma-separated sample sheet file that relates SRA run identifiers to experimental attributes, required for running Nextflow. Note that we can also use the sample sheet as input for statistical analyses.
      • generate a script that will download the SRA data files and convert them to fastq, required for running the pipeline. (Note: The project identifier is PRJNA481973.)
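
      The download step could be sketched as follows. This is a minimal sketch, not the script written for this issue: it assumes the SRA Toolkit commands prefetch and fasterq-dump are installed, and that the list of run identifiers for the project has already been obtained separately. The function names and the "fastq" output directory are hypothetical.

```python
import shlex


def download_commands(runs, outdir="fastq"):
    """Build SRA Toolkit commands for each run (assumes prefetch and
    fasterq-dump are on the PATH; nothing is executed here)."""
    commands = []
    for run in runs:
        # Fetch the .sra archive for this run.
        commands.append(["prefetch", run])
        # Convert to fastq; paired-end runs yield <run>_1.fastq and <run>_2.fastq.
        commands.append(["fasterq-dump", "--split-files", "--outdir", outdir, run])
    return commands


def as_shell_script(commands):
    """Render the commands as a bash script that stops at the first failure."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    lines += [shlex.join(cmd) for cmd in commands]
    return "\n".join(lines) + "\n"
```

      Writing the output of as_shell_script(download_commands(runs)) to a file gives a script that can be reviewed before it is run on the cluster.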

      The sample sheet data file columns will include the following fields:

      1. SRA run identifier (e.g., SRR7591232)
      2. fastq_1 - SRR name with _1 appended (e.g., SRR7591232_1)
      3. fastq_2 - SRR name with _2 appended, or blank for single-end samples (e.g., SRR7591232_2)
      4. strandedness - should be "reverse" for the TruSeq Illumina protocol (see attached image from nf-core/rnaseq slack)
      5. genotype - A (Agami), M (M103)
      6. treatment - C (control), E (salt)
      7. 5-Azacytidine treatment - Y (treated), N (not treated)
      8. tissue - S (shoot), R (root)
      9. replicate - 1, 2, 3
      10. read length
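
      The column layout above could be generated along these lines. This is a sketch under assumptions: apart from fastq_1, fastq_2, and strandedness, the column names (run, azacytidine, read_length, and so on) are hypothetical, as are the function names; the per-run attribute values would have to come from the SRA metadata for the project.

```python
import csv
import io

# Column order follows the field list above. Names other than fastq_1,
# fastq_2, and strandedness are assumptions made for this sketch.
COLUMNS = ["run", "fastq_1", "fastq_2", "strandedness", "genotype",
           "treatment", "azacytidine", "tissue", "replicate", "read_length"]

# Allowed codes for the experimental attributes, as listed above.
ALLOWED = {
    "genotype": {"A", "M"},
    "treatment": {"C", "E"},
    "azacytidine": {"Y", "N"},
    "tissue": {"S", "R"},
    "replicate": {"1", "2", "3"},
}


def make_row(run, genotype, treatment, azacytidine, tissue, replicate,
             read_length, paired=True, strandedness="reverse"):
    """Build one sample sheet row; fastq_2 is left blank for single-end runs."""
    row = {
        "run": run,
        "fastq_1": f"{run}_1",
        "fastq_2": f"{run}_2" if paired else "",
        "strandedness": strandedness,
        "genotype": genotype,
        "treatment": treatment,
        "azacytidine": azacytidine,
        "tissue": tissue,
        "replicate": str(replicate),
        "read_length": str(read_length),
    }
    # Reject attribute codes that are not in the lists above.
    for field, allowed in ALLOWED.items():
        if row[field] not in allowed:
            raise ValueError(f"{run}: unexpected {field} value {row[field]!r}")
    return row


def write_samplesheet(rows, handle):
    """Write the rows as comma-separated text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

      Validating the attribute codes while the rows are built catches typos before the sheet is passed to the pipeline or reused for statistical analyses.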

              People

              • Assignee:
                nfreese Nowlan Freese
                Reporter:
                ann.loraine Ann Loraine