IGB / IGBF-3373

Provide Details of Analysis for Muday lab

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      Anthony Postiglione request:

      "Hi Molly,

      Thank you so much for your help with getting all of the RNA-Seq data visualized for us! I have just one more favor to ask of you when it comes to my thesis chapter, I need a brief explanation of how the RNA-Seq analysis was handled on your end. I am out of my depth here for sure, so would you mind adding some brief detail to the section I am pasting below? Thank you so much again!

      Analysis of RNA Seq samples

      PCA plots were generated by XXX, EdgeR was used to identify DE genes, Volcano plots were generated to show temperature effects within genotypes, and lists of DE genes were compared across genotypes..."

      Email back a response or post in thesis document:
      https://docs.google.com/document/d/1iJd3wUgBSY8lXlIdPls6_4TU0T2h5Ecb5twQkB7dFg0/edit


            Activity

            Mdavis4290 Molly Davis added a comment (edited)

            Response: The RNA sequencing data were processed through the nf-core/rnaseq bioinformatics pipeline, with reads aligned to the SL5 tomato genome using STAR. The pipeline output included a Salmon gene expression counts file in which each column represents one experimental sample, defined by its time point, temperature, and genotype. Time points were 15, 30, 45, or 75 minutes; temperatures were 28 or 34 degrees Celsius; and genotypes were VF36, OE3, or ARE. The values in the matrix should be un-normalized counts or estimated counts of sequencing reads when used with DESeq2. The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input. DESeq2 was used to identify differentially expressed (DE) genes. For the PCA plots, the design included the factors 'time' and 'temperature'. The PCA plots were created in RStudio using the plotPCA() function from DESeq2 and the ggplot2 package. Volcano plots were also produced from DESeq2 output, but with 'temperature' as the only design factor, because the goal was to view only the DE genes affected by heat stress. The volcano plots were created in RStudio using the EnhancedVolcano package.
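The library-size correction mentioned above can be illustrated with a minimal pure-Python sketch of the median-of-ratios idea described in the DESeq2 paper (Love et al. 2014). It is not DESeq2's code, and the function names and toy counts are hypothetical: each gene's geometric mean across samples serves as a pseudo-reference, and a sample's size factor is the median of its count-to-reference ratios, using only genes with no zero counts.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of rows, one row per gene, each row holding raw
    (un-normalized) counts for every sample. Hypothetical helper.
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of zero, so skip them.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the pseudo-reference, per gene.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Toy genes x samples matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)  # -> approximately [0.707, 1.414]
```

In DESeq2 these factors enter the model as per-sample normalization factors applied to the raw counts, which is why the input should be un-normalized counts rather than counts already scaled by library size.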

            Code Resources and Packages:

            DESeq code resource: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#multi-factor-designs
            nf-core resource: https://nf-co.re/rnaseq
            plotPCA(): https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/plotPCA
            ggplot(): https://ggplot2.tidyverse.org
            EnhancedVolcano(): https://www.rdocumentation.org/packages/EnhancedVolcano/versions/1.11.3/topics/EnhancedVolcano

            Data and Markdown location: https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/

            Counts file: muday-144-SL5_counts-salmon.txt
            PCA plot Markdown: Muday-DESeq-PCA-Plots.Rmd
            Volcano Plot Markdown: FindControlVsStressDEGenes-DESeq.Rmd
            ann.loraine Ann Loraine added a comment

            [~aloraine] asked about using scaled counts as input to the PCA instead of the unscaled "raw" counts. [~molly] and [~RobertReid] noted that the DESeq2 functions used to make the PCA plots perform scaling/normalization before the PCA step.
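The order of operations behind this question (normalize, then transform, then run PCA) can be sketched in pure Python. This is a hedged approximation of what DESeq2's plotPCA() does, not its implementation: plotPCA() works on vst() or rlog() output and by default uses the 500 most variable genes (ntop = 500), whereas this sketch substitutes a plain log2(count + 1) transform for the variance-stabilizing step and finds components by power iteration on the sample-by-sample Gram matrix. All function names here are hypothetical.

```python
import math


def log_transform(norm_counts, pseudocount=1.0):
    # Stand-in for DESeq2's vst()/rlog(): log2 of normalized count plus a pseudocount.
    return [[math.log2(c + pseudocount) for c in row] for row in norm_counts]


def top_variable(rows, ntop=500):
    # Keep the ntop genes (rows) with the highest variance across samples.
    def variance(r):
        m = sum(r) / len(r)
        return sum((x - m) ** 2 for x in r) / (len(r) - 1)
    return sorted(rows, key=variance, reverse=True)[:ntop]


def pca_samples(rows, n_components=2, iters=500):
    """Return (scores, percent_var); scores[k][j] is sample j on PC k+1."""
    n = len(rows[0])
    # Center each gene across samples (no scaling, matching prcomp's defaults).
    centered = []
    for r in rows:
        m = sum(r) / n
        centered.append([x - m for x in r])
    # Sample-by-sample Gram matrix; its eigenvalues are the squared singular values.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n)) or 1.0
    scores, percent_var = [], []
    for _ in range(n_components):
        # Non-uniform start: the all-ones vector is in the null space after centering.
        v = [float(i + 1) for i in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance captured by this PC).
        lam = max(sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n)), 0.0)
        scores.append([x * math.sqrt(lam) for x in v])
        percent_var.append(100.0 * lam / total)
        # Deflate so the next pass finds the next-largest component.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores, percent_var
```

The percentages correspond to the "PC1: x% variance" axis labels that plotPCA() reports; the sign of each component is arbitrary, as in any PCA.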

            robofjoy Robert Reid added a comment

            Let's add in a few refs where we can!

            For nf-core, the ref is:
            https://www.nature.com/articles/s41587-020-0439-x

            For STAR alignment, the ref is:
            https://academic.oup.com/bioinformatics/article/29/1/15/272537
            Dobin, Alexander, et al. "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29.1 (2013): 15-21.

            Anthony can incorporate them however he likes.

            robofjoy Robert Reid added a comment (edited)

            I made some edits:

            The RNA sequencing data were processed through the nf-core/rnaseq pipeline (nf-core ref), in which all reads were aligned to the most recent tomato genome (SL5 ref) using STAR (star ref). The pipeline produced a Salmon gene expression counts file in which each column represents one experimental sample, defined by its time point, temperature, genotype, and replicate. The time course consisted of 15, 30, 45, and 75 minutes. Temperature was controlled at 28 or 34 degrees Celsius. The genotypes sequenced were VF36, OE3, and ARE. The un-normalized counts file was processed with DESeq2 (deseq2 ref). The DESeq2 model internally corrects for library size, so transformed or normalized values, such as counts scaled by library size, should not be used as input. DESeq2 was used to identify differentially expressed (DE) genes. For the PCA plots, the experimental design included the factors 'time' and 'temperature'. The PCA plots were created in RStudio using the plotPCA() function from DESeq2 and the ggplot2 package. Volcano plots were produced from DESeq2 output using 'temperature' as the design factor, in order to explore the DE genes affected by heat stress. The volcano plots were created in RStudio using the EnhancedVolcano package.
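To make a design "that included the factors 'time' and 'temperature'" concrete, the sketch below shows how an additive formula such as ~ time + temperature is expanded into a model matrix with treatment (reference-level) coding, which is what R's model.matrix() produces and what DESeq2 broadly fits its generalized linear model against. The exact formula in Muday-DESeq-PCA-Plots.Rmd is not shown in this ticket, so the additive form, the assumption that time and temperature are stored as factors (not numbers), the reference levels (15 minutes, 28 degrees Celsius), the sample metadata, and the function name are all hypothetical.

```python
def design_matrix(samples, factors, levels):
    """Treatment-coded model matrix for an additive design such as ~ time + temperature.

    samples: list of dicts mapping factor name -> level (hypothetical metadata)
    factors: factor names in formula order
    levels:  factor -> list of levels; the first level is the reference
    """
    columns = ["(Intercept)"]
    for f in factors:
        # One indicator column per non-reference level, named like R's model.matrix().
        columns += [f + lvl for lvl in levels[f][1:]]
    rows = []
    for s in samples:
        row = [1]  # every sample gets the intercept
        for f in factors:
            row += [1 if s[f] == lvl else 0 for lvl in levels[f][1:]]
        rows.append(row)
    return columns, rows


levels = {"time": ["15", "30", "45", "75"], "temperature": ["28", "34"]}
samples = [{"time": "15", "temperature": "28"}, {"time": "45", "temperature": "34"}]
cols, mat = design_matrix(samples, ["time", "temperature"], levels)
# cols -> ['(Intercept)', 'time30', 'time45', 'time75', 'temperature34']
# mat  -> [[1, 0, 0, 0, 0], [1, 0, 1, 0, 1]]
```

With this coding, the temperature34 coefficient estimates the 34 vs 28 degrees Celsius effect (as a log2 fold change in DESeq2) while adjusting for time point.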

            SL5 genome reference:
            https://www.nature.com/articles/s41586-022-04808-9

            DESeq2 reference:
            Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology 15.12 (2014): 1-21.

            Mdavis4290 Molly Davis added a comment

            Perfect, thank you! [~RobertReid]

            Mdavis4290 Molly Davis added a comment (edited)

            Final response:

            The RNA sequencing data were processed through the nf-core/rnaseq pipeline (nf-core ref), in which all reads were aligned to the most recent tomato genome (SL5 ref) using STAR (star ref). The pipeline produced a Salmon gene expression counts file in which each column represents one experimental sample, defined by its time point, temperature, genotype, and replicate. The time course consisted of 15, 30, 45, and 75 minutes. Temperature was controlled at 28 or 34 degrees Celsius. The genotypes sequenced were VF36, OE3, and ARE. The un-normalized counts file was processed with DESeq2 (deseq2 ref). The DESeq2 model internally corrects for library size, so transformed or normalized values, such as counts scaled by library size, should not be used as input. DESeq2 was used to identify differentially expressed (DE) genes. For the PCA plots, the experimental design included the factors 'time' and 'temperature'. The PCA plots were created in RStudio using the plotPCA() function from DESeq2 and the ggplot2 package. Volcano plots were produced from DESeq2 output using 'temperature' as the design factor, in order to explore the DE genes affected by heat stress. The volcano plots were created in RStudio using the EnhancedVolcano package.
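The volcano plots pair each gene's log2 fold change (x axis) with the significance of that change (y axis, typically -log10 of the p-value or adjusted p-value). A minimal pure-Python sketch of preparing those points follows. It applies the Benjamini-Hochberg adjustment, which is DESeq2's default for its padj column, and labels genes with cutoffs of |log2FC| >= 1 and padj < 0.05; those cutoffs, the gene names, and the function names are illustrative assumptions rather than the settings used in FindControlVsStressDEGenes-DESeq.Rmd, and the sketch does not reproduce DESeq2's independent filtering.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_points(genes, log2fc, pvalues, fc_cutoff=1.0, padj_cutoff=0.05):
    """Return (gene, log2FC, -log10(padj), label) tuples for plotting."""
    padj = bh_adjust(pvalues)
    points = []
    for gene, fc, q in zip(genes, log2fc, padj):
        y = -math.log10(q) if q > 0 else float("inf")
        if q < padj_cutoff and abs(fc) >= fc_cutoff:
            label = "up" if fc > 0 else "down"
        else:
            label = "ns"  # not significant at these hypothetical cutoffs
        points.append((gene, fc, y, label))
    return points
```

Genes labeled "up" or "down" at a given temperature contrast are the kind of per-genotype DE lists that can then be compared across genotypes.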

            nf-core References:
            https://www.nature.com/articles/s41587-020-0439-x
            https://nf-co.re/rnaseq

            SL5 Reference:
            https://www.nature.com/articles/s41586-022-04808-9

            STAR Reference:
            https://academic.oup.com/bioinformatics/article/29/1/15/272537
            Dobin, Alexander, et al. "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29.1 (2013): 15-21.

            DESeq2 References:
            http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#multi-factor-designs
            Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology 15.12 (2014): 1-21.

            plotPCA() Reference:
            https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/plotPCA

            ggplot() Reference:
            https://ggplot2.tidyverse.org

            EnhancedVolcano() Reference:
            https://www.rdocumentation.org/packages/EnhancedVolcano/versions/1.11.3/topics/EnhancedVolcano

            Data and Markdown files: https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/
            Counts file: muday-144-SL5_counts-salmon.txt
            PCA plot Markdown: Muday-DESeq-PCA-Plots.Rmd
            Volcano Plot Markdown: FindControlVsStressDEGenes-DESeq.Rmd

              People

              • Assignee: Mdavis4290 Molly Davis
              • Reporter: Mdavis4290 Molly Davis