Uploaded image for project: 'IGB'
  1. IGB
  2. IGBF-3436

Compare temperature responsiveness between genotypes using DESeq2

    Details

    • Type: Task
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      The time course data set includes genotypes are, VF36, F3H

      For this task, create some code that tests the interaction between temperature and genotype.

      Note: We will have results for every time point individually.

      Goal: Find genes that respond differently to heat stress depending on genotype.

      For example, is there a gene that increases expression in response to heat in the "are" genotype and decreases expression in response to heat in the "VF36" genotype?

      Suggestions for how to do these things:

        Attachments

          Issue Links

            Activity

            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Designs to try:
            ~genotype + temperature + genotype:temperature
            ~time + genotype + temperature + genotype:temperature

            Interaction term " : " answers the question is the treatment effect different across genotypes for each gene.

            Next step: Write a script and model.matrix( ~time + genotype + temperature + genotype:temperature) to observe the interaction term and determine if design is accurate. Do I need to add time in my design? We will be taking each time point individually and running through analysis so time cannot be included in the design.

            Show
            Mdavis4290 Molly Davis added a comment - - edited Designs to try : ~genotype + temperature + genotype:temperature ~time + genotype + temperature + genotype:temperature Interaction term " : " answers the question is the treatment effect different across genotypes for each gene. Next step: Write a script and model.matrix( ~time + genotype + temperature + genotype:temperature) to observe the interaction term and determine if design is accurate. Do I need to add time in my design? We will be taking each time point individually and running through analysis so time cannot be included in the design.
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Branch: https://bitbucket.org/mdavis4290/molly3-flavonoid-rnaseq/branch/IGBF-3436

            Script name: Muday-144_genotype.R

            Notes:

            • Make sure to open 72_F3H_PollenTube.Rproj in Rstudio first so count file is referenced correctly
            • Review design in section # View DESeq2 Design line by line
            • Review times individually in section # Run DESeq function with each time point
            Show
            Mdavis4290 Molly Davis added a comment - - edited Branch : https://bitbucket.org/mdavis4290/molly3-flavonoid-rnaseq/branch/IGBF-3436 Script name : Muday-144_genotype.R Notes : Make sure to open 72_F3H_PollenTube.Rproj in Rstudio first so count file is referenced correctly Review design in section # View DESeq2 Design line by line Review times individually in section # Run DESeq function with each time point
            Show
            Mdavis4290 Molly Davis added a comment - Pull Request : https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/25
            Hide
            ann.loraine Ann Loraine added a comment -

            PR is merged. The code is easy to understand. It's pretty obvious at most steps what is happening and intended.

            I did not run it yet, but I don't think this is necessary as the design is clear. Moving to Done.

            Show
            ann.loraine Ann Loraine added a comment - PR is merged. The code is easy to understand. It's pretty obvious at most steps what is happening and intended. I did not run it yet, but I don't think this is necessary as the design is clear. Moving to Done.
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Design Notes

            Question to answer: Is the treatment effect different across genotypes?

            Understanding results of the interaction term: https://rdrr.io/bioc/DESeq2/man/results.html#:~:text=object)%24tDegreesFreedom%20.-,Value,standard%20error%20of%20the%20log2FoldChange%20.

            • I discovered that the results term will just return the comparison of the last level of the last variable in the design formula over the first level of this variable.
            • If you want to see results for a specific comparison you are supposed to use a contrast argument. In other words, the argument contrast can be used to generate results tables for any comparison of interest. It subtracts to see the difference between the two genotypes.
              Ex: results(dds, contrast=list("genotypeIII.conditionB", "genotypeII.conditionB"))
            • resultsNames(): shows the names of the columns available as results, usually a combination of the variable name and a level

            Examples:

            genotypeII.conditionB = gives the condition effect for genotype II vs I
            genotypeIII.conditionB = gives the condition effect for genotype III vs I

            Examples in Code:

            ## Example: two conditions, three genotypes
            
            # ~~~ Using interaction terms ~~~
            
            dds <- makeExampleDESeqDataSet(n=100,m=18)
            dds$genotype <- factor(rep(rep(c("I","II","III"),each=3),2))
            design(dds) <- ~ genotype + condition + genotype:condition
            dds <- DESeq(dds)
            resultsNames(dds)
            
            # the condition effect for genotype I (the main effect)
            results(dds, contrast=c("condition","B","A"))
            
            # the condition effect for genotype III.
            # this is the main effect *plus* the interaction term
            # (the extra condition effect in genotype III compared to genotype I).
            results(dds, contrast=list( c("condition_B_vs_A","genotypeIII.conditionB") ))
             
            # the interaction term for condition effect in genotype III vs genotype I.
            # this tests if the condition effect is different in III compared to I
            results(dds, name="genotypeIII.conditionB")
            
            # the interaction term for condition effect in genotype III vs genotype II.
            # this tests if the condition effect is different in III compared to II
            results(dds, contrast=list("genotypeIII.conditionB", "genotypeII.conditionB"))
            
            ## where the first element of the list is the numerator (so gets treated positively) and the second is the denominator (so gets subtracted), which leaves the effect you're looking for.
            
            # Note that a likelihood ratio could be used to test if there are any
            # differences in the condition effect between the three genotypes.
            
            • Output for each contrast would include results columns: baseMean, log2FoldChange, lfcSE, stat, pvalue and padj

            Muday Design:

            ## Run analysis
            dds <- DESeqDataSetFromMatrix(countData = round(cts),
                                            colData = coldata,
                                            design = ~genotype + temperature + genotype:temperature)
            featureData <- data.frame(gene=rownames(cts))
            mcols(dds) <- DataFrame(mcols(dds), featureData)
            dds <- DESeq(dds, minReplicatesForReplace=Inf)
            keep <- rowSums(counts(dds)) >= 10
            dds <- dds[keep,]
            resultsNames(dds) # View all of the Design and interactions
            
            
            # this tests if the condition effect is different in genotype III compared to I
            res_V_vs_A <- results(dds, name="genotypeV.temperature34")
            res_V_vs_A <- res_V_vs_A[order(res_V_vs_A $pvalue),] # Order by pvalue
            res_V_vs_A 
            
            # this tests if the condition effect is different in genotype III compared to II
            res_V_vs_F <- results(dds, contrast=list("genotypeV.temperature34", "genotypeF.temperature34"))
            res_V_vs_F <- res_V_vs_F[order(res_V_vs_F$pvalue),] # Order by pvalue
            res_V_vs_F
            

            Questions to answer:

            • Why not just use the interaction term in the design?
              The interaction term will not have the same meaning as it would if both main effects were included in the model. It actually allows us to make contrasts down the line.
            • Why use an interaction term instead of grouping the factors?
            • Are we actually seeing a difference in gene expression across genotypes based on the condition?
            • What does log2FoldChange mean and represent within the design?
              This value indicates how much the gene or transcript's expression seems to have changed between the comparison and control groups.
            Show
            Mdavis4290 Molly Davis added a comment - - edited Design Notes Question to answer : Is the treatment effect different across genotypes? Understanding results of the interaction term : https://rdrr.io/bioc/DESeq2/man/results.html#:~:text=object)%24tDegreesFreedom%20.-,Value,standard%20error%20of%20the%20log2FoldChange%20 . I discovered that the results term will just return the comparison of the last level of the last variable in the design formula over the first level of this variable. If you want to see results for a specific comparison you are supposed to use a contrast argument. In other words, the argument contrast can be used to generate results tables for any comparison of interest. It subtracts to see the difference between the two genotypes. Ex: results(dds, contrast=list("genotypeIII.conditionB", "genotypeII.conditionB")) resultsNames(): shows the names of the columns available as results, usually a combination of the variable name and a level Examples: genotypeII.conditionB = gives the condition effect for genotype II vs I genotypeIII.conditionB = gives the condition effect for genotype III vs I Examples in Code: ## Example: two conditions, three genotypes # ~~~ Using interaction terms ~~~ dds <- makeExampleDESeqDataSet(n=100,m=18) dds$genotype <- factor(rep(rep(c( "I" , "II" , "III" ),each=3),2)) design(dds) <- ~ genotype + condition + genotype:condition dds <- DESeq(dds) resultsNames(dds) # the condition effect for genotype I (the main effect) results(dds, contrast=c( "condition" , "B" , "A" )) # the condition effect for genotype III. # this is the main effect *plus* the interaction term # (the extra condition effect in genotype III compared to genotype I). results(dds, contrast=list( c( "condition_B_vs_A" , "genotypeIII.conditionB" ) )) # the interaction term for condition effect in genotype III vs genotype I. # this tests if the condition effect is different in III compared to I results(dds, name= "genotypeIII.conditionB" ) # the interaction term for condition effect in genotype III vs genotype II. # this tests if the condition effect is different in III compared to II results(dds, contrast=list( "genotypeIII.conditionB" , "genotypeII.conditionB" )) ## where the first element of the list is the numerator (so gets treated positively) and the second is the denominator (so gets subtracted), which leaves the effect you're looking for . # Note that a likelihood ratio could be used to test if there are any # differences in the condition effect between the three genotypes. Output for each contrast would include results columns: baseMean, log2FoldChange, lfcSE, stat, pvalue and padj Muday Design : ## Run analysis dds <- DESeqDataSetFromMatrix(countData = round(cts), colData = coldata, design = ~genotype + temperature + genotype:temperature) featureData <- data.frame(gene=rownames(cts)) mcols(dds) <- DataFrame(mcols(dds), featureData) dds <- DESeq(dds, minReplicatesForReplace=Inf) keep <- rowSums(counts(dds)) >= 10 dds <- dds[keep,] resultsNames(dds) # View all of the Design and interactions # this tests if the condition effect is different in genotype III compared to I res_V_vs_A <- results(dds, name= "genotypeV.temperature34" ) res_V_vs_A <- res_V_vs_A[order(res_V_vs_A $pvalue),] # Order by pvalue res_V_vs_A # this tests if the condition effect is different in genotype III compared to II res_V_vs_F <- results(dds, contrast=list( "genotypeV.temperature34" , "genotypeF.temperature34" )) res_V_vs_F <- res_V_vs_F[order(res_V_vs_F$pvalue),] # Order by pvalue res_V_vs_F Questions to answer : Why not just use the interaction term in the design? The interaction term will not have the same meaning as it would if both main effects were included in the model. It actually allows us to make contrasts down the line. Why use an interaction term instead of grouping the factors? Are we actually seeing a difference in gene expression across genotypes based on the condition? What does log2FoldChange mean and represent within the design? This value indicates how much the gene or transcript's expression seems to have changed between the comparison and control groups.
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Note on Factoring and results order:

            By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factors levels.

            https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels

            Show
            Mdavis4290 Molly Davis added a comment - - edited Note on Factoring and results order : By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factors levels. https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            Yes, if you use "levels=c("V","A") then "V" will get treated as the base level and "A" will get treated as a variable that elevates or depresses expression relative to the "V" base.

            So far as I know....this seems like it works OK based on my experience comparing A and V in the Markdown we were talking about this morning.

            Show
            ann.loraine Ann Loraine added a comment - - edited Yes, if you use "levels=c("V","A") then "V" will get treated as the base level and "A" will get treated as a variable that elevates or depresses expression relative to the "V" base. So far as I know....this seems like it works OK based on my experience comparing A and V in the Markdown we were talking about this morning.
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Understanding Designs and Results:

            FindMutantVsWildtypeDEGenes-DESeq2.Rmd

            • Goal: Find and interpret how gene expression in are differs from gene expression in VF36.
            • Interpretation: the design would only look at the genotypes V (wild type) & A (Mutant). The design in the file includes 'time' but I don't think it is needed because the contrast just includes a genotype comparison for the results.

              contrast=c("genotype","A","V")

            FindTreatmentEffectAcrossGenotypes-DESeq2.Rmd ***Molly new Markdown for IGBF-3436

            • Goal: Is the condition effect different across genotypes V (wild type) & A (Mutant)?
            • Interpretation: the design would need to include temperature and genotype but also an interaction term. With the interaction term we wouldn't need to use 'contrast' but instead 'name' to see results because it already performed the contrast with the interaction term added.

              name="genotypeA.temperature34"

            • Should I review data at different time chunks still? In other words, do we need a result file for each time point.
            Show
            Mdavis4290 Molly Davis added a comment - - edited Understanding Designs and Results : FindMutantVsWildtypeDEGenes-DESeq2.Rmd Goal: Find and interpret how gene expression in are differs from gene expression in VF36. Interpretation: the design would only look at the genotypes V (wild type) & A (Mutant). The design in the file includes 'time' but I don't think it is needed because the contrast just includes a genotype comparison for the results. contrast=c("genotype","A","V") FindTreatmentEffectAcrossGenotypes-DESeq2.Rmd ***Molly new Markdown for IGBF-3436 Goal: Is the condition effect different across genotypes V (wild type) & A (Mutant)? Interpretation: the design would need to include temperature and genotype but also an interaction term. With the interaction term we wouldn't need to use 'contrast' but instead 'name' to see results because it already performed the contrast with the interaction term added. name="genotypeA.temperature34" Should I review data at different time chunks still? In other words, do we need a result file for each time point.

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes:
                0 Vote for this issue
                Watchers:
                Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: