  1. IGB
  2. IGBF-3339

Run relabeled 72-F3H-PollenTube data with Deseq analysis

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      • Edit the Markdown files to reference the correct counts file
      • Write DESeq results to CSV files
      • Create Volcano Plots
      • Create PCA plots
      • Remove any previous unwanted results

            Activity

            ann.loraine Ann Loraine added a comment - - edited

            Reviewing branch IGBF-3339 from repository git@bitbucket.org:mdavis4290/molly-flavonoid-rnaseq.git.

            Fetched and checked-out branch with:

            git remote add molly git@bitbucket.org:mdavis4290/molly-flavonoid-rnaseq.git
            git fetch molly 
            git checkout IGBF-3339
            

            Output:

            Branch 'IGBF-3339' set up to track remote branch 'IGBF-3339' from 'molly'.
            Switched to a new branch 'IGBF-3339'
            
            ann.loraine Ann Loraine added a comment - - edited

            Comments and requests for changes:

            • Use past tense when discussing results, e.g., "100 genes were up-regulated" instead of "100 genes are up-regulated." Using past tense is better because the experiment is completed, and the Markdown describes what happened in that experiment. Use present tense when talking about an eternal truth, e.g., "the sky is blue."
            • The Introduction needs more focus. If it has more focus, it can be shorter. You can say something like: "The goal of this Markdown is to identify genes whose expression was affected by the treatment." Then you can talk about how you plan to do this. I would keep it very short. Just say: "To achieve this goal, we will use functions from the DESeq library."
            • Immediately following each plot, explain the plot, and only the plot. After each plot, state: "The preceding plot shows" and then state what it shows. Don't include anything else. Currently, the volcano plots are followed by precise counts of DE genes, but the volcano plots do not show precise counts of genes. They are so dense that you can't use them to count DE genes. Let's talk about it on Tuesday. There's a really great trick for describing plots which I will show you. You will be shocked at how simple and easy it is!
            • There is a lot of stuff getting printed into the knitted Markdown Results section that does not appear to be a result. If you include output from code, it's important to tell the reader what that output shows.
            • Do not output plots as individual files. It is too easy for these to get separated from the code that produced them, leading to errors in future steps. Please remove all the output volcano files from the repo.
            • It would be much, much better to do the analysis of all three genotypes in a single file. That way, we won't have to maintain and update three different files that are all repeating the same text. Also, the Markdown could compare the results across the three genotypes, which is useful and interesting.
            • We should consolidate the DE gene expression results for each genotype into one file. I will explain a design for how to do this tomorrow.
            ann.loraine Ann Loraine added a comment - - edited

            Looks like all the extra text referred to in the preceding comment is coming from multiple "print" statements in the function "Gentoype_DE_Analysis."

            Also, there is a lot of code in "Gentoype_DE_Analysis" that seems unneeded for this particular analysis.

            I think a good way to manage this would be to create a new, single Markdown that uses the basic structure of "FindDifferentiallyExpressedGenes.Rmd", but that includes volcano plots and uses DESeq functionality.

            For the output, let's create a tab-delimited plain-text table (not an Excel spreadsheet) that looks like:

            1) gene_name
            2) group1 (the baseline, e.g., the group that was considered the control, expressed as the sample group prefix, such as A.28.15)
            3) group2 (the group considered as the treatment, e.g., the heat-stressed sample)
            4) p-value
            5) Q (false discovery rate, i.e., an adjusted p-value)

            This file will be quite large, so let's gzip-compress it.
            The goal of this file is to have a single text file with all the differential expression analysis results, which we can then use in subsequent analysis steps.
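A minimal sketch of the file layout described above, written in Python with only the standard library for illustration (the project's own analysis is in R Markdown). The function names, the `p_value` column spelling, and the file name used in the usage note are assumptions, not part of the repository.

```python
import csv
import gzip

# Column order follows the five-column design listed above.
# "p_value" is an assumed spelling for the "p-value" column.
COLUMNS = ["gene_name", "group1", "group2", "p_value", "Q"]


def write_de_results(path, rows):
    """Write DE results as a gzip-compressed, tab-delimited text file.

    `rows` is an iterable of dicts keyed by COLUMNS.
    """
    # newline="" lets the csv module control line endings itself.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def read_de_results(path):
    """Read the table back as a list of dicts (all values are strings)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `write_de_results("de_results.txt.gz", rows)` would produce one file holding every comparison, which later steps can load with `read_de_results`. The group labels in each row would be sample group prefixes such as A.28.15 and A.34.15.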

            These subsequent analysis steps would include making a spreadsheet that lets users inspect the DE genes in Integrated Genome Browser. That file could be an Excel spreadsheet containing all rows with Q <= 0.10, a very liberal threshold for deciding differential expression, meaning that many of the results will be false positives.

            This secondary file, created to facilitate interactive visualization, would look like:

            1) gene_name - hyperlinked to the gene's location in IGB
            2) group1 (e.g., A.28.15)
            3) group2 (e.g., A.34.15)
            4) group1 average, computed from the scaled counts file values
            5) group1 variance, computed from the scaled counts file values
            6) group2 average, computed from the scaled counts file values
            7) group2 variance, computed from the scaled counts file values
            8) Q (the false discovery rate for the differential gene expression result)
            9) gene description

            In this latter file, we'll show numeric values rounded to 3 significant digits.
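The filtering and summary columns for this secondary file can be sketched as follows, again in standard-library Python for illustration only. The function names and dictionary keys are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption, since the comment does not say which variance is intended. The IGB hyperlink is left out because its format is not given here.

```python
import statistics

Q_THRESHOLD = 0.10  # the liberal cutoff named above


def sig3(value):
    """Round a number to 3 significant digits."""
    return float(f"{value:.3g}")


def summarize(gene_name, group1, group2, counts1, counts2, q, description):
    """Build one row of the secondary table from scaled counts per group.

    counts1 and counts2 are the scaled counts for the samples in each group.
    Variance here is the sample variance (an assumption).
    """
    return {
        "gene_name": gene_name,
        "group1": group1,
        "group2": group2,
        "group1_average": sig3(statistics.mean(counts1)),
        "group1_variance": sig3(statistics.variance(counts1)),
        "group2_average": sig3(statistics.mean(counts2)),
        "group2_variance": sig3(statistics.variance(counts2)),
        "Q": sig3(q),
        "description": description,
    }


def select_rows(rows, threshold=Q_THRESHOLD):
    """Keep only rows whose Q is at or below the threshold."""
    return [row for row in rows if row["Q"] <= threshold]
```

Rounding is applied only when the row is built for display, so the full-precision values stay available in the compressed results file.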

            ann.loraine Ann Loraine added a comment -

            Upon discussion between [~aloraine] and [~molly], we decided to merge the branch and create a new Markdown that addresses the above comments.

            Mdavis4290 Molly Davis added a comment -

            Pull Request: https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/17

              People

              • Assignee: Mdavis4290 Molly Davis
              • Reporter: Mdavis4290 Molly Davis