  1. IGB
  2. IGBF-3339

Run relabeled 72-F3H-PollenTube data with Deseq analysis

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      • Edit the Markdown files to reference the correct counts file
      • Write DESeq results to CSV files
      • Create Volcano Plots
      • Create PCA plots
      • Remove any previous unwanted results

        Attachments

          Issue Links

            Activity

            Mdavis4290 Molly Davis created issue -
            Mdavis4290 Molly Davis made changes -
            Field Original Value New Value
            Epic Link IGBF-3277 [ 22158 ]
            Mdavis4290 Molly Davis made changes -
            Summary Run renamed 72-F3H-PollenTube data with Deseq analysis Run relabeled 72-F3H-PollenTube data with Deseq analysis
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment - Branch : https://bitbucket.org/mdavis4290/molly-flavonoid-rnaseq/branch/IGBF-3339 Pull Request : https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/16
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Comparing main branch to new branch by making two clones of the repo onto my laptop. That way, I can look at both in two different windows at the same time.

            ann.loraine Ann Loraine made changes -
            Comment [ Potential interesting thing to look at:

            The mislabeling was not random; there was a pattern. Discovering this pattern enabled Muday lab to re-assign names to the correct samples.
            Thus, the incorrect results may have a pattern to them which we might not have noticed previously but now may be obvious. ]
            ann.loraine Ann Loraine added a comment -

            I looked at one of the .Rmd files. The Introduction section, Results section, etc. are incomplete.

            The next step for this ticket should be to complete each Markdown file so that it produces a coherent, easy-to-understand report.

            In addition, it would be smart to include the commit hash of the repository as it existed when the code was run.
            This should appear in the top of the Results section, or in the Introduction section, to prove which data input file was used.

            This would be in lieu of providing a URL linking to the dataset used. We can't really provide such a link anyway because the data file is packaged together with the code, in the same repository.

            Moving this back to "To-Do" for [~molly] to complete.
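
The commit-hash suggestion above can be sketched as follows. This is a minimal illustration in Python, not the project's code (the reports themselves are R Markdown); the function names `current_commit_hash` and `provenance_line` are hypothetical, and the only external command assumed is `git rev-parse HEAD`.

```python
import subprocess


def current_commit_hash(repo_dir="."):
    """Return the HEAD commit hash of the git repository at repo_dir.

    Falls back to the string "unknown" when git is not installed or
    repo_dir is not inside a git working tree.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def provenance_line(repo_dir="."):
    """Build a one-line provenance statement for the top of a report."""
    commit = current_commit_hash(repo_dir)
    return f"Repository commit when this report was generated: {commit}"


if __name__ == "__main__":
    print(provenance_line())
```

Printing this line near the top of the Results or Introduction section ties the knitted report to the exact repository state, and therefore to the exact data file, that produced it.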

            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine added a comment - - edited

            Adding a new analysis that uses the edgeR library to detect temperature-dependent expression changes, as support or backup for the DESeq library results.

            Links:

            • https://research.stowers.org/cws/CompGenomics/Projects/edgeR.html
            • https://www.jove.com/v/62528/three-differential-expression-analysis-methods-for-rna-sequencing
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine added a comment - - edited

            Requests:

            • Please include the plots in the knitted Markdown's output, not as separate files

            Reasons:

            • It is impossible to correctly interpret the plots out of context.
            • It is too easy to lose track of key plot parameters and input datasets used if plots are saved in files separate from the .Rmd file(s) used to create them.
            • Lots of individual files make the repository hard to manage and understand.
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] To-Do [ 10305 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3342 [ IGBF-3342 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Sprint Summer 1 2023 May 15 [ 170 ] Summer 1 2023 May 15, Summer 2 2023 May 29 [ 170, 171 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis added a comment - - edited

            Updated markdowns and results

            Branch: https://bitbucket.org/mdavis4290/molly-flavonoid-rnaseq/branch/IGBF-3339

            Markdown PDF Names:

            • DESeq-ARE-CvsT.pdf
            • DESeq-VF36-CvsT.pdf
            • DESeq-OE3-CvsT.pdf
            • DESeq-Clusters.pdf
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine added a comment - - edited

            Reviewing branch IGBF-3339 from repository git@bitbucket.org:mdavis4290/molly-flavonoid-rnaseq.git.

            Fetched and checked-out branch with:

            git remote add molly git@bitbucket.org:mdavis4290/molly-flavonoid-rnaseq.git
            git fetch molly 
            git checkout IGBF-3339
            

            Output:

            Branch 'IGBF-3339' set up to track remote branch 'IGBF-3339' from 'molly'.
            Switched to a new branch 'IGBF-3339'
            
            ann.loraine Ann Loraine added a comment - - edited

            Comments and requests for changes:

            • Use past tense when discussing results, e.g., "100 genes were up-regulated" instead of "100 genes are up-regulated." Using past tense is better because the experiment is completed, and the Markdown describes what happened in that experiment. Use present tense when talking about an eternal truth, e.g., "the sky is blue."
            • The Introduction needs more focus. If it has more focus, it can be shorter. You can say something like: "The goal of this Markdown is to identify genes whose expression was affected by the treatment." Then you can talk about how you plan to do this. I would keep it very short. Just say: "To achieve this goal, we will use functions from the DESeq library."
            • Immediately following each plot, explain the plot, and only the plot. After each plot, state: "The preceding plot shows" and then state what it shows. Don't include anything else. Currently, the volcano plots are followed by precise counts of DE genes, but the volcano plots do not show precise counts or numbers of genes. They are limited in that they are so dense that you can't use them to get counts of DE genes. Let's talk about it on Tuesday. There's a really great trick for describing plots which I will show you. You will be shocked at how simple and easy it is!
            • There is a lot of stuff getting printed into the knitted Markdown Results section that does not appear to be a result. If you include output from code, it's important to tell the reader what that output shows.
            • Do not output plots as individual files. It is too easy for these to get separated from the code that produced them, leading to errors in future steps. Please remove all the output volcano files from the repo.
            • It would be much, much better to do the analysis of all three genotypes in a single file. That way, we won't have to maintain and update three different files that are all repeating the same text. Also, the Markdown could compare the results across the three genotypes, which is useful and interesting.
            • We should consolidate the DE gene expression results for each genotype into one file. I will explain a design for how to do this tomorrow.
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine added a comment - - edited

            Looks like all the extra text referred to in the preceding comment is coming from multiple "print" statements in the function "Gentoype_DE_Analysis."

            Also, there is a lot of code in "Gentoype_DE_Analysis" that seems unneeded for this particular analysis.

            I think a good way to manage this would be to create a new, single Markdown that uses the basic structure of "FindDifferentiallyExpressedGenes.Rmd", but including volcano plots and using DESeq functionality.

            For the output, let's create a tab-delimited plain-text table (not an Excel spreadsheet) that looks like:

            1) gene_name
            2) group1 (the baseline, e.g., the group that was considered the control, expressed as the sample group prefix, such as A.28.15)
            3) group2 (the group considered as the treatment, e.g., the heat-stressed sample)
            4) p-value
            5) Q (false discovery rate, i.e., an adjusted p-value)

            This file will be quite large, so let's gzip-compress it.
            The goal of this file is to have a single text file with all the differential expression analysis results, which we can then use in subsequent analysis steps.
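
The five-column table described above can be sketched as follows. This is a rough Python illustration under stated assumptions, not the project's implementation (which would live in R Markdown): the helper names, the gene names in the demo, and the output file name `de_results.tsv.gz` are hypothetical, and Q is computed here with the Benjamini-Hochberg procedure, a common false discovery rate adjustment that the ticket itself does not name.

```python
import csv
import gzip
import os
import tempfile

COLUMNS = ["gene_name", "group1", "group2", "p-value", "Q"]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over this rank and all higher ranks.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def build_rows(gene_names, group1, group2, p_values):
    """Pair each gene with its p-value and Q for one group comparison."""
    q_values = benjamini_hochberg(p_values)
    return [
        {"gene_name": gene, "group1": group1, "group2": group2,
         "p-value": p, "Q": q}
        for gene, p, q in zip(gene_names, p_values, q_values)
    ]


def write_results(path, rows):
    """Write rows as a gzip-compressed, tab-delimited text file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_results(path):
    """Read the table back; every value is returned as a string."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # Hypothetical genes and p-values; group names follow the ticket's examples.
    rows = build_rows(["geneA", "geneB", "geneC"], "A.28.15", "A.34.15",
                      [0.001, 0.2, 0.03])
    out_path = os.path.join(tempfile.mkdtemp(), "de_results.tsv.gz")
    write_results(out_path, rows)
    print(read_results(out_path))
```

Keeping every tested gene in one compressed plain-text file means later steps can apply whatever Q threshold they need without rerunning the differential expression analysis.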

            These subsequent analysis steps would include making a spreadsheet that allows users to inspect the DE genes in Integrated Genome Browser. That file could be an Excel spreadsheet containing all rows with Q <= 0.10, a very liberal threshold for deciding differential expression, meaning that many of the results will be false positives.

            This secondary file, created to facilitate interactive visualization, would look like:

            1) gene_name - hyperlinked to the gene's location in IGB
            2) group1 (e.g., A.28.15)
            3) group2 (e.g., A.34.15)
            4) group1 average, computed from the scaled counts file values
            5) group1 variance, computed from the scaled counts file values
            6) group2 average, computed from the scaled counts file values
            7) group2 variance, computed from the scaled counts file values
            8) Q (the false discovery rate for the differential gene expression result)
            9) gene description

            In this latter file, we'll show numeric values rounded to 3 significant digits.
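
The filtering, per-group summaries, and rounding for this secondary table can be sketched as follows. This is a hedged Python illustration, not the project's code: the function names and input shapes are hypothetical, the variance is assumed to be the sample variance (n - 1 denominator), and the IGB hyperlink is left out because the ticket does not give the link format.

```python
import math
import statistics

Q_THRESHOLD = 0.10  # liberal cutoff named in the comment above


def round_sig(value, digits=3):
    """Round value to the given number of significant digits."""
    if value == 0 or not math.isfinite(value):
        return value
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


def build_viewer_rows(results, scaled_counts, descriptions):
    """Keep rows with Q <= Q_THRESHOLD and add per-group summaries.

    results: dicts with gene_name, group1, group2, and Q.
    scaled_counts: gene -> group -> list of scaled counts, with at
        least two replicates per group so a variance can be computed.
    descriptions: gene -> free-text gene description.
    """
    rows = []
    for result in results:
        q_value = float(result["Q"])
        if q_value > Q_THRESHOLD:
            continue
        gene = result["gene_name"]
        values1 = scaled_counts[gene][result["group1"]]
        values2 = scaled_counts[gene][result["group2"]]
        rows.append({
            "gene_name": gene,
            "group1": result["group1"],
            "group2": result["group2"],
            "group1_average": round_sig(statistics.mean(values1)),
            "group1_variance": round_sig(statistics.variance(values1)),
            "group2_average": round_sig(statistics.mean(values2)),
            "group2_variance": round_sig(statistics.variance(values2)),
            "Q": round_sig(q_value),
            "description": descriptions.get(gene, ""),
        })
    return rows


if __name__ == "__main__":
    # Hypothetical gene, counts, and description for illustration only.
    demo_results = [{"gene_name": "geneA", "group1": "A.28.15",
                     "group2": "A.34.15", "Q": 0.01234}]
    demo_counts = {"geneA": {"A.28.15": [10.0, 12.0, 14.0],
                             "A.34.15": [100.0, 120.0, 140.0]}}
    print(build_viewer_rows(demo_results, demo_counts,
                            {"geneA": "example gene"}))
```

Rounding is applied only when building this viewer-facing table, so the full-precision values remain available in the primary results file.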

            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine added a comment -

            Upon discussion between [~aloraine] and [~molly], we decided to merge the branch and create a new Markdown that addresses the above comments.

            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            Mdavis4290 Molly Davis added a comment - Pull Request : https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/17
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3355 [ IGBF-3355 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee: Mdavis4290 Molly Davis
              • Reporter: Mdavis4290 Molly Davis
              • Votes: 0
              • Watchers: 2
