[IGBF-3339] Run relabeled 72-F3H-PollenTube data with Deseq analysis - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
2
Epic Link:
Analyze Muday Lab time course data
Sprint:
Summer 1 2023 May 15, Summer 2 2023 May 29

Description

Edit Markdown files to reference the correct counts file
Write DESeq results to CSV files
Create volcano plots
Create PCA plots
Remove any unwanted results from previous runs

Attachments

Issue Links

relates to

IGBF-3342 Create scaled counts file for Muday analysis

Closed

IGBF-3355 Create Markdown consolidating DESeq treatment vs control analysis

Closed

Activity

Molly Davis added a comment - 15/May/23 11:17 AM

Branch: https://bitbucket.org/mdavis4290/molly-flavonoid-rnaseq/branch/IGBF-3339
Pull Request: https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/16

5 older comments

Ann Loraine added a comment - 05/Jun/23 3:30 PM - edited

Reviewing branch IGBF-3339 from repository git@bitbucket.org:mdavis4290/molly-flavonoid-rnaseq.git.

Fetched and checked-out branch with:

git remote add molly git@bitbucket.org:mdavis4290/molly-flavonoid-rnaseq.git
git fetch molly 
git checkout IGBF-3339

Output:

Branch 'IGBF-3339' set up to track remote branch 'IGBF-3339' from 'molly'.
Switched to a new branch 'IGBF-3339'

Ann Loraine added a comment - 05/Jun/23 5:12 PM - edited

Comments and requests for changes:

Use past tense when discussing results, e.g., "100 genes were up-regulated" instead of "100 genes are up-regulated." Using past tense is better because the experiment is completed, and the Markdown describes what happened in that experiment. Use present tense when talking about an eternal truth, e.g., "the sky is blue."

The Introduction needs more focus. If it has more focus, it can be shorter. You can say something like: "The goal of this Markdown is to identify genes whose expression was affected by the treatment." Then you can talk about how you plan to do this. I would keep it very short. Just say: "To achieve this goal, we will use functions from the DESeq library."

Immediately following each plot, explain the plot, and only the plot. After each plot, state: "The preceding plot shows" and then state what it shows. Don't include anything else. Currently, the volcano plots are followed by precise counts of DE genes. But the volcano plots do not actually show precise counts or numbers of genes. They are very limited in that they are so dense that you can't use them to get counts of DE genes. Let's talk about it on Tuesday. There's a really great trick for describing plots which I will show you. You will be shocked at how simple and easy it is!

There is a lot of stuff getting printed into the knitted Markdown Results section that does not appear to be a result. If you include output from code, it's important to tell the reader what that output shows.

Do not output plots as individual files. It is too easy for these to get separated from the code that produced them, leading to errors in future steps. Please remove all the output volcano files from the repo.

It would be much, much better to do the analysis of all three genotypes in a single file. That way, we won't have to maintain and update three different files that are all repeating the same text. Also, the Markdown could compare the results across the three genotypes, which is useful and interesting.

We should consolidate the DE gene expression results for each genotype into one file. I will explain a design for how to do this tomorrow.

Ann Loraine added a comment - 05/Jun/23 8:00 PM - edited

Looks like all the extra text referred to in the preceding comment is coming from multiple "print" statements in the function "Gentoype_DE_Analysis."

Also, there is a lot of code in "Gentoype_DE_Analysis" that seems unneeded for this particular analysis.

I think a good way to manage this would be to create a new, single Markdown that uses the basic structure of "FindDifferentiallyExpressedGenes.Rmd", but adds volcano plots and uses DESeq functionality.

For the output, let's create a tab-delimited plain-text table (not an Excel spreadsheet) that looks like:

1) gene_name
2) group1 (the baseline, e.g., the group that was considered the control, expressed as the sample group prefix, such as A.28.15)
3) group2 (the group considered as the treatment, e.g., the heat-stressed sample)
4) p-value
5) Q (false discovery rate, i.e., an adjusted p-value)

This file will be quite large, so let's gzip-compress it.
The goal of this file is to have a text file with all the differential expression analysis results, which we can then use in subsequent analysis steps.
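As a rough illustration of the five-column table described above, here is a minimal Python sketch (not the R code the Markdown would actually use) that writes and reads a gzip-compressed, tab-delimited file. The header spellings `p_value` and `Q`, the function names, and the file name used in the usage note are assumptions made for this example.

```python
import csv
import gzip

# Column order follows the numbered list above; the exact header
# spellings ("p_value", "Q") are assumptions for this sketch.
COLUMNS = ["gene_name", "group1", "group2", "p_value", "Q"]


def write_results(rows, path):
    """Write DE result rows (dicts keyed by COLUMNS) to a gzip-compressed TSV."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def read_results(path):
    """Read the table back as a list of dicts; all values come back as strings."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `write_results(rows, "de_results.tsv.gz")` followed by `read_results("de_results.tsv.gz")` would round-trip the rows, which is the property later analysis steps rely on.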

These subsequent analysis steps would include making a spreadsheet that would allow users to inspect the DE genes in Integrated Genome Browser. That file could be an Excel spreadsheet that would contain all rows with Q <= 0.10, a very liberal threshold for deciding differential expression, meaning: many of the results will be false positives.
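The Q <= 0.10 cutoff could be applied to the rows of the plain-text table along these lines. This is a Python sketch under stated assumptions: rows are dicts with a `Q` field holding text, and a missing value is written as `NA` (DESeq2 can report an adjusted p-value of NA for some genes). The names `parse_q` and `filter_by_q` are hypothetical.

```python
import math

# Liberal threshold from the comment above.
Q_THRESHOLD = 0.10


def parse_q(text):
    """Convert a Q field to float; unparseable values such as 'NA' become NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def filter_by_q(rows, threshold=Q_THRESHOLD):
    """Keep rows whose Q is at or below the threshold.

    NaN compares False against any number, so rows with a missing Q
    are dropped rather than treated as significant.
    """
    return [row for row in rows if parse_q(row["Q"]) <= threshold]
```

Dropping rows with a missing Q is a design choice for this sketch; the alternative, keeping them in a separate list for inspection, would need its own column or file.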

This secondary file, created to facilitate interactive visualization, would look like:

1) gene_name - hyperlinked to the gene's location in IGB
2) group1 (e.g., A.28.15)
3) group2 (e.g., A.34.15)
4) group1 average, computed from the scaled counts file values
5) group1 variance, computed from the scaled counts file values
6) group2 average, computed from the scaled counts file values
7) group2 variance, computed from the scaled counts file values
8) Q (the false discovery rate for the differential gene expression result)
9) gene description

In this latter file, we'll show numeric values rounded to 3 significant digits.
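The per-group average and variance columns, rounded to 3 significant digits, could be computed roughly as follows. This Python sketch assumes the scaled counts for one gene in one group arrive as a list of floats with at least two replicates, and it uses the sample variance (n - 1 denominator); the comment above does not say which variance is intended, so that choice is an assumption. `round_sig` and `summarize_group` are hypothetical names.

```python
import statistics


def round_sig(value, digits=3):
    """Round a number to the given count of significant digits."""
    # The 'g' format keeps `digits` significant digits, switching to
    # exponent notation when needed; float() parses either form.
    return float(f"{value:.{digits}g}")


def summarize_group(scaled_counts, digits=3):
    """Return the rounded average and sample variance for one sample group.

    Requires at least two values, because the sample variance is
    undefined for a single replicate.
    """
    return {
        "average": round_sig(statistics.mean(scaled_counts), digits),
        "variance": round_sig(statistics.variance(scaled_counts), digits),
    }
```

Rounding to significant digits rather than decimal places keeps very small and very large scaled counts equally readable in the spreadsheet.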

Ann Loraine added a comment - 06/Jun/23 12:12 PM

Upon discussion between Ann Loraine and Molly Davis, we decided to merge the branch and create a new Markdown that addresses the above comments.

Molly Davis added a comment - 06/Jun/23 12:14 PM

Pull Request:

https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/17

People

Assignee:

Molly Davis

Reporter:

Molly Davis

Votes:

0

Watchers:

2

Dates

Created:

15/May/23 10:30 AM

Updated:

06/Jun/23 12:28 PM

Resolved:

06/Jun/23 12:28 PM