[IGBF-3278] Muday-144 PCA Plots - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 4
Epic Link: Analyze Muday Lab time course data
Sprint: Spring 4 2023 Feb 21, Spring 5 2023 Mar 6

Description

1. PCA plots for all 3 genotypes at control and stress temperatures at each time point (so this would be 4 PCA plots). Rob had sent graphs combining all 4 time points, 3 genotypes, and 2 temperatures, and those were hard to parse. Rasha showed her data with individual time points, and it was much easier to follow.
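
As a rough illustration of the requested layout, the sketch below splits the design into one group per time point, so that each PCA plot would cover 3 genotypes at 2 temperatures. It is written in Python purely for illustration (the analysis itself lives in R Markdown). The time point labels `T1` to `T4` and the temperature labels are placeholders, since the ticket gives only their counts; the genotype names are the ones mentioned in the review comments on this ticket.

```python
from itertools import product

# Genotype names come from the review comments on this ticket.
GENOTYPES = ["VF36", "ARE", "OE3"]
# Placeholder labels: the ticket states only the counts (2 temperatures, 4 time points).
TEMPERATURES = ["control", "stress"]
TIME_POINTS = ["T1", "T2", "T3", "T4"]


def conditions_by_time_point():
    """Group every genotype/temperature combination under its time point."""
    groups = {time_point: [] for time_point in TIME_POINTS}
    for time_point, genotype, temperature in product(TIME_POINTS, GENOTYPES, TEMPERATURES):
        groups[time_point].append((genotype, temperature))
    return groups


plots = conditions_by_time_point()
for time_point, conditions in plots.items():
    # One PCA plot per time point, each covering six conditions.
    print(time_point, len(conditions), conditions)
```

Each of the four groups holds six genotype/temperature conditions, which is the subset of samples that would feed one PCA plot.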

Issue Links

is blocked by: IGBF-3281 Muday-144 DE analysis & Volcano Plots (Closed)
relates to: IGBF-3287 Create MDS plots for Muday lab's new data (Closed)

Activity

Molly Davis added a comment - 24/Feb/23 10:38 AM - edited

The following is the location of the R Markdown, updated data files, and HTML for the markdown:
https://bitbucket.org/mdavis4290/analysis-flavonoid-rnaseq/src/main/muday-144-analysis/

Commit: https://bitbucket.org/mdavis4290/analysis-flavonoid-rnaseq/commits/8130c485f27029d6ece7f97adb47212d426d6652

Next Steps: Work on making functions and making PCA plots for specific genotypes alone.

Molly Davis added a comment - 27/Feb/23 2:37 PM - edited

Update:

Created a function and separated the genotypes to analyze them individually. Made the PCA plots export as a PDF file.

Commit: https://bitbucket.org/mdavis4290/analysis-flavonoid-rnaseq/commits/8c73d9a711845b770384d811d45ef0d485a8ee16

Next Steps: Get feedback and determine whether I also need to separate out the time points for analysis. Am I fulfilling the task that Gloria is requesting (see the ticket description)?

Review Notes: Dr. Reid said the IQR request is fine at 50 and doesn't need to be changed to get more favorable results. The third request, IGBF-3280, might actually answer my question about whether to continue by selecting just the time point columns to compare:

"Request 3. Analyses of DE genes between genotypes. We have not yet seen those genes for the current analysis pipeline and want to match those up with genes in Number 2. Since there are 3 genotypes, I guess the way this needs to be done is 3 pairwise comparisons at each temperature (so 6 comparisons), but if you all have thoughts on another way, we'd love to hear them."

I appreciate any feedback! Let me know what you think!
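
To make the count in the quoted request concrete, here is a minimal sketch that enumerates 3 pairwise genotype comparisons at each of 2 temperatures, giving 6 in total. It is in Python for illustration only and is not the project's R code; the temperature labels are placeholders, and the genotype names are the ones used elsewhere on this ticket.

```python
from itertools import combinations

GENOTYPES = ["VF36", "ARE", "OE3"]
TEMPERATURES = ["control", "stress"]  # placeholder labels, not taken from the data files


def genotype_contrasts():
    """List every pairwise genotype comparison within each temperature."""
    return [
        (temperature, first, second)
        for temperature in TEMPERATURES
        for first, second in combinations(GENOTYPES, 2)
    ]


contrasts = genotype_contrasts()
for temperature, first, second in contrasts:
    print(f"{temperature}: {first} vs {second}")
print(len(contrasts), "comparisons")
```

With 3 genotypes there are 3 unordered pairs, so 2 temperatures yield the 6 comparisons mentioned in the request.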

Ann Loraine added a comment - 28/Feb/23 11:09 AM - edited

PR merged.
See repository for code.
Repository address is https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/.

Ann Loraine added a comment - 02/Mar/23 9:23 AM - edited

Review comments on Markdown:

One of the first code chunks references absolute file paths on a specific computer. As a result, the Markdown can only run on that computer. Instead, the files should be referenced relative to the current working directory. To make this easier, RStudio has the concept of a "project". Do this:
1) Create a new project (an R project file with the .Rproj extension) in the same directory and add the project file to the directory.
2) Change the absolute file paths (referencing Molly's file system) to omit the directory portion of the path.
3) To run the Markdown, users will open the project in RStudio, select the Markdown from the Files section in RStudio, and "knit" it. They can also run individual sections interactively within RStudio.
If they do this, there will be no need to provide the absolute path location for input files. Talk with Ann for more information if this does not make sense.
Markdown should contain sections similar to a scientific paper: Introduction, Analysis/Results, Discussion, Conclusion. See https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/Starter_Markdown_Example.Rmd. You can use this as a "starter Markdown" and customize the sections as needed to fit an analysis.
Each figure is a "result" and needs an explanation of what it shows, such as text that states "The preceding figure shows that [ ... fill in whatever it shows ... ]."
First-pass interpretation: I see no separation between points except in the first PCA plot, the one that includes all the samples. This plot seems to show a strong separation between genotypes, with ARE samples separated from the others. However, because each individual sample is shown with its own unique color, I find the plot very difficult to process. I can see that the points separate in that plot, but I can't tell which variables differ between the two groups of points.
Instead of color-coding each point individually, use shapes to indicate genotype and color to indicate temperature: blue for cool and orange for warm. For shapes, I would use a triangle for ARE, a circle for VF36, and a square for OE3. ARE is a point mutant, and triangles are sometimes used to indicate mutations in some types of graphics. VF36 is a sort of parent, or mother variety, so it should be round, like the Venus of Willendorf. OE3 should be a shape with right angles, like a lab bench, as it was created in a laboratory.
Because PCA may not be able to detect or group samples very effectively in this experiment, I'd like to try an alternative method called "MDS", for "multi-dimensional scaling". I did this for another Muday lab RNA-Seq dataset, and the knitted Markdown is here: https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/Counts/ClusterCounts.html. I made a new ticket for the task: IGBF-3287
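
The shape and color scheme suggested above can be written down as two small lookup tables, one per variable, so that every sample's appearance is derived from its genotype and temperature rather than assigned individually. The following is only a sketch in Python for illustration, not the plotting code in the repository; the `point_style` helper and the sample list are hypothetical.

```python
# Encoding proposed in the review: shape encodes genotype, color encodes temperature.
GENOTYPE_SHAPE = {"ARE": "triangle", "VF36": "circle", "OE3": "square"}
TEMPERATURE_COLOR = {"cool": "blue", "warm": "orange"}


def point_style(genotype, temperature):
    """Return the shape and color for one sample (hypothetical helper)."""
    return {
        "shape": GENOTYPE_SHAPE[genotype],
        "color": TEMPERATURE_COLOR[temperature],
    }


# Hypothetical samples, used only to show the lookup.
samples = [("ARE", "warm"), ("VF36", "cool"), ("OE3", "warm")]
for genotype, temperature in samples:
    print(genotype, temperature, point_style(genotype, temperature))
```

With 3 shapes and 2 colors the plot needs only 6 distinct point styles, instead of one color per sample.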

Molly Davis added a comment - 13/Mar/23 11:14 AM - edited

Updated Bitbucket

Repository: https://bitbucket.org/mdavis4290/analysis-flavonoid-rnaseq/src/main/

Commit: https://bitbucket.org/mdavis4290/analysis-flavonoid-rnaseq/commits/67b6a52222e71dd5dfd4f721c82cf8b88ebd4769

Let me know if I should make a pull request to merge into main! Thanks! [~aloraine]

Ann Loraine added a comment - 14/Mar/23 9:50 AM - edited

Please see the notes on Bitbucket for change request details and questions.

Ann Loraine added a comment - 15/Mar/23 11:04 AM

Merged changes into the main branch of the team repository and then added 3 new commits to address comments on the above commit. Moving to Done.

People

Assignee: Molly Davis
Reporter: Molly Davis
Votes: 0
Watchers: 2

Dates

Created: 23/Feb/23 2:41 PM
Updated: 15/Mar/23 11:05 AM
Resolved: 15/Mar/23 11:05 AM