[IGBF-3050] Answer question regarding NA values - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 1
Epic Link: Support NSF pollen grant

Description

On 2022-01-06, Gloria Muday wrote the following to Ann & Rob, cc-ing Devin Smith (desmith@wfu.edu):

Rob and Ann,

Happy New Year!

I hope you are staying well. I saw that UNC Charlotte's classes are remote for most of January. I hope that does not complicate things too much and keeps everyone safe on your campus.

Devin and I have one question that we need answered before we can continue identifying DE genes from the "are" RNA-Seq dataset. Some of the adjusted p-values came out as NA. We don't understand why that is or what it means. Can you explain that to us? We prefer to use the adjusted p-values to filter the dataset, but cannot, because some of them (for genes with low p-values) show up as NA.

Thanks,
Gloria
–
Gloria Muday
Director, Center for Molecular Signaling
Charles M. Allen, Professor of Biology
Wake Forest University
336-758-5316

Loraine replied:

Sure, we can look into it!

Rob, can you send me the script you used to create the data sheet – or (even better!) add it to the repository?

For this task, let's examine the data file output by Rob's code to determine the meaning of the NA values.

Also, we should put the code into the project repository.

Attachments

file.txt (0.3 kB, 06/Jan/22 1:46 PM)
metadata.csv (0.5 kB, 06/Jan/22 1:46 PM)
volcanoPlots-salmon.R (10 kB, 06/Jan/22 1:46 PM)

Activity

Ann Loraine added a comment - 06/Jan/22 1:44 PM

Rob's reply:

To answer the question of why we get NA: it is part of how DESeq2 handles certain situations.

If a row has lots of zeros, its values may be set to NA. (likely)
If there is an "extreme" data point, the row might be set to NA. (less likely)
If counts are low after normalizing, the adjusted p-value might be set to NA. (likely)

From the DESeq2 FAQ:

Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:

If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance.
If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
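The three FAQ rules can be turned into a quick check on the exported data sheet. The sketch below is a hypothetical helper (`na_reason` is not part of DESeq2); it assumes each row carries the `baseMean`, `pvalue` and `padj` columns of a DESeq2 `results()` table, with NA read in as `None` or NaN.

```python
import math
from typing import Optional


def is_na(x: Optional[float]) -> bool:
    """Treat None and float NaN as NA, as they may appear after reading a CSV export."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def na_reason(base_mean: float, pvalue: Optional[float], padj: Optional[float]) -> str:
    """Classify a results row using the three NA rules quoted from the DESeq2 FAQ."""
    if base_mean == 0:
        # Rule 1: all samples have zero counts, so log2FC, pvalue and padj are all NA.
        return "all-zero counts"
    if is_na(pvalue):
        # Rule 2: a count outlier flagged by Cook's distance, so pvalue and padj are NA.
        return "Cook's distance outlier"
    if is_na(padj):
        # Rule 3: removed by independent filtering, so only padj is NA.
        return "independent filtering (low mean normalized count)"
    return "no NA"
```

Rows whose reason is "independent filtering" are the ones that still have a p-value but no adjusted p-value.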

There might be ways to override this and still get the p-value. I am exploring this now.
Rob
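The third rule explains how a gene with a low p-value can still have an NA adjusted p-value: genes removed by independent filtering are left out before the Benjamini-Hochberg correction is applied. The Python sketch below illustrates that idea under a simplifying assumption. It uses a fixed, hypothetical `min_mean` cutoff on the base mean, whereas DESeq2 chooses its cutoff automatically, based on how many genes fall below the `alpha` threshold. Within DESeq2 itself, `results()` accepts `independentFiltering=FALSE` (and `cooksCutoff=FALSE` for the outlier rule) to turn these behaviors off.

```python
from typing import List, Optional


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, like R's p.adjust(method = "BH")."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def padj_with_filter(pvalues: List[float], base_means: List[float],
                     min_mean: float) -> List[Optional[float]]:
    """Adjust only genes whose base mean reaches min_mean; the rest get None (NA)."""
    keep = [i for i, bm in enumerate(base_means) if bm >= min_mean]
    adjusted = bh_adjust([pvalues[i] for i in keep])
    out: List[Optional[float]] = [None] * len(pvalues)
    for i, a in zip(keep, adjusted):
        out[i] = a
    return out


pvals = [0.01, 0.02, 0.03, 0.5]
means = [100.0, 2.0, 50.0, 80.0]
print(padj_with_filter(pvals, means, min_mean=10.0))  # gene 1 is filtered out
print(padj_with_filter(pvals, means, min_mean=0.0))   # no filtering: every gene adjusted
```

In this toy data, gene 1 has p = 0.02 but a base mean of only 2, so it is filtered and its adjusted p-value is `None`, much like the NA rows described in the question.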

Ann Loraine added a comment - 06/Jan/22 1:47 PM

Rob's email with scripts (attached):

This is the DESeq2 R script.

It needs the Salmon output from Nextflow, in the same folder structure in which it was created.
The script reads in the "quant.sf" files with tximport, as long as the folders are the same as when they were originally generated.

Depending on the comparison being done, a metafile is also needed. Attached is metadata.csv, from which I make a new metafile containing only the conditions I care about.

Rob
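Because the script depends on the original Nextflow/Salmon folder layout, a pre-flight check can confirm that every sample folder still holds its quant.sf before the R script runs, and can write the reduced metafile. This is a hypothetical Python sketch, not Rob's code: it assumes one subfolder per sample directly under the Salmon output directory, and a `condition` column in metadata.csv (the real column names are not given in this ticket).

```python
import csv
from pathlib import Path
from typing import Dict, List


def find_quant_files(salmon_dir: str) -> Dict[str, Path]:
    """Map each sample folder name to its quant.sf (assumes one subfolder per sample)."""
    return {p.parent.name: p for p in sorted(Path(salmon_dir).glob("*/quant.sf"))}


def write_metafile(metadata_csv: str, out_csv: str, conditions: List[str],
                   condition_col: str = "condition") -> List[dict]:
    """Copy only the rows whose condition is in `conditions` into a new metafile."""
    with open(metadata_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = [row for row in reader if row[condition_col] in conditions]
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Comparing the keys returned by `find_quant_files` with the sample column of the metafile shows any sample whose folder has moved or been renamed.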

Ann Loraine added a comment - 17/Jan/22 10:48 PM

Rob has answered the question. Moving to Done.
