  1. IGB
  2. IGBF-3050

Answer question regarding NA values

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      On 2022-01-06, Gloria Muday wrote the following to Ann & Rob, cc-ing Devin Smith (desmith@wfu.edu):

      Rob and Ann,

      Happy New Year!

      I hope you are staying well. I saw that UNC Charlotte's classes are remote for most of January. I hope that does not complicate things too much and keeps everyone safe on your campus.

      Devin and I have one question that we need answered in order to continue identifying DE genes from the "are" RNA-Seq dataset. Some of the adjusted p-values came out as NA. We don't understand why that is or what it means. Can you all explain that to us? We prefer to use the adjusted p-values to filter the dataset, but cannot while some genes (with low p-values) show an adjusted p-value of NA.

      Thanks,
      Gloria

      Gloria Muday
      Director, Center for Molecular Signaling
      Charles M. Allen, Professor of Biology
      Wake Forest University
      336-758-5316

      Loraine replied:

      Sure, we can look into it!

      Rob, can you send me the script you used to create the data sheet – or (even better!) add it to the repository?

      For this task, let's examine the data file output by Rob's code to determine the meaning of the NA values.

      Also, we should put the code into the project repository.

        Attachments

        1. file.txt
          0.3 kB
        2. metadata.csv
          0.5 kB
        3. volcanoPlots-salmon.R
          10 kB

          Activity

          ann.loraine Ann Loraine added a comment -

          Rob's reply:

          To answer the question of why we get NA: it is part of how DESeq2 handles some situations.

          If a gene has lots of zeros in its counts, its value will be turned into NA. (likely)
          If a gene has an "extreme" data point, its value might be turned into NA. (less likely)
          If a gene has low counts after normalizing, its value might be turned into NA. (likely)

          From their FAQ:

          Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:

          If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
          If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance.
          If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.

          There might be ways to override this and still get the p-value. I am exploring this now.
          Rob
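
          The three FAQ cases quoted above can be told apart from the columns of an exported results table. The following is a minimal sketch, not part of Rob's script: it assumes each row exposes baseMean, pvalue, and padj, and that NA was exported as an empty value or NaN. The function names and the example rows are hypothetical. Under these rules, a gene with a low p-value but an NA adjusted p-value, as in Gloria's question, would fall under the third case (independent filtering).

```python
import math


def is_na(value):
    """Treat None and float NaN as NA (assumed encoding of R's NA after export)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_reason(base_mean, pvalue, padj):
    """Label why padj is NA, following the three cases in the quoted FAQ."""
    if not is_na(padj):
        return "not NA"
    if base_mean == 0:
        # Case 1: every sample has zero counts, so baseMean is zero.
        return "all-zero counts"
    if is_na(pvalue):
        # Case 2: both pvalue and padj are NA (outlier flagged by Cook's distance).
        return "count outlier (Cook's distance)"
    # Case 3: only padj is NA (row removed by independent filtering).
    return "independent filtering (low mean normalized count)"


# Hypothetical rows, made up for illustration only.
rows = [
    {"gene": "geneA", "baseMean": 0.0, "pvalue": None, "padj": None},
    {"gene": "geneB", "baseMean": 250.0, "pvalue": None, "padj": None},
    {"gene": "geneC", "baseMean": 1.2, "pvalue": 0.0004, "padj": None},
    {"gene": "geneD", "baseMean": 80.0, "pvalue": 0.001, "padj": 0.01},
]

for row in rows:
    print(row["gene"], "->", na_reason(row["baseMean"], row["pvalue"], row["padj"]))
```

          Counting the labels over the whole table would show which of the three cases accounts for most of the NA values in the dataset.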

          ann.loraine Ann Loraine added a comment -

          Rob's email with scripts (attached):

          This is the DESeq2 R script.

          It needs the Salmon output from Nextflow, in the same folder structure in which it was created.
          The script will read in the "quant.sf" files with tximport, as long as the folders are the same as when they were originally generated.

          Depending on the comparison being done, a metafile is also needed. Attached is a metadata.csv, from which I make a new metafile containing only the conditions I care about.

          Rob
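
          Before running the script, it may help to check that every sample selected from the metadata still has a quant.sf file in the expected place. The sketch below is an illustration under assumptions, not a description of Rob's script: it assumes one directory per sample, each containing a quant.sf file, and a metadata file with "sample" and "condition" columns. Those column names, the function names, and the demo layout are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def find_quant_files(salmon_dir):
    """Map each sample directory name to the quant.sf file found inside it."""
    root = Path(salmon_dir)
    return {path.parent.name: path for path in sorted(root.rglob("quant.sf"))}


def select_samples(metadata_csv, column, keep):
    """Return the metadata rows whose value in `column` is one of `keep`."""
    with open(metadata_csv, newline="") as handle:
        return [row for row in csv.DictReader(handle) if row[column] in keep]


def missing_samples(rows, quant_files, sample_column="sample"):
    """List selected samples that have no quant.sf file."""
    return [row[sample_column] for row in rows if row[sample_column] not in quant_files]


# Demo on a throwaway directory tree (made-up sample names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("s1", "s2"):
        sample_dir = root / "salmon" / name
        sample_dir.mkdir(parents=True)
        (sample_dir / "quant.sf").write_text("Name\tNumReads\n")

    meta = root / "metadata.csv"
    meta.write_text("sample,condition\ns1,control\ns2,treated\ns3,treated\n")

    quant = find_quant_files(root / "salmon")
    rows = select_samples(meta, "condition", {"treated"})
    demo_missing = missing_samples(rows, quant)

    print("samples with quant.sf:", sorted(quant))
    print("selected but missing:", demo_missing)
```

          A non-empty "missing" list would mean the folder structure has changed since the Salmon output was generated, which is the situation Rob's email warns about.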

          ann.loraine Ann Loraine added a comment -

          Rob has answered the question. Moving to Done.


            People

            • Assignee:
              Unassigned
              Reporter:
              ann.loraine Ann Loraine
            • Votes:
              0
              Watchers:
              1
