IGB / IGBF-3454

Develop a prototype R script to run EBSeq on the Muday time course

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Purpose: Write a first draft of an EBSeq script to analyze the time course data.

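      Why? Samples taken at successive time points are not independent, so comparing time points pairwise violates the independence assumption of standard RNA-seq differential expression tests. Several tools have been developed to address this and improve RNA-seq comparisons in time-dependent studies.
      Examples:

      • Mfuzz
      • PAL-D
      • EBSeq (and its time-course extension, EBSeqHMM)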

      I tested out PAL-D last January and it was a horrible experience.
      Rasha ran Mfuzz successfully.
      Liz and I have both run EBSeq in the past.

      Liz's details on EBSeq can be found here:
      Original details from Liz Cooper:
      “EBSeq script on my GitHub page associated with the sweet sorghum paper (in case you decide to use that package):”
      Rio/EBSeqHMM_Cluster.R at master · eacooper400/Rio · GitHub

      In R:

      The R packages can be installed and loaded as follows:

          1. Install required packages (only needed once)
            # biocLite() is deprecated; BiocManager is the current Bioconductor installer
            if (!requireNamespace("BiocManager", quietly = TRUE))
              install.packages("BiocManager")
            BiocManager::install("EBSeq")
            BiocManager::install("EBSeqHMM")
            install.packages("blockmodeling")
          2. Load required packages
            library(EBSeq)
            library(EBSeqHMM)
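      A minimal sketch of a single EBSeqHMM run, assuming a gene-by-sample count matrix whose columns are already ordered by time point, with the same number of replicates at every time point. The helper names `build_conditions()` and `run_ebseqhmm()` are hypothetical, and `UpdateRd = 5` and `FDR = 0.05` are illustrative values, not the settings used in the attached script. `MedianNorm()` comes from EBSeq; `EBSeqHMMTest()` and `GetDECalls()` come from EBSeqHMM.

```r
# Hypothetical helper: an ordered time-point factor such as t1,t1,t2,t2,...
# Assumes count-matrix columns are sorted by time point with equal replicates.
build_conditions <- function(n_time_points, n_reps) {
  labels <- paste0("t", seq_len(n_time_points))
  factor(rep(labels, each = n_reps), levels = labels)
}

# Hypothetical wrapper around one EBSeqHMM run; settings are illustrative only.
run_ebseqhmm <- function(counts, conditions, fdr = 0.05, update_rounds = 5) {
  if (!requireNamespace("EBSeqHMM", quietly = TRUE)) {
    stop("EBSeqHMM is not installed; install it with BiocManager first")
  }
  counts <- as.matrix(counts)
  stopifnot(ncol(counts) == length(conditions))
  # Median-normalization size factors from EBSeq
  sizes <- EBSeq::MedianNorm(counts)
  fit <- EBSeqHMM::EBSeqHMMTest(Data = counts, sizeFactors = sizes,
                                Conditions = conditions,
                                UpdateRd = update_rounds)
  # Genes called DE at the given FDR, with their most likely expression path
  EBSeqHMM::GetDECalls(fit, FDR = fdr)
}
```

      A call would then look like `run_ebseqhmm(counts, build_conditions(n_time_points, n_reps))`. How the Salmon counts table is laid out, and how samples are grouped into separate runs, is not described in this ticket, so reading and subsetting the counts is left out of the sketch.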

            Activity

            robofjoy Robert Reid added a comment (edited)

            The two R scripts are now combined into one.

            This is testable and reviewable!

            robofjoy Robert Reid added a comment (edited)

            For the R script, see the comments below. Counts file: muday-144-SL4_counts-salmon.txt

            Instructions for review:

            1. Download the attached R script and run it on the attached Salmon counts file.
            2. Open the script in RStudio and try to run it. Depending on your environment, you may need to install a few libraries.
            3. Change the folder paths on lines 20 and 26 of the script to your own folder location.
            4. The script takes many minutes to run.
            5. The end result should be a collection of result files similar in size to these:
              -rw-r--r--@ 1 robreid staff 63100 Oct 19 10:24 a28gc_ebseqGeneCalls-SL4.txt
              -rw-r--r--@ 1 robreid staff 63544 Oct 19 10:24 a34gc_ebseqGeneCalls-SL4.txt
              -rw-r--r--@ 1 robreid staff 49124 Oct 19 10:24 v28gc_ebseqGeneCalls-SL4.txt
              -rw-r--r--@ 1 robreid staff 49817 Oct 19 10:24 v34gc_ebseqGeneCalls-SL4.txt
              -rw-r--r--@ 1 robreid staff 40735 Oct 19 10:24 f28gc_ebseqGeneCalls-SL4.txt
              -rw-r--r--@ 1 robreid staff 42402 Oct 19 10:24 f34gc_ebseqGeneCalls-SL4.txt
              -rw-r--r--@ 1 robreid staff 483 Oct 19 10:24 a28gc_numIneachPath-SL4.txt
              -rw-r--r--@ 1 robreid staff 484 Oct 19 10:24 a34gc_numIneachPath-SL4.txt
              -rw-r--r--@ 1 robreid staff 482 Oct 19 10:24 v28gc_numIneachPath-SL4.txt
              -rw-r--r--@ 1 robreid staff 483 Oct 19 10:24 v34gc_numIneachPath-SL4.txt
              -rw-r--r--@ 1 robreid staff 481 Oct 19 10:24 f28gc_numIneachPath-SL4.txt
              -rw-r--r--@ 1 robreid staff 482 Oct 19 10:24 f34gc_numIneachPath-SL4.txt

            Take a peek at a result file or two and see whether it is intuitive!
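            Reviewing involves editing hard-coded folder paths and then comparing the outputs against a file listing by eye. A small post-run check along these lines could speed that up. It is a hypothetical helper, not part of the attached script: `out_dir` stands for whatever folder you set on lines 20 and 26, and the six group prefixes and two file suffixes are taken from the listing in the instructions.

```r
# Hypothetical helper: expected result file names, taken from the review listing.
expected_outputs <- function(groups = c("a28", "a34", "v28", "v34", "f28", "f34")) {
  c(paste0(groups, "gc_ebseqGeneCalls-SL4.txt"),
    paste0(groups, "gc_numIneachPath-SL4.txt"))
}

# Report which expected files exist in out_dir and their sizes in bytes
# (file.size() returns NA for files that do not exist).
check_outputs <- function(out_dir) {
  files <- expected_outputs()
  paths <- file.path(out_dir, files)
  data.frame(file = files,
             present = file.exists(paths),
             size_bytes = file.size(paths),
             stringsAsFactors = FALSE)
}
```

            For example, `check_outputs(out_dir)` returns one row per expected file. The sizes should be in the same range as the listing (roughly 40 to 64 KB for the gene-call files and about 480 bytes for the path-count files), not byte-for-byte identical.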

            Mdavis4290 Molly Davis added a comment (edited)

            Review:

            • The script runs well in RStudio; I ran into no issues.
            • I did have issues with the output files. There is a third column after "Max_PP" that is not named and is full of numbers. I am unsure what those numbers represent.
            • I also tried to open the txt files in Excel, and they do not display correctly: the values are misaligned with their headers and do not split into columns correctly.
            • It might be beneficial to create a data frame for each result and make sure the columns are saved with their column names.

            Thanks!
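            One common cause of an unnamed trailing column, assuming the calls were saved with `write.table()` defaults: row names (here the gene IDs) are written as an extra first field on every data row, but the header gets no matching field, so a spreadsheet shifts the column names one place left and the last column (the Max_PP values) ends up unnamed. The defaults also use a single space as separator and quote strings. A sketch of the data-frame approach suggested in the review, with a hypothetical `Gene` column name and tab separators:

```r
# Write a calls table (genes as row names) so every column has a header.
# The "Gene" column name is a hypothetical choice.
write_calls_tsv <- function(calls, path) {
  stopifnot(!is.null(rownames(calls)))
  out <- data.frame(Gene = rownames(calls), as.data.frame(calls),
                    row.names = NULL, check.names = FALSE,
                    stringsAsFactors = FALSE)
  # Tab-separated, unquoted, no row names: one header field per data field
  write.table(out, path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(out)
}
```

            A shorter alternative is `write.table(calls, path, sep = "\t", quote = FALSE, col.names = NA)`, which keeps the row names but adds a blank first header cell instead of a named gene column.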

            robofjoy Robert Reid added a comment

            Ebseq-hmm_mudayTimeCourse-Version3.R

            Updated version that writes the column headers correctly and keeps columns aligned for cleaner import into Excel.

            Molly, can you run this version?

            Mdavis4290 Molly Davis added a comment

            Review:

            • The EBSeq output txt files are now fixed, including the column names.
            • The numIneachPath txt files still have some formatting issues, but only when opened in Excel. They are readable as plain txt files, so I see no need for them to be readable in Excel as well.

            Moving ticket to done!
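            If numIneachPath is meant to hold the number of DE genes assigned to each most-likely expression path (an assumption based on the file name; the script's actual format is not shown here), building it as a two-column data frame and writing it tab-separated gives a file that reads cleanly both as plain text and in Excel. The helper names and the `Path` and `N_genes` column names are hypothetical.

```r
# Hypothetical helper: count genes per most-likely path from a calls table.
count_paths <- function(calls) {
  tab <- table(calls[, "Most_Likely_Path"])
  data.frame(Path = names(tab), N_genes = as.integer(tab),
             stringsAsFactors = FALSE)
}

# Write the per-path counts as a tab-separated table with a header row.
write_path_counts <- function(calls, path) {
  counts <- count_paths(calls)
  write.table(counts, path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(counts)
}
```

            `count_paths()` accepts either a data frame or a matrix with a `Most_Likely_Path` column; `table()` returns the paths sorted alphabetically.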


              People

              • Assignee:
                robofjoy Robert Reid
                Reporter:
                robofjoy Robert Reid
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: