  IGB / IGBF-3507

Re-run Nextflow Muday time course data with SL4 and data downloaded from SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      SRP460750

      Directory: /projects/tomato_genome/fnb/dataprocessing/SRP460750/

      Only SL5 was rerun with the SRA data; SL4 needs to be rerun with the SRA data as well.

      For this task, we need to sanity-check the Muday time course data that Rob recently uploaded and submitted to the Sequence Read Archive (SRA).
      If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
      Steps:

      • Check SRP on NCBI and review submission
      • Download the data onto the cluster by using the SRP name
      • Run nf-core/rnaseq pipeline
      • Run our coverage graph and junctions scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.
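
      The nf-core/rnaseq step needs a samplesheet with the columns sample, fastq_1, fastq_2, and strandedness. A minimal sketch of building one keyed on SRR names follows. It assumes the downloaded FASTQ files are named SRRxxxx_1.fastq.gz / SRRxxxx_2.fastq.gz (or SRRxxxx.fastq.gz for single-end runs); the function name, the file-name pattern, and the directory in the usage note are hypothetical and not taken from this ticket.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed naming: SRR000001_1.fastq.gz, SRR000001_2.fastq.gz, or SRR000001.fastq.gz
FASTQ_RE = re.compile(r"^(SRR\d+)(?:_([12]))?\.fastq\.gz$")


def build_samplesheet(fastq_names, fastq_dir, strandedness="auto"):
    """Return nf-core/rnaseq samplesheet text (CSV), one row per SRR run."""
    reads = defaultdict(dict)
    for name in fastq_names:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # ignore files that are not named after an SRR accession
        run = match.group(1)
        mate = match.group(2) or "1"  # single-end files count as read 1
        reads[run][mate] = f"{fastq_dir.rstrip('/')}/{name}"

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for run in sorted(reads):
        if "1" not in reads[run]:
            raise ValueError(f"{run}: read 2 file found without read 1")
        # fastq_2 stays empty for single-end runs
        writer.writerow([run, reads[run]["1"], reads[run].get("2", ""), strandedness])
    return out.getvalue()
```

      For example, `build_samplesheet(os.listdir(fastq_dir), fastq_dir)` (with `import os`) returns CSV text that can be written to a samplesheet file and passed to the pipeline with `--input`. Using the SRR accession as the sample name keeps every downstream output named by SRR.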

            Activity

            Mdavis4290 Molly Davis added a comment - Branch : https://bitbucket.org/mdavis4290/molly5-flavonoid-rnaseq/branch/IGBF-3507
            Mdavis4290 Molly Davis added a comment - PR : https://bitbucket.org/hotpollen/flavonoid-rnaseq/pull-requests/40
            ann.loraine Ann Loraine added a comment -

            Suggestions for testing:

            • Review the newly added reports and check for problems.
            • Compare this new report to the one we made for the "original" data files - are they consistent? The statistics for the new files should match the statistics for the old ones.
            • However, recall that the "original" data files have the "original" file names assigned by the sequencer. Later we learned that the files were mis-named. When we submitted the files to the SRA, we submitted them using corrected, revised sample names.
            ann.loraine Ann Loraine added a comment -

            Molly Davis - please see above comment on how to test. I don't know if you have already compared the files or not?

            If not, it would be good to do that now.

            The QC reports provide a great overview of a data processing run. Comparing the QC reports pre- and post-SRA submission will tell us a lot. For example, if there are big differences between the pre- and post-SRA submission files, or if something went wrong with the sample switching, the QC report will likely show it.
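
            One way to make this comparison concrete is to line up a per-sample statistic from the two MultiQC runs. The sketch below is only an illustration: it assumes each run has a tab-delimited general statistics table with a "Sample" column, and that a table mapping original file names to SRR names is available from the corrected sample sheet. The function names, the statistic column name passed in, and the tolerance are all hypothetical.

```python
import csv


def load_column(handle, column):
    """Read {sample: value} from a tab-delimited table with a 'Sample' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Sample"]: float(row[column]) for row in reader if row.get(column)}


def compare_stat(original, rerun, old_to_srr, tolerance=1.0):
    """Return rows (srr, old_value, new_value, diff) that deserve a closer look.

    original   -- {original file name: value} from the pre-SRA run
    rerun      -- {SRR name: value} from the post-SRA run
    old_to_srr -- {original file name: SRR name}, from the corrected sample table
    tolerance  -- largest absolute difference accepted without flagging
    """
    flagged = []
    for old_name, srr in sorted(old_to_srr.items()):
        if old_name not in original or srr not in rerun:
            # sample missing from one report: flag it with diff=None
            flagged.append((srr, original.get(old_name), rerun.get(srr), None))
            continue
        diff = rerun[srr] - original[old_name]
        if abs(diff) > tolerance:
            flagged.append((srr, original[old_name], rerun[srr], diff))
    return flagged
```

            An empty result means every mapped sample agrees within the tolerance. A cluster of large differences would suggest either a processing change or an error in the old-name-to-SRR mapping.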

            Mdavis4290 Molly Davis added a comment (edited)

            Testing:

            • Compared the MultiQC report from the original Muday SL4 data to the report from this SL4 rerun.
            • The mapping numbers in the two reports are not identical, but they are close enough that there is no cause for concern. The report plots and averages show the same overall patterns, so the differences could be due to the sample switching in the original data.
            • I noticed that the original Muday MultiQC reports for SL4 and SL5 are not in the flavonoid repo, so I am going to add them to this ticket just in case. I guess we don't want them officially in the flavonoid repo because the original data has the sample switching.

            Moving to done!


              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
