IGB / IGBF-3639

Compare original mark-2022-timeseries data to SRA Re-run data

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Data have been submitted to SRA and now need to be reviewed. To do this, we re-run the data by pulling it directly from SRA, pushing it through the nf-core/rnaseq Nextflow pipeline, and preparing it for IGB Quickload. To confirm that the data on SRA are correct, we can make a few comparisons with the original data:

      • Compare the file sizes of the original data with the file sizes of the re-run data.
      • Open IGB and compare the BAM files and coverage graphs for the original and re-run data.
      • Perform these comparisons for both SL4 and SL5.

      mark-2022-timeseries cluster directories:

      Original: /projects/tomato_genome/fnb/dataprocessing/mark-2022-timeseries
      Re-run: /projects/tomato_genome/fnb/dataprocessing/SRP441343
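A minimal R sketch of the file-size comparison, under stated assumptions: the helper name compare_file_sizes is hypothetical, and it assumes that matching files have the same relative path in the original and re-run directories. If the re-run files are named by SRA run accession instead, a sample-to-run mapping would be needed first. It reports the percent size difference for each shared file and warns about files found in only one directory.

```r
# Hypothetical helper: compare sizes of files that share the same relative
# path in the original and re-run directories (an assumption; re-run files
# named by SRA run accession would need a name mapping first).
compare_file_sizes <- function(original_dir, rerun_dir, pattern = "\\.bam$") {
  orig_files <- list.files(original_dir, pattern = pattern, recursive = TRUE)
  rerun_files <- list.files(rerun_dir, pattern = pattern, recursive = TRUE)

  # Report files that exist on only one side, since they cannot be compared
  unpaired <- c(setdiff(orig_files, rerun_files), setdiff(rerun_files, orig_files))
  if (length(unpaired) > 0) {
    warning("Files found in only one directory: ", paste(unpaired, collapse = ", "))
  }

  shared <- intersect(orig_files, rerun_files)
  orig_size <- file.size(file.path(original_dir, shared))
  rerun_size <- file.size(file.path(rerun_dir, shared))

  data.frame(
    file = shared,
    original_bytes = orig_size,
    rerun_bytes = rerun_size,
    # Percent change of the re-run file relative to the original
    pct_diff = 100 * (rerun_size - orig_size) / orig_size,
    stringsAsFactors = FALSE
  )
}
```

The pattern argument selects which files to compare (BAM by default; another pattern could be passed for coverage files). Sizes need not match byte-for-byte, since BAM compression and header or read-name differences can change file size, so a small percent difference is more informative than exact equality.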

        Attachments

          Issue Links

            Activity

            Mdavis4290 Molly Davis created issue -
            Mdavis4290 Molly Davis made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3596 [ IGBF-3596 ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Sprint Spring 5 [ 189 ]
            Mdavis4290 Molly Davis made changes -
            Sprint Spring 7 [ 191 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis added a comment - - edited

            Use the script 72_F3H_PollenTube/CheckSRA.Rmd and adapt it to work with Mark's data.

            Mdavis4290 Molly Davis added a comment - - edited

            Branch: https://bitbucket.org/mdavis4290/molly-2-splicing-analysis/branch/IGBF-3639

            Notes: I adapted the script that compares the original data to the SRA submission so that it works with Mark's data. I removed any code associated with sample switching because Mark's data do not have this issue. From what I can tell, nothing in the output indicates that the SRA submission is wrong for Mark's data. A review from someone else should confirm this and make sure my code is producing accurate results. This Markdown is also easy to reuse for future comparisons, because the only things that need to be changed are the file names at the beginning.

            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis added a comment - - edited

            Reviewer:

            • Please review the markdown CheckSRA-mark-2022-timeseries.Rmd
            • Make sure all of the code runs on your machine
            • Make sure all of Mark's data are present on your machine for the script to work (you might need to clone my fork), or download the files from the branch link above.
            • If the code runs successfully look at CheckSRA-mark-2022-timeseries.html to make sure the output results say there are zero mismatches with the SRA submission
            Mdavis4290 Molly Davis made changes -
            Assignee Robert Reid [ robertreid ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3693 [ IGBF-3693 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 7 [ 191 ] Spring 7, Spring 8 [ 191, 192 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            robofjoy Robert Reid added a comment -

            I cloned the repo and ran it in RStudio.

            Knit was a success! The resulting doc was created.

            No errors or false returns!
            I'd say this passes review.

            robofjoy Robert Reid made changes -
            Assignee Robert Reid [ robertreid ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            Mdavis4290 Molly Davis made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            Mdavis4290 Molly Davis added a comment - PR : https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/15
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            Mdavis4290 Molly Davis made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine added a comment -

            PR is merged.

            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            To test, I am opening the knitted Markdown and reading the contents.

            ann.loraine Ann Loraine added a comment - - edited

            The HTML report mentions that some of the samples did not match. This was not the case for this particular data set. If anyone reads this, they will get very confused and possibly even a little upset, because it suggests that we noticed a problem with this data set when we actually did not.

            We actually did not notice any such problem with this particular data set. We are simply re-using some code we developed for another data set to check for issues in this different data set. If you include the description of the problem found with the other data set, not this one, as the motivation for taking an extra careful look at this one, in this particular way, that's fine. But unless you do that, it will make no sense and scare people.

            To fix this, rewrite the Introduction to explain the purpose of the document: We are using the General Statistics tables from our nf-core/rnaseq pipeline runs pre- and post-SRA submission to compare the general statistics profiles of the original, pre-SRA submission data and the post-SRA data. If the profiles do not match, that would suggest there was a problem with the submission. If they do match, then we have no reason to think that there is a problem.
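One way to sketch this General Statistics comparison in R, assuming each table (for example, the copied CheckSRA-SL5-mark-2022-timeseries.xlsx, read with readxl::read_excel) has been loaded as a data frame with a Sample column and numeric metric columns that use the same names in both tables. The function compare_general_stats and its tolerance argument are hypothetical; this is not the code in CheckSRA.Rmd. It returns the samples whose metrics differ, plus any samples present in only one table. Empty results for both would mean the profiles match.

```r
# Hypothetical helper (not the code in CheckSRA.Rmd): compare General
# Statistics profiles from the original (pre-SRA) and re-run (post-SRA) data.
# Assumes both data frames have a "Sample" column with matching sample names
# and numeric metric columns that use the same names in both tables.
compare_general_stats <- function(original, rerun, tolerance = 1e-6) {
  merged <- merge(original, rerun, by = "Sample", suffixes = c(".orig", ".rerun"))
  metrics <- setdiff(intersect(names(original), names(rerun)), "Sample")

  mismatch <- rep(FALSE, nrow(merged))
  for (m in metrics) {
    a <- merged[[paste0(m, ".orig")]]
    b <- merged[[paste0(m, ".rerun")]]
    # A metric differs if exactly one value is missing, or both are present
    # and differ by more than the tolerance
    differs <- xor(is.na(a), is.na(b)) |
      (!is.na(a) & !is.na(b) & abs(a - b) > tolerance)
    mismatch <- mismatch | differs
  }

  # Samples present in only one table cannot be compared at all
  all_samples <- union(as.character(original$Sample), as.character(rerun$Sample))
  list(
    mismatched = as.character(merged$Sample[mismatch]),
    unmatched = setdiff(all_samples, as.character(merged$Sample))
  )
}
```

A small non-zero tolerance is used rather than exact equality because percentage-style metrics may be rounded differently between runs; it can be set to 0 if exact agreement is expected.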

            Another problem: The Results section mentions Muday lab, but these data are not from the Muday lab.

            Also, since no sample switching issues were found, updateSRA.csv contains no data - just a header.

            If no problem was found, why do we need a file whose entire purpose in life is to help correct the problem?

            Lastly, does it make sense to put code in ExternalData?

            I think it would be better to create a new folder named "CheckSRASubmittedData" and move the Markdown (.Rmd), its knitted version (.html), and the copied "General Statistics" table CheckSRA-SL5-mark-2022-timeseries.xlsx, into that new directory.

            Please note: When you move files in "git", use "git mv" so that history will be preserved.

            ann.loraine Ann Loraine made changes -
            Status Post-merge Testing In Progress [ 10003 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 7, Spring 8 [ 191, 192 ] Spring 7, Spring 8, Spring 9 [ 191, 192, 193 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment - Branch : https://bitbucket.org/mdavis4290/molly-2-splicing-analysis/branch/IGBF-3639b
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            This file:

            ExternalData/updateSRA.csv

            should not be saved into "ExternalData."

            Instead, it should be written to a sub-folder "results" in the same directory as the .Rmd file.
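A minimal sketch of writing output into a "results" sub-folder next to the .Rmd file, assuming the document is knitted with knitr's default working directory (the folder that contains the .Rmd file), so a relative path resolves there. The helper name results_file is hypothetical.

```r
# Hypothetical helper: return a path inside a "results" sub-folder,
# creating the folder if needed. When an .Rmd is knitted, knitr runs chunks
# in the .Rmd file's directory by default, so the relative path resolves there.
results_file <- function(fname, results_dir = "results") {
  dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)
  file.path(results_dir, fname)
}
```

In the Markdown, this might be used as output_fname <- results_file("updateSRA.csv").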

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment -

            The updateSRA.csv file is not being saved to ExternalData. In this commit I am deleting the file that was originally in that folder, and when knitting, it should be created in the same directory as the .Rmd file. I didn't add updateSRA.csv to the new folder CheckSRASubmittedData because, in your comment above, you noted that we shouldn't add a blank file. Let me know if this is still OK. Thanks! Ann Loraine

            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            Mdavis4290 Molly Davis made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment - - edited

            This new code is a copy of some other code, and that other code found an error and wrote a file.
            This new code does not find any errors.
            So why does it write a file?
            It does not make a lot of sense to do that.
            It would be better for the new code, which finds no errors, to notice that no errors are found and then NOT write a file named "updateSRA.csv" when the user runs it.
            Please re-read the new code and double-check that what it is doing and what it says makes sense for the new context.
            If that is actually what is happening, please let me know.

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis added a comment -

            I added a new line of code to the CheckSRA Markdowns so that if the output data frame is empty, the file will not be saved. I am going to commit the changes for this Markdown, as well as for the other Muday lab CheckSRA Markdowns in ticket IGBF-3693.

            if(nrow(output_df) > 0) readr::write_csv(output_df, output_fname)
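The one-line guard above can be sketched as a small, self-contained helper. The function name write_if_nonempty is hypothetical, and it uses base R's utils::write.csv instead of readr::write_csv only to avoid a package dependency; either works. It returns TRUE if a file was written and FALSE otherwise.

```r
# Hypothetical helper: write output_df only when it has rows.
# Returns TRUE (invisibly) if a file was written, FALSE otherwise.
write_if_nonempty <- function(output_df, output_fname) {
  if (nrow(output_df) == 0) {
    message("No mismatches found; not writing ", output_fname)
    return(invisible(FALSE))
  }
  utils::write.csv(output_df, output_fname, row.names = FALSE)
  invisible(TRUE)
}
```

One design consideration: the guard does not delete an updateSRA.csv left over from an earlier run that did find mismatches, so a stale file could persist. Removing any existing output file (for example with file.remove) before the check would avoid that.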
            
            Mdavis4290 Molly Davis added a comment - Branch : https://bitbucket.org/mdavis4290/molly-2-splicing-analysis/branch/IGBF-3639c
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 7, Spring 8, Spring 9 [ 191, 192, 193 ] Spring 7, Spring 8, Spring 9, Spring 10 [ 191, 192, 193, 194 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment -

            I reviewed the new text of the Markdown using the bitbucket interface. It looks like the code and text about the code is no longer referencing the Muday dataset but is instead only talking about the current task, which is making sure that the pre- and post-SRA submission data are cool.

            Please submit PR when ready.

            attn: Molly Davis

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]
            Mdavis4290 Molly Davis added a comment - PR : https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/16
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ] Ann Loraine [ aloraine ]
            Mdavis4290 Molly Davis made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine added a comment -

            PR is merged. Moving to DONE.

            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Molly Davis [ molly ]

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                Mdavis4290 Molly Davis
              • Votes:
                0

                Dates

                • Created:
                  Updated:
                  Resolved: