  1. IGB
  2. IGBF-3348

Create sample sheet with SRA names for mature pollen and seedling dataset

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      0.5
    • Sprint:
      Summer 2 2023 May 29, Summer 3 2023 June 12, Summer 4 2023 June 26, Summer 5 2023 July 10, Summer 6 2023 July 24

      Description

      Now that our pollen and seedling data are submitted to SRA, we are planning to download and re-run our nf-core rnaseq pipeline on the data as it exists in the SRA. Then, we will deploy the newly re-run data to IGB Quickload for users to view and explore. To re-deploy the data we need a new sample sheet that uses SRA names instead of our project-specific names.

      For this task, create a new sample spreadsheet, named after the project, that uses the new SRA names.

      See the linked ticket IGBF-3254 for a mapping between original fastq file names and new SRA names.
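A minimal sketch of how such a mapping could be applied, assuming it is available as a two-column CSV. The column headers `original_name` and `sra_name`, the file names, and the SRR values below are placeholders for illustration, not the actual contents of IGBF-3254.

```python
import csv
import io

# Placeholder mapping table. The real mapping lives in IGBF-3254; these
# file names and SRR values are invented for illustration only.
MAPPING_CSV = """original_name,sra_name
sampleA_R1.fastq.gz,SRR00000001
sampleB_R1.fastq.gz,SRR00000002
"""


def load_mapping(text):
    """Return {original fastq name: SRA run name} from two-column CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["original_name"]: row["sra_name"] for row in reader}


def rename_samples(names, mapping):
    """Translate names via the mapping; report unmapped names instead of guessing."""
    renamed, missing = [], []
    for name in names:
        if name in mapping:
            renamed.append(mapping[name])
        else:
            missing.append(name)
    return renamed, missing


mapping = load_mapping(MAPPING_CSV)
renamed, missing = rename_samples(
    ["sampleA_R1.fastq.gz", "sampleC_R1.fastq.gz"], mapping
)
```

Collecting unmapped names separately, rather than passing them through unchanged, makes it harder for a project-specific name to slip into the new sheet unnoticed.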

            Activity

            Mdavis4290 Molly Davis added a comment - Pull request: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/11
            ann.loraine Ann Loraine added a comment -

            Testing:

            • I fetched the new branch from [~molly]'s repository
            • I opened the new version of the Excel-format spreadsheet
            • I observed that the "SRP" numbers in columns "Study Name" and "Physical Folder" are now all "SRP438952" as requested
            • I observed that the header is now frozen - it remains stationary when I scroll up and down the spreadsheet

            Based on the above observations, I recommend that the branch be merged into "main" as usual.

            Mdavis4290 Molly Davis added a comment - Version 3 of updated sample sheet. Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3348c
            ann.loraine Ann Loraine added a comment - edited

            Visited folder https://bitbucket.org/mdavis4290/molly-splicing-analysis/src/IGBF-3348b/ExternalData/ and downloaded proposed new sample sheet, file name: SRP438952_sample_sheet.xlsx.

            Change request(s):

            1) Column "Physical folder" contains nonsensical values. For example, the first three rows are: SRP438952, SRP438953, and SRP438954.
            2) This is a minor issue, but I noticed that some values in column "Color" contain hash characters (#) and others do not. The downstream code (makeAnnotsXml.py and others) can handle both, I think. A hash character is probably good to include as this would make it super-clear that the values are hexadecimal color codes. But I have no preference either way.

            I did not proof-read every row; instead, I mainly just scanned over the values. Apart from the preceding issues, the data are fine.
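Both change requests can also be checked across every row rather than by eye. The following is a sketch only, assuming the sheet has been exported to a list of dictionaries keyed by column name. The helper names `check_rows` and `normalize_color` are hypothetical and are not part of makeAnnotsXml.py.

```python
import re

SRP_ACCESSION = "SRP438952"
# Six hexadecimal digits, with or without a leading hash character.
HEX_COLOR = re.compile(r"#?[0-9A-Fa-f]{6}")


def check_rows(rows):
    """Return a list of human-readable problems found in the sample sheet rows."""
    problems = []
    # Spreadsheet row 1 is the header, so data rows start at 2.
    for number, row in enumerate(rows, start=2):
        for column in ("Study Name", "Physical Folder"):
            value = row.get(column)
            if value != SRP_ACCESSION:
                problems.append(
                    f"row {number}: {column} is {value!r}, expected {SRP_ACCESSION!r}"
                )
        color = row.get("Color", "")
        if not HEX_COLOR.fullmatch(color):
            problems.append(f"row {number}: Color {color!r} is not a hex color code")
    return problems


def normalize_color(value):
    """Return the color code with exactly one leading hash character."""
    value = value.strip()
    return value if value.startswith("#") else "#" + value


example_rows = [
    {"Study Name": "SRP438952", "Physical Folder": "SRP438952", "Color": "#1F78B4"},
    {"Study Name": "SRP438952", "Physical Folder": "SRP438953", "Color": "1F78B4"},
]
problems = check_rows(example_rows)
```

In this example only the second data row is flagged, because its "Physical Folder" value differs from the accession. The un-hashed color passes the check and can be made uniform with `normalize_color`.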

            For future commits, it sure would be nice if [~molly] could preserve the "freeze top row" formatting. Freezing that top row really helps a lot when interacting with this spreadsheet! It's very nice to have that top row, the row that contains the column names, always stay in view when scrolling through the data.

            Mdavis4290 Molly Davis added a comment - edited - Version 2 of updated sample sheet. Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3348b
            ann.loraine Ann Loraine added a comment - edited

            add 3 columns (after the existing ones):

            • replicate (1, 2, or 3)
            • temperature in degrees C (37 or 28)
            • tissue ("mature pollen" or "seedling")

            change existing columns as follows:

            • Replace values in "Study Name" and "Physical Folder" with the "SRP" accession: SRP438952
            • Replace values in "Sample URL" with https://www.ncbi.nlm.nih.gov/sra
            • Replace values in "Sample code" with the "SRR" values for each row
            • Change values in "treatment" to "heat stress" or "control"
            • Change display name values to "[genotype] [tissue] [temperature] [treatment] [replicate]", e.g., "Heinz seedling 37 heat stress 1"
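The requested changes could be applied to each row along the following lines. This is a sketch under assumptions: rows are plain dictionaries, the display-name column is called "Display Name", and the SRR value is a placeholder. The function `update_row` is hypothetical.

```python
SRP_ACCESSION = "SRP438952"
SRA_URL = "https://www.ncbi.nlm.nih.gov/sra"

ALLOWED_TISSUES = {"mature pollen", "seedling"}
ALLOWED_TREATMENTS = {"heat stress", "control"}
ALLOWED_TEMPERATURES = {28, 37}
ALLOWED_REPLICATES = {1, 2, 3}


def update_row(row, srr, genotype, tissue, temperature, treatment, replicate):
    """Return a copy of row with the requested column changes applied."""
    if tissue not in ALLOWED_TISSUES:
        raise ValueError(f"unexpected tissue: {tissue!r}")
    if treatment not in ALLOWED_TREATMENTS:
        raise ValueError(f"unexpected treatment: {treatment!r}")
    if temperature not in ALLOWED_TEMPERATURES:
        raise ValueError(f"unexpected temperature: {temperature!r}")
    if replicate not in ALLOWED_REPLICATES:
        raise ValueError(f"unexpected replicate: {replicate!r}")

    updated = dict(row)  # leave the caller's row untouched
    updated["Study Name"] = SRP_ACCESSION
    updated["Physical Folder"] = SRP_ACCESSION
    updated["Sample URL"] = SRA_URL
    updated["Sample code"] = srr
    updated["treatment"] = treatment
    # "Display Name" is an assumed column header.
    updated["Display Name"] = f"{genotype} {tissue} {temperature} {treatment} {replicate}"
    # Keys not already present are appended after the existing columns,
    # because dictionaries preserve insertion order.
    updated["replicate"] = replicate
    updated["temperature"] = temperature
    updated["tissue"] = tissue
    return updated


# "SRR00000001" is a placeholder run accession, not a real one.
new_row = update_row(
    {"Study Name": "old name", "Physical Folder": "old folder"},
    srr="SRR00000001",
    genotype="Heinz",
    tissue="seedling",
    temperature=37,
    treatment="heat stress",
    replicate=1,
)
```

Treatment is passed in explicitly rather than derived from temperature, since the ticket does not state which temperature corresponds to which treatment beyond the single example.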
            Mdavis4290 Molly Davis added a comment - edited - Version 1 of sample sheet. Branch: https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3348

              People

              • Assignee: Mdavis4290 Molly Davis
              • Reporter: ann.loraine Ann Loraine