IGBF-3326

Create KP 2023 sample sheet needed for Quickload site and more

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Original Directory: /projects/tomato_genome/rnaseq/30-804059537-kelsie
      Nf-core run: /nobackup/tomato_genome/30-804059537-KP (kp for "Kelsey Pryze")

            Activity

            ann.loraine Ann Loraine added a comment -

            To see a listing of all the "bam" files and their file name prefixes, see:

            http://lorainelab-quickload.scidas.org/hotpollen/S_lycopersicum_Jun_2022/pistil/

            ann.loraine Ann Loraine added a comment - - edited

            Existing Excel sample sheets:

            • https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/ExternalDataSets/muday-144_sample_sheet.xlsx
            • https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/SRP252265_SRP328042_sample_sheet.xlsx

            Also,

            • please review prior tickets where we made sample sheets for other datasets.
            • create a more complete list of sample sheets from other data sets and repositories

            ann.loraine Ann Loraine added a comment -

            Also, there's a script that reads these files and makes the "annots.xml" metadata file for deploying the data in IGB for visualization.

            This script is: https://bitbucket.org/hotpollen/genome-browser-visualization/src/main/makeAnnotsXml.py

            Please look at and understand the repository genome-browser-visualization.

            Mdavis4290 Molly Davis added a comment -

            First draft sample sheet:
            [^Draft1_KP_sample_sheet.xlsx]

            ann.loraine Ann Loraine added a comment -

            Change requests:

            1)

            Remove "h" from values in "hours" column (no need to specify the units because the column heading is named "hours")

            2)

            Change "Display Name" values to use the following pattern:

            "[condition] [cultivar] [plant part] [hours] h"

            where:

            • [condition] is one of "25C" or "37C" (note: not using "cool" or "warm" here)
            • [cultivar] is one of Heinz, Malintka, Nagcarlang, Tamaulipas
            • [plant part] is one of "selfed ovary", "unpollinated ovary"
            • [hours] is one of 0, 3, 8

            examples:

            row 1: 25C Heinz unpollinated ovary 0 h
            row 48: 25C Nagcarlang self-pollinated ovary 8 h

            3) Change values in "study name" to read "Ovary RP lab"

            4) Change values in first three rows in column "species" to indicate cultivar only, e.g., "Heinz" not "Heinz-Ovary"

            5) Freeze the first row to allow user to scroll up and down without first row moving out of view

            6) Change every value in Physical Folder column to "pistil"

            7) Add a new column: "tissue type"; insert the column after "replicate"; fill in with values

            rows 1-3: unpollinated ovary
            rows 4-end: self-fertilized ovary
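
The "Display Name" pattern in change request 2 can be sketched in Python as below. This is only an illustration, not part of the spreadsheet workflow: the function name `display_name` and the validation sets are hypothetical, and the plant-part text is passed through unchecked because the request lists "selfed ovary" while the example uses "self-pollinated ovary".

```python
# Allowed values copied from change request 2 above.
CONDITIONS = {"25C", "37C"}
CULTIVARS = {"Heinz", "Malintka", "Nagcarlang", "Tamaulipas"}
HOURS = {0, 3, 8}


def display_name(condition, cultivar, plant_part, hours):
    """Return "[condition] [cultivar] [plant part] [hours] h" (hypothetical helper)."""
    if condition not in CONDITIONS:
        raise ValueError(f"unexpected condition: {condition!r}")
    if cultivar not in CULTIVARS:
        raise ValueError(f"unexpected cultivar: {cultivar!r}")
    if hours not in HOURS:
        raise ValueError(f"unexpected hours: {hours!r}")
    # plant_part is not validated; its exact wording is still under discussion.
    return f"{condition} {cultivar} {plant_part} {hours} h"


print(display_name("25C", "Heinz", "unpollinated ovary", 0))
print(display_name("25C", "Nagcarlang", "self-pollinated ovary", 8))
```

The two calls reproduce the "row 1" and "row 48" examples given in the change request.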

            ann.loraine Ann Loraine added a comment -

            Change request:

            4) Change values in first three rows in column "species" to indicate cultivar only, e.g., "Heinz" not "Heinz-Ovary" (from previous comment)

            New change request:

            1) Delete "condition" column

            New task management request:

            1) Delete from this Jira ticket obsolete copies of spreadsheet

            Mdavis4290 Molly Davis added a comment -

            Final Draft:

            [^KP_sample_sheet.xlsx]

            ann.loraine Ann Loraine added a comment -

            Make a branch for the spreadsheet addition to the repository and submit a PR.

            Add it to https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/ExternalData/.

            Mdavis4290 Molly Davis added a comment -

            Branch: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3326
            Pull request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/3

            ann.loraine Ann Loraine added a comment -

            PR merged; ready for testing.

            ann.loraine Ann Loraine added a comment -

            Moving to Done, since full testing can only be done after the "make annots.xml" script is edited.

            ann.loraine Ann Loraine added a comment - - edited

            Pending discussion with Kelsey, we need to make some changes to the sample sheet, as follows:

            1) Change study name to "Stigma Style"

            2) Change Display Name as follows:

            • first three rows - no change
            • rows four and more - change "self-pollinated ovary" to "stigma + style, self-pollinated"

            3) Sample Code : change it to

            [variety code].[temperature].[duration].[sample type].[replicate]

            Unpollinated ovary: O ex: H.25.0.O.1
            Stigma Style selfed: S ex: H.25.0.S.2

            4) Tissue type: change to:

            • 1st 3 rows stay the same
            • Other rows: "stigma and style from self-pollinated flowers"

            5) Fix order of spreadsheet so that similar samples group together in IGB as discussed. See GM time course data for how to order samples.

            • Need to commit and merge change to the spreadsheet
            • Once merged, we need to re-run makeAnnotsXml.py to re-create the quickload files and test them in IGB
            • Once you're happy with this, you need to submit a PR to merge the new quickload files and deploy for testing
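
The Sample Code format in item 3 can be sketched in Python as follows. This is a minimal illustration under assumptions: the helper name `sample_code` is hypothetical, and only the sample-type letters "O" and "S" and the variety code "H" appear in this ticket; codes for the other varieties are not specified here.

```python
# Sample-type letters from item 3: O = unpollinated ovary, S = stigma + style, selfed.
SAMPLE_TYPES = {"O", "S"}


def sample_code(variety_code, temperature, duration, sample_type, replicate):
    """Join the five fields with dots: [variety code].[temperature].[duration].[sample type].[replicate]."""
    if sample_type not in SAMPLE_TYPES:
        raise ValueError(f"unexpected sample type: {sample_type!r}")
    parts = (variety_code, temperature, duration, sample_type, replicate)
    return ".".join(str(part) for part in parts)


print(sample_code("H", 25, 0, "O", 1))
print(sample_code("H", 25, 0, "S", 2))
```

The two calls reproduce the examples given in item 3.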
            robofjoy Robert Reid added a comment -

            After a few of the email exchanges with Ravi / Kelsey, it seems apparent that we need to pull all of the data together prior to next steps.

            On the HPC cluster, the data is in 3 separate locations due to the data coming in at different points of the year.
            Kelsey has clarified the names and samples in an Excel sheet:

            Google space

            Attaching the same excel sheet here because I am not sure where the Google sheet lives permanently.
            Sequenced Samples.xlsx

            Mdavis4290 Molly Davis added a comment - - edited

            Updates:

            • The current sample sheet names are fine for all of the 2023 self-pollinated and Ovary sample names. I will make detailed changes from Ann's previous comment to KP_sample_sheet.xlsx for now.
              Commit: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3326b
              Pull Request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/5
            • I believe their actual request is to add their samples from previous years to the IGB Quickload site as well, not just the 2023 data. The other datasets are referenced in the Sequenced_Samples.xlsx file that Dr. Reid is referencing.
            • I will need to check and possibly prep these other datasets to be visualized in IGB. Each dataset could get its own ticket for adding it to the Quickload site, if it is not already there.
            • Cluster Directories:
            1. 2021: /projects/tomato_genome/rnaseq/ravi-tamaulipas
              Nf-core run: I do not see a nextflow run with this dataset.
            2. 2022: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq
              Nf-core run: /nobackup/tomato_genome/ravi-55/sl5-nfcore/results # but there are no scaled or junction files.
            3. 2023: /projects/tomato_genome/rnaseq/30-804059537-kelsie/00_fastq
              Nf-core run: /nobackup/tomato_genome/30-804059537-KP/results/star_salmon # does include scaled and junction files.

            Question:

            • Do we want to add those previous datasets/experiments to the same quickload sample sheet?

            ann.loraine Ann Loraine added a comment - - edited

            I think it would be best if we listed all of the Palanivelu Lab-generated samples in the same spreadsheet. In the "study-name" column, we will use different names to indicate which "batch" of sequence files the sample file came from. Based on the comment by [~molly] above, there were 4 "batches" of sequence files delivered.

            Possible study names for the 2020 and 2021 samples:

            • for the 2021 /projects/tomato_genome/rnaseq/ravi-tamaulipas data, study could be: "Tamaulipas [tissue type] RP Lab"
            • 2022: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq could be: "Hz Mal Nag Tam heat-treated unpollinated pistils RP Lab"
            ann.loraine Ann Loraine added a comment -

            I merged the PR.

            To test:

            • Run "makeAnnotsXml.py" from the hotpollen genome-browser-visualization repository on Bitbucket to rebuild the Quickload files
            • Add the local clone of the genome browser visualization repository's "quickload" folder as a new data source in IGB
            • Open the tomato SL5 genome in IGB
            • Look at the Data Access file and folder tree in the Data Access Panel of IGB
            • Check that there is a "pistil" folder in the file and folder tree in IGB (see previous comments for notes on what the folders and files should be named)
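
Before opening IGB, the rebuilt "annots.xml" could also be checked from Python, as sketched below. This rests on assumptions not confirmed in this ticket: that "annots.xml" holds `file` elements with a `name` attribute containing the physical folder, and that "pistil" data files have "pistil/" in that name. The sample XML and the helper `files_in_folder` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up sample standing in for a real annots.xml (layout is an assumption).
SAMPLE_ANNOTS = """<files>
  <file name="pistil/example1.bam" title="Stigma Style / example 1" />
  <file name="other/example2.bam" title="Other / example 2" />
</files>"""


def files_in_folder(annots_xml: str, folder: str) -> List[str]:
    """Return the name attribute of every file element whose name contains 'folder/'."""
    root = ET.fromstring(annots_xml)
    prefix = folder.rstrip("/") + "/"
    return [
        element.get("name")
        for element in root.iter("file")
        if prefix in element.get("name", "")
    ]


print(files_in_folder(SAMPLE_ANNOTS, "pistil"))
```

An empty result for "pistil" on the real file would suggest that the sample sheet rows were not picked up, which is worth checking before testing in IGB.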
            ann.loraine Ann Loraine added a comment -

            Attn [~molly] : I'm not sure where this ticket should go next. Let's discuss.

            Mdavis4290 Molly Davis added a comment -

            Testing:

            • Terminal: Make sure that main repos are cloned for sample_sheets or added to genome-browser-visualization folder.
              cd molly-genome-browser-visualization/
              export PYTHONPATH=/Users/mollydavis333/Desktop/igbquickload
              ./makeAnnotsXml.py
              git status
              
            • Opened and viewed in IGB:
              URL add data source: file: /Users/mollydavis333/Desktop/molly-genome-browser-visualization/quickload
              Name: Stigma Style
              Links work and I can load all of the data

            Ready to commit updated "annots.xml" file in genome-browser-visualization:

            Branch: https://bitbucket.org/mdavis4290/molly-genome-browser-visualization/branch/IGBF-3326
            Pull Request: https://bitbucket.org/hotpollen/genome-browser-visualization/pull-requests/4

            ann.loraine Ann Loraine added a comment -

            The newest PR targeting the genome-browser-visualization repository is now merged and ready for testing.

            Mdavis4290 Molly Davis added a comment - - edited

            Pulled recent merge to my fork and tested files on my local machine. I configured the local URL in IGB and visualized the first file of each folder in Stigma Style on my new server. Everything loaded properly and was visually correct.

            Would you like to test updated files in IGB as well? [~aloraine]


              People

              • Assignee:
                Unassigned
                Reporter:
                Mdavis4290 Molly Davis