[IGBF-3326] Create KP 2023 sample sheet needed for Quickload site and more - JIRA UNCC

Details

Type: Task
Status: Closed (View Workflow)
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
3.5
Epic Link:
Process and deploy Palanivelu Lab data
Sprint:
Spring 8 2023 Apr 24, Spring 9 2023 May 1, Summer 2 2023 May 29, Summer 3 2023 June 12

Description

Original Directory: /projects/tomato_genome/rnaseq/30-804059537-kelsie
Nf-core run: /nobackup/tomato_genome/30-804059537-KP (KP for "Kelsey Pryze")

Attachments

Sequenced Samples.xlsx
53 kB
06/Jun/23 9:49 AM

Issue Links

relates to

IGBF-3251 Process and deploy Palanivelu Lab data

To-Do

IGBF-3324 Generate junction files Kelsey's Palanivelu Lab 2023

Closed

IGBF-3325 Copy KP 2023 data (30-804059537) to IGB Quickload host for visualization in IGB

Closed

IGBF-3328 Add KP 2023 multiqc report and sample CSV file to bitbucket

Closed

IGBF-3338 Add and test KP 2023 sample sheet data to "hotpollen" annots.xml

Closed

IGBF-3323 Generate scaled coverage graphs Kelsey's Palanivelu Lab 2023

Closed

IGBF-3329 Create and test SRP371294 sample sheet spreadsheet for adding data to "hotpollen" annots.xml

To-Do

Activity

Ann Loraine added a comment - 28/Apr/23 2:27 PM

To see a listing of all the "bam" files and their file name prefixes, see:

http://lorainelab-quickload.scidas.org/hotpollen/S_lycopersicum_Jun_2022/pistil/

Ann Loraine added a comment - 01/May/23 10:09 AM - edited

Existing Excel sample sheets:

https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/ExternalDataSets/muday-144_sample_sheet.xlsx
https://bitbucket.org/hotpollen/splicing-analysis/src/main/ExternalData/SRP252265_SRP328042_sample_sheet.xlsx

Also, please:

review prior tickets where we made sample sheets for other datasets
create a more complete list of sample sheets from other datasets and repositories

Ann Loraine added a comment - 01/May/23 10:12 AM

Also, there is a script that reads these files and makes an "annots.xml" metadata file for deploying the data in IGB for visualization.

This script is:

https://bitbucket.org/hotpollen/genome-browser-visualization/src/main/makeAnnotsXml.py

Please look at and understand the genome-browser-visualization repository.

Molly Davis added a comment - 02/May/23 3:59 PM

First draft sample sheet:
[^Draft1_KP_sample_sheet.xlsx]

Ann Loraine added a comment - 05/May/23 9:39 AM

Change requests:

1) Remove "h" from the values in the "hours" column (no need to specify the units because the column heading is named "hours")

2) Change "Display Name" values to use the following pattern:

"[condition] [cultivar] [plant part] [hours] h"

where:

[condition] is one of "25C" or "37C" (note: not using "cool" or "warm" here)
[cultivar] is one of Heinz, Malintka, Nagcarlang, Tamaulipas
[plant part] is one of "selfed ovary", "unpollinated ovary"
[hours] is one of 0, 3, 8

examples:

row 1: 25C Heinz unpollinated ovary 0 h
row 48: 25C Nagcarlang self-pollinated ovary 8 h

3) Change values in "study name" to read "Ovary RP lab"

4) Change values in the first three rows of column "species" to indicate cultivar only, e.g., "Heinz" not "Heinz-Ovary"

5) Freeze the first row so the user can scroll up and down without the first row moving out of view

6) Change every value in the "Physical Folder" column to "pistil"

7) Add a new column, "tissue type", after "replicate" and fill it in with these values:

rows 1-3: unpollinated ovary
rows 4-end: self-fertilized ovary
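
A minimal Python sketch of change requests 1, 2 and 7, assuming each sample-sheet row is available as a dict keyed by hypothetical column names ("condition", "cultivar", "plant part", "hours"). The real headers in KP_sample_sheet.xlsx may differ, and this is not the procedure actually used to edit the sheet.

```python
import re

# Hypothetical sketch: rows are dicts keyed by assumed column names
# ("condition", "cultivar", "plant part", "hours"), not the real sheet headers.

ALLOWED_CONDITIONS = {"25C", "37C"}
ALLOWED_CULTIVARS = {"Heinz", "Malintka", "Nagcarlang", "Tamaulipas"}


def normalize_hours(value):
    """Change request 1: strip a trailing "h" unit, e.g. "8h" or "8 h" -> "8"."""
    match = re.fullmatch(r"(\d+)\s*h?", str(value).strip())
    if match is None:
        raise ValueError(f"unexpected hours value: {value!r}")
    return match.group(1)


def display_name(row):
    """Change request 2: build "[condition] [cultivar] [plant part] [hours] h"."""
    if row["condition"] not in ALLOWED_CONDITIONS:
        raise ValueError(f"unexpected condition: {row['condition']!r}")
    if row["cultivar"] not in ALLOWED_CULTIVARS:
        raise ValueError(f"unexpected cultivar: {row['cultivar']!r}")
    hours = normalize_hours(row["hours"])
    return f"{row['condition']} {row['cultivar']} {row['plant part']} {hours} h"


def tissue_type(data_row_number):
    """Change request 7: data rows 1-3 are unpollinated, rows 4 onward self-fertilized."""
    return "unpollinated ovary" if data_row_number <= 3 else "self-fertilized ovary"


if __name__ == "__main__":
    row = {"condition": "25C", "cultivar": "Heinz",
           "plant part": "unpollinated ovary", "hours": "0h"}
    print(display_name(row))  # 25C Heinz unpollinated ovary 0 h
    print(tissue_type(4))     # self-fertilized ovary
```

Checking values against the allowed lists makes a leftover value such as "warm" fail loudly instead of slipping into a display name.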

Ann Loraine added a comment - 09/May/23 7:50 AM

Change request:

4) Change values in first three rows in column "species" to indicate cultivar only, e.g., "Heinz" not "Heinz-Ovary" (from previous comment)

New change request:

1) Delete "condition" column

New task management request:

1) Delete obsolete copies of the spreadsheet from this Jira ticket

Molly Davis added a comment - 09/May/23 9:36 AM

Final Draft:

[^KP_sample_sheet.xlsx]

Ann Loraine added a comment - 09/May/23 9:58 AM

Make a branch for the spreadsheet addition to the repository and submit a PR.

Add it to https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/ExternalData/.

Molly Davis added a comment - 09/May/23 10:29 AM

Branch: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3326
Pull request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/3

Ann Loraine added a comment - 09/May/23 1:46 PM

PR merged; ready for testing.

Ann Loraine added a comment - 11/May/23 8:34 AM

Moving to Done; full testing needs to be done after the "make annots.xml" script (makeAnnotsXml.py) is edited.

Ann Loraine added a comment - 25/May/23 10:59 AM - edited

Pending discussion with Kelsey, we need to make some changes to the sample sheet, as follows:

1) Change study name to "Stigma Style"

2) Change Display Name as follows:

first three rows: no change
rows 4 onward: change "self-pollinated ovary" to "stigma + style, self-pollinated"

3) Sample Code: change it to

[variety code].[temperature].[duration].[sample type].[replicate]

Unpollinated ovary: O (e.g., H.25.0.O.1)
Stigma Style selfed: S (e.g., H.25.0.S.2)

4) Tissue type: change to:

first 3 rows: stay the same
other rows: "stigma and style from self-pollinated flowers"

5) Reorder the spreadsheet rows so that similar samples group together in IGB, as discussed. See the GM time course data for how to order samples.

Next steps:

Commit and merge the changes to the spreadsheet.
Once merged, re-run makeAnnotsXml.py to re-create the Quickload files and test them in IGB.
Once you're happy with the results, submit a PR to merge the new Quickload files and deploy them for testing.
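
The revised Sample Code scheme can be sketched in Python as a small value type that builds and parses the dotted codes. This assumes integer temperature, duration and replicate fields. Only "H" (Heinz) and the sample types "O" and "S" come from the comment above, so other variety codes are accepted as given. The grouping_key ordering is a hypothetical stand-in, not the GM time course ordering the comment refers to.

```python
from typing import NamedTuple

# Sample types named in the comment; variety codes other than "H" (Heinz)
# are not specified there, so any string is accepted for the variety.
SAMPLE_TYPES = {"O": "unpollinated ovary", "S": "stigma + style, self-pollinated"}


class SampleCode(NamedTuple):
    variety: str      # e.g. "H" for Heinz
    temperature: int  # degrees C, e.g. 25 or 37
    duration: int     # hours, e.g. 0, 3 or 8
    sample_type: str  # "O" or "S"
    replicate: int

    def __str__(self):
        # [variety code].[temperature].[duration].[sample type].[replicate]
        return ".".join(str(part) for part in self)

    @classmethod
    def parse(cls, code):
        # Unpacking raises ValueError if the code does not have exactly five parts.
        variety, temperature, duration, sample_type, replicate = code.split(".")
        if sample_type not in SAMPLE_TYPES:
            raise ValueError(f"unknown sample type in {code!r}")
        return cls(variety, int(temperature), int(duration), sample_type, int(replicate))


def grouping_key(code):
    """Hypothetical ordering: sample type, then variety, temperature, duration, replicate."""
    return (code.sample_type, code.variety, code.temperature, code.duration, code.replicate)


if __name__ == "__main__":
    codes = [SampleCode.parse(c) for c in ["H.25.0.S.2", "H.25.0.O.1", "H.25.0.S.1"]]
    print([str(c) for c in sorted(codes, key=grouping_key)])
    # ['H.25.0.O.1', 'H.25.0.S.1', 'H.25.0.S.2']
```

Parsing codes back into fields makes it easy to sort the sheet by any combination of them once the intended grouping is settled.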

Robert Reid added a comment - 06/Jun/23 9:50 AM

After a few email exchanges with Ravi and Kelsey, it is apparent that we need to pull all of the data together before the next steps.

On the HPC cluster, the data is in 3 separate locations due to the data coming in at different points of the year.
Kelsey has clarified the names and samples in an Excel sheet:

Google space

Attaching the same Excel sheet here because I am not sure where the Google sheet lives permanently.
Sequenced Samples.xlsx

Molly Davis added a comment - 14/Jun/23 3:27 PM - edited

Updates:

The current sample sheet names are fine for all of the 2023 self-pollinated and ovary samples. For now, I will make the changes detailed in Ann's previous comment to KP_sample_sheet.xlsx.
Commit: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3326b
Pull Request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/5

I believe their actual request is to add their samples from previous years to the IGB Quickload site as well, not just the 2023 data. The other datasets are referenced in the Sequenced Samples.xlsx file that Dr. Reid attached.
I will need to check and possibly prep these other datasets for visualization in IGB. Each dataset could get its own ticket for adding it to the Quickload site, if it is not already there.
Cluster Directories:

2021: /projects/tomato_genome/rnaseq/ravi-tamaulipas
Nf-core run: I do not see a nextflow run with this dataset.
2022: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq
Nf-core run: /nobackup/tomato_genome/ravi-55/sl5-nfcore/results # but there are no scaled or junction files.
2023: /projects/tomato_genome/rnaseq/30-804059537-kelsie/00_fastq
Nf-core run: /nobackup/tomato_genome/30-804059537-KP/results/star_salmon # does include scaled and junction files.

Question:

Do we want to add those previous datasets/experiments to the same quickload sample sheet?

Ann Loraine added a comment - 14/Jun/23 6:23 PM - edited

I think it would be best to list all of the Palanivelu Lab-generated samples in the same spreadsheet. In the "study name" column, we will use different names to indicate which "batch" of sequence files each sample came from. Based on the comment by [~molly] above, there were 4 "batches" of sequence files delivered.

Possible study names for the 2021 and 2022 samples:

2021 (/projects/tomato_genome/rnaseq/ravi-tamaulipas): "Tamaulipas [tissue type] RP Lab"
2022 (/projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536/00_fastq): "Hz Mal Nag Tam heat-treated unpollinated pistils RP Lab"

Ann Loraine added a comment - 14/Jun/23 6:40 PM

I merged the PR.

To test:

1) Run "makeAnnotsXml.py" from the hotpollen genome-browser-visualization repository on Bitbucket to rebuild the Quickload files
2) Add the "quickload" folder of your local clone of the genome-browser-visualization repository as a new data source in IGB
3) Open the tomato SL5 genome in IGB
4) Look at the file and folder tree in the Data Access Panel of IGB
5) Check that there is a "pistil" folder in the file and folder tree (see previous comments for notes on what the folders and files should be named)
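
The "pistil" folder check can also be spot-checked from the command line before opening IGB. This Python sketch assumes an IGB-style annots.xml in which each <file> element's title attribute uses "/" to place a track in IGB's folder tree. Whether makeAnnotsXml.py writes titles that way is an assumption, so treat this as a quick sanity check, not a replacement for loading the data in IGB.

```python
import xml.etree.ElementTree as ET


def folder_names(annots_xml_text):
    """Collect folder names implied by the title attributes in an annots.xml string.

    Assumes titles look like "Folder/Subfolder/Track name", where every component
    except the last becomes a folder in IGB's Data Access tree.
    """
    root = ET.fromstring(annots_xml_text)
    names = set()
    for file_elem in root.iter("file"):
        parts = file_elem.get("title", "").split("/")
        names.update(part for part in parts[:-1] if part)
    return names


def has_folder(annots_xml_text, folder):
    """Return True if any track title places a file under the given folder name."""
    return folder in folder_names(annots_xml_text)
```

For example, read the annots.xml that makeAnnotsXml.py wrote in your local clone into a string and call has_folder(text, "pistil"); the exact path depends on the genome version folder in the quickload directory.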

Ann Loraine added a comment - 14/Jun/23 6:42 PM

Attn [~molly] : I'm not sure where this ticket should go next. Let's discuss.

Molly Davis added a comment - 15/Jun/23 11:27 AM

Testing:

Terminal: make sure the main repositories containing the sample sheets are cloned, or that the sample sheets are added to the genome-browser-visualization folder.

cd molly-genome-browser-visualization/
export PYTHONPATH=/Users/mollydavis333/Desktop/igbquickload
./makeAnnotsXml.py
git status

Opened and viewed in IGB:
URL add data source: file: /Users/mollydavis333/Desktop/molly-genome-browser-visualization/quickload
Name: Stigma Style
Links work and I can load all of the data

Ready to commit updated "annots.xml" file in genome-browser-visualization:

Branch: https://bitbucket.org/mdavis4290/molly-genome-browser-visualization/branch/IGBF-3326
Pull Request: https://bitbucket.org/hotpollen/genome-browser-visualization/pull-requests/4

Ann Loraine added a comment - 15/Jun/23 12:12 PM

The newest PR targeting the genome-browser-visualization repository is now merged and ready for testing.

Molly Davis added a comment - 16/Jun/23 9:17 AM - edited

I pulled the recent merge into my fork and tested the files on my local machine. I configured the local URL in IGB and visualized the first file of each folder in Stigma Style on my new server. Everything loaded properly and looked visually correct.

Would you like to test updated files in IGB as well? [~aloraine]


People

Assignee:

Unassigned

Reporter:

Molly Davis

Votes:

0

Watchers:

3

Dates

Created:

24/Apr/23 10:58 AM

Updated:

27/Jul/23 4:33 PM

Resolved:

16/Jun/23 10:33 AM