IGB / IGBF-3251

Process and deploy Palanivelu Lab data

    Details

    • Type: Epic
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Epic Name:
      Process and deploy Palanivelu Lab data
    • Story Points:
      4
    • Sprint:
      Spring 3 2023 Feb 1, Spring 8 2023 Apr 24, Spring 9 2023 May 1

      Description

      For this task, process and deploy the three collections of Palanivelu Lab sequencing data described below.

      About the data:

      As of summer 2023, we have received three collections of sequencing data from the Ravi Palanivelu (RP) lab. Each collection corresponds to a "batch" of samples that the RP lab sent to GeneWiz/Azenta for sequencing.

      For two of these batches, the RP lab generated the biological material, extracted the RNA, and sent boxes of RNA samples to Azenta (formerly GeneWiz), the sequencing provider, which then synthesized and sequenced the libraries. For the third batch, the so-called "library synthesis pilot," the RP lab synthesized the libraries themselves and sent the libraries to Azenta for sequencing.

      Once sequencing was complete, the company sent the RP lab links to an FTP site containing the data files, and the RP lab either downloaded the files or asked us to download them. We then deployed the sequence data to the Charlotte HPC file system for the next step, data processing, in which we generate files for visualization in IGB as well as "counts" files for use with statistical analysis libraries developed for RNA-Seq data.

      The three data collections are:

      1) 2021 Kelsey Pryze's unpollinated pistil heat stress experiment, Azenta ID: 30-681594536

      These data are from an experiment by Kelsey Pryze in which she tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. Each sample type is represented by three replicates in the sequencing data, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated. The biological material was created in the early spring and summer of 2021.

      The data also included three files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these alongside the FASTQ files from the unpollinated pistils because they were sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. For visualization, however, we will probably want to present them in a way that makes it clear that their biological material was created separately from the unpollinated pistils.

      Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

      We began processing these data in 2023, using the high-performance computing cluster at UNC Charlotte. You can identify these samples on our file system by their Azenta identifier, 30-681594536. In our pipelines and Jira records, we have also been referring to these data by the year we received them from the RP lab: the "Ravi 2022" dataset.

      KP provided documentation describing these samples, which we will place in a "Documentation" folder in the git repository. As that documentation shows, the samples themselves were generated in the early spring and summer of 2021, even though we refer to the dataset as "Ravi 2022."

      When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.
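      One way to carry such a study name into IGB is through the track titles in the annots.xml file listed in the to-do items. The sketch below is a minimal Python example, assuming the IGB QuickLoad layout of a `<files>` root element with one `<file>` element per track (`name` for the data file path, `title` for the track label). The study name, the file names, and the `make_annots_xml` helper are hypothetical; this is not the lab's makeAnnotsXml.py script.

```python
import xml.etree.ElementTree as ET


def make_annots_xml(study_name, tracks):
    """Return an annots.xml element tree with one <file> element per track.

    Assumed layout (IGB QuickLoad): a <files> root whose <file> children
    carry a `name` (data file path) and a `title` (track label). Every title
    is prefixed with the study name so the study's tracks share one label.
    """
    root = ET.Element("files")
    for file_name, label in tracks:
        ET.SubElement(root, "file", name=file_name, title=f"{study_name}/{label}")
    return root


# Hypothetical study name and file names, for illustration only.
STUDY = "Pryze 2023 stigma+style heat stress"
TRACKS = [
    ("KP2023/Heinz_37C_3h_1.bedgraph.gz", "Heinz 37C 3h rep 1 coverage"),
    ("KP2023/Heinz_37C_3h_1.bed.gz", "Heinz 37C 3h rep 1 junctions"),
]

root = make_annots_xml(STUDY, TRACKS)
ET.indent(root)  # pretty-print in place; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

      Keeping the study name in a single constant means it can be revised once, after consulting RP Lab personnel, without touching individual track entries.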

      2) self-pollinated stigma+style heat stress experiment, Azenta ID: 30-804059537

      We have been referring to this experimental data set as "KP-2023," after the experimenter, Kelsey Pryze, and the year we obtained the sequence files.

      The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

      This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

      • temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
      • treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
      • four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
      • tissue type: dissected stigma and style tissue from self-pollinated flowers

      There were three replicates per sample type. The zero-hour time point, however, included only 25 degrees C (control) samples, three per variety, and no 37 degrees C samples.

      Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples
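      The sample count can be double-checked by enumerating the design. A minimal Python sketch, assuming one sample per (variety, temperature, duration, replicate) combination, with the 0-hour time point run at 25 degrees C only:

```python
from itertools import product

VARIETIES = ["Heinz", "Malintka", "Nagcarlang", "Tamaulipas"]
REPLICATES = [1, 2, 3]


def kp2023_design():
    """List (variety, temperature_C, hours, replicate) tuples, one per sample."""
    samples = []
    # 3 h and 8 h treatments: both 25 C (control) and 37 C (heat stress).
    for variety, temp, hours, rep in product(VARIETIES, [25, 37], [3, 8], REPLICATES):
        samples.append((variety, temp, hours, rep))
    # 0 h time point: 25 C control samples only.
    for variety, rep in product(VARIETIES, REPLICATES):
        samples.append((variety, 25, 0, rep))
    return samples


design = kp2023_design()
print(len(design))  # 60
```

      The same list could later be compared against the FASTQ files actually received, as part of the data checking in the to-do list.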

      3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment, Azenta ID: 30-605730043

      In this experiment, Kelsey Pryze created sequencing libraries from RNAs extracted from pollinated and unpollinated samples from Tamaulipas plants.

      The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas
      Rob downloaded these data from the sequencing provider on or around December 15, 2021. (This is the date that Rob created a Google Doc describing the files available from the sequencing provider's data transfer ftp site.)

      Note: We need to confirm whether the sequences obtained from the unpollinated pistils came from the same experiment as (1) above. If so, which "replicate" were they? The answer will influence how we label the data in IGB.

      To-do for each experimental data set:

      • Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
      • Check the MultiQC report. Re-run the processing as necessary.
      • Rename BAM files so that their names do not include "sorted".
      • Create scaled coverage graphs.
      • Create junction files.
      • Migrate data to an on-line location for IGB visualization.
      • Create annots.xml metadata file with visualization parameters for each dataset; add the data collection to the makeAnnotsXml.py script
      • Add the "counts" data files to the repository for statistical analysis
      • Add documentation for each sequence collection to the git repository
      • Perform data checking to catch any record-keeping errors that may have occurred
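      For the first to-do item, nf-core/rnaseq reads its inputs from a CSV sample sheet with the columns `sample,fastq_1,fastq_2,strandedness`, where the last column can be set to `reverse`. The sketch below builds such a sheet from a list of paired FASTQ paths. It assumes Illumina-style file names ending in `_R1_001.fastq.gz` / `_R2_001.fastq.gz`; the regular expression and `build_samplesheet` are hypothetical and should be adjusted to the actual Azenta file names. Anchoring the pattern at the end of the name is one way to avoid mis-pairing when "R1" or "R2" also appears earlier in a file name.

```python
import csv
import io
import os
import re
from collections import defaultdict

# Assumed naming: "<sample>_R1_001.fastq.gz" / "<sample>_R2_001.fastq.gz".
# The pattern is anchored at the end of the name, so an "R1" or "R2" that
# appears inside the sample name is not mistaken for the read number.
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def build_samplesheet(fastq_paths, strandedness="reverse"):
    """Return nf-core/rnaseq sample sheet text for paired-end FASTQ files."""
    pairs = defaultdict(dict)
    for path in fastq_paths:
        match = FASTQ_RE.match(os.path.basename(path))
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {path}")
        reads = pairs[match["sample"]]
        if match["read"] in reads:
            raise ValueError(f"duplicate R{match['read']} file for {match['sample']}")
        reads[match["read"]] = path

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for sample in sorted(pairs):
        reads = pairs[sample]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing a mate")
        writer.writerow([sample, reads["1"], reads["2"], strandedness])
    return out.getvalue()


# Hypothetical paths; note the "R1" inside the sample name itself.
paths = [
    "/data/HR1-37C-3h_R1_001.fastq.gz",
    "/data/HR1-37C-3h_R2_001.fastq.gz",
]
print(build_samplesheet(paths), end="")
```

      The resulting file would be passed to the pipeline with `--input`. Raising an error on unmatched or duplicated mates, rather than silently skipping them, also serves the record-keeping check in the last to-do item.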

      Attached:

      • Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
      • Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

      Contact:

      • Kelsey Pryze - kelseypryze@email.arizona.edu

            Activity

            ann.loraine Ann Loraine created issue -
            ann.loraine Ann Loraine made changes -
            Field Original Value New Value
            Epic Link IGBF-2993 [ 21429 ]
            ann.loraine Ann Loraine made changes -
            Attachment 30-804059537.pdf [ 17673 ]
            ann.loraine Ann Loraine made changes -
            Description
            For this task, process a new data set from the Palanivelu lab. These data are from Kelsey Pryze.

            RR has downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.
            Please do the data processing in this directory:

            * /nobackup/tomato_genome/30-804059537-KP (KP for "Kelsey Pryze")

            Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            To-do:

            * Run the nf-core/rnaseq pipeline with the SL5/2022 target genome assembly, using the "reverse" strandedness parameter.
            * Check the MultiQC report (attach it to this ticket). Re-run processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create annots.xml metadata file for visualization.

            Attached:
            * Azenta (sequencing provider) data report, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            Mdavis4290 Molly Davis made changes -
            Attachment KP_samples.csv [ 17674 ]
            Mdavis4290 Molly Davis made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            Mdavis4290 Molly Davis made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Attachment Screen Shot 2023-02-09 at 9.40.46 AM.png [ 17678 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Pipeline successfully ran:
            [Screenshot: Screen Shot 2023-02-09 at 9.40.46 AM.png]

            Directory: /nobackup/tomato_genome/30-804059537-KP

            Comment: There are no errors in the report, but the number of sequences mapped is pretty low, so we might need to look into that. I will double-check the sample sheet I made; it is also possible that the wrong reference genome was used to map the data.

            • Sequence duplication levels might be an issue.
              [Screenshot: Screen Shot 2023-02-09 at 10.28.15 AM.png]
            • Per-sequence GC content is also poor.
            • Alignment scores are poor because many reads are unmapped as "too short."

              Link to interpret report: https://nf-co.re/eager/2.2.2/output#multiqc-report

            Next steps:

            • remove "sorted" from BAM file names
            • make coverage graphs
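            For the coverage-graph step, one common way to produce scaled coverage is `bedtools genomecov` with `-bg` (bedGraph output), `-split` (split spliced alignments into their aligned blocks, so introns are not filled in), and `-scale` (multiply every depth value by a constant). The sketch below assumes scaling to depth per million mapped reads; the lab's actual scaling convention and scripts may differ, the helper names are hypothetical, and the mapped-read count has to come from elsewhere (for example, the MultiQC report).

```python
def scale_factor(mapped_reads, per=1_000_000):
    """Multiplier that turns raw depth into depth per `per` mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return per / mapped_reads


def genomecov_command(bam_path, mapped_reads):
    """Build a bedtools genomecov command that writes a scaled bedGraph to stdout.

    -bg     report depth as bedGraph intervals
    -split  count only the aligned blocks of spliced reads (introns stay empty)
    -scale  multiply every depth value by a constant
    """
    return [
        "bedtools", "genomecov",
        "-ibam", bam_path,
        "-bg", "-split",
        "-scale", f"{scale_factor(mapped_reads):.6g}",
    ]


# Hypothetical file name and mapped-read count.
cmd = genomecov_command("S1.markdup.bam", 20_000_000)
print(" ".join(cmd))
```

            A command built this way could be run with `subprocess.run(cmd, stdout=handle, check=True)`, writing the bedGraph to an open file. genomecov expects a coordinate-sorted BAM.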
            Mdavis4290 Molly Davis made changes -
            Attachment KP_multiqc_report.html [ 17679 ]
            ann.loraine Ann Loraine added a comment - - edited

            The "star alignment scores" section looks informative. According to the plot, a high percentage of sequences in some samples were reported as "Unmapped: too short." I think this can happen when the library contained a lot of very short inserts. I think we should go ahead and proceed with the rest of the pipeline, but keep an eye on how those samples with the largest percentage of "shorties" perform in subsequent analyses.

            attn: [~molly]
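            To keep an eye on the samples with the most "shorties," the per-sample STAR `Log.final.out` files can be ranked by their `% of reads unmapped: too short` line. A minimal Python sketch, assuming the usual `metric | value` layout of that file; the helper names and the log excerpts are hypothetical:

```python
def parse_star_log(text):
    """Parse STAR Log.final.out text into a {metric name: value string} dict."""
    stats = {}
    for line in text.splitlines():
        if "|" in line:  # section headers such as "UNIQUE READS:" have no "|"
            key, _, value = line.partition("|")
            stats[key.strip()] = value.strip()
    return stats


def pct_too_short(text):
    """Percentage of reads STAR reported as 'unmapped: too short'."""
    value = parse_star_log(text)["% of reads unmapped: too short"]
    return float(value.rstrip("%"))


def rank_by_too_short(logs):
    """Given {sample: log text}, return (sample, %) pairs, largest % first."""
    ranked = [(sample, pct_too_short(text)) for sample, text in logs.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical excerpts; real files contain many more lines.
logs = {
    "S1": "                 % of reads unmapped: too short |\t41.20%\n",
    "S2": "                 % of reads unmapped: too short |\t8.05%\n",
}
for sample, pct in rank_by_too_short(logs):
    print(f"{sample}\t{pct:.2f}%")
```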

            Mdavis4290 Molly Davis made changes -
            Attachment Screen Shot 2023-02-09 at 10.28.15 AM.png [ 17680 ]
            Mdavis4290 Molly Davis made changes -
            Attachment Screen Shot 2023-02-09 at 10.28.15 AM.png [ 17681 ]
            Mdavis4290 Molly Davis made changes -
            Attachment Screen Shot 2023-02-09 at 10.28.15 AM.png [ 17681 ]
            Mdavis4290 Molly Davis made changes -
            Mdavis4290 Molly Davis added a comment - - edited

            Other users reported the same issue:

            "I encountered this issue when in the two paired-end input FASTQ files mates are out-of-order, i.e. mates are not found at the same line of the two files. This leads to a lot of not properly mapped read pairs that STAR throws into the "too short" bucket." https://github.com/alexdobin/STAR/issues/169

            I'm not sure this is the same problem, but I will look into it and check the fastq file directory. The file names might also be confusing Nextflow, because R1 and R2 each appear twice in some file names (once as the replicate label and once as the mate label).

            [~aloraine]

            ann.loraine Ann Loraine added a comment - Renaming code: https://bitbucket.org/hotpollen/splicing-analysis/src/main/src/renameBams.sh
            Mdavis4290 Molly Davis made changes -
            Attachment KP_samples.csv [ 17686 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Update:

            • I renamed the fastq files so that replicates are labeled Rep1, Rep2, Rep3 instead of R1, R2, R3. Nextflow may have confused the replicate labels with the paired-end mate labels (R1/R2) and failed to match mates correctly.
            • Here is the new csv samples file: [^KP_samples.csv]
            • Rerunning nextflow.
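A minimal sketch of the replicate renaming described above. It assumes the original names carried the replicate as a hyphen-delimited `-R1-`/`-R2-`/`-R3-` field (for example, a hypothetical `Heinz-Ovary-R1-0hr-25C-unpol_R1_001.fastq.gz`) while the mate was marked by the `_R1_001`/`_R2_001` suffix. Only the hyphen-delimited field is rewritten, so the mate suffix is untouched. The function names are hypothetical; the command actually used is not recorded in this ticket.

```shell
#!/usr/bin/env bash
# rep_name: print the renamed form of one FASTQ file name.
# Assumption: replicate is encoded as "-R<digits>-", mate as "_R1_001"/"_R2_001".
rep_name() {
  printf '%s\n' "$1" | sed -E 's/-R([0-9]+)-/-Rep\1-/'
}

# rename_reps DIR [apply]: dry run by default; pass "apply" to move files.
rename_reps() {
  local dir=$1 mode=${2:-dry-run} f new
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue                 # no matches: glob stays literal
    new="$(dirname "$f")/$(rep_name "$(basename "$f")")"
    [ "$f" = "$new" ] && continue           # already renamed, or no R<n> field
    if [ "$mode" = apply ]; then
      mv -n -- "$f" "$new"                  # -n: never overwrite an existing file
    else
      echo "would rename: $f -> $new"
    fi
  done
}
```

Running `rename_reps /nobackup/tomato_genome/30-804059537-KP` first prints the planned moves without changing anything, which makes it easy to check that no `_R1_001`/`_R2_001` suffix would be altered before rerunning with `apply`.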
            Mdavis4290 Molly Davis made changes -
            Attachment KP_multiqc_report.html [ 17687 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Update:

            New MultiQC report:
            [^KP_multiqc_report.html]

            Comment: Unfortunately, even after renaming the fastq files, the MultiQC report is exactly the same as the previous one, with the same number of "Unmapped: too short" reads in the alignment scores.

            Here is the text file with the alignment scores, as an alternative to the graph in the report:
            multiqc_star.txt

            Directory: /nobackup/tomato_genome/30-804059537-KP/results/multiqc/star_salmon/multiqc_data

            ann.loraine Ann Loraine added a comment -

            Next steps:

            • Proceed with the output from the most recent run of the Nextflow nf-core/rnaseq pipeline, which used the renamed samples (e.g., Rep1, Rep2, Rep3)
            • [~aloraine] to review samples file for possible problems
            Mdavis4290 Molly Davis made changes -
            Attachment Screen Shot 2023-02-09 at 10.28.15 AM.png [ 17680 ]
            Mdavis4290 Molly Davis made changes -
            Attachment multiqc_star.txt [ 17688 ]
            Mdavis4290 Molly Davis added a comment -

            Update:

            I moved forward with these results: I renamed the sorted BAM files and made coverage graphs.

            Directory: /nobackup/tomato_genome/30-804059537-KP/results/star_salmon

            Mdavis4290 Molly Davis made changes -
            Attachment KP_samples.csv [ 17674 ]
            Mdavis4290 Molly Davis made changes -
            Attachment KP_multiqc_report.html [ 17679 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Update: After speaking with Nowlan, we agree that running fastqc on the files would be beneficial to check the quality of the data before running it with nextflow.

            Script:

            #!/bin/bash
            
            
            #SBATCH --job-name=fastqc               #job name after submission
            #SBATCH -p Orion                        #partition being used
            #SBATCH -N 1                            #number of nodes to use
            #SBATCH --ntasks-per-node=8             #max number of tasks per node
            #SBATCH --mem=60gb                      #memory required per node
            #SBATCH -t 0-50:00                      #time (D-HH:MM)
            #SBATCH -o fastqc.%j.out                #standard output file
            #SBATCH -e fastqc.%j.err                #standard error file
            #SBATCH --mail-type=END,FAIL            #Notifications for job complete/failure
            #SBATCH --mail-user=mdavi258@uncc.edu   #Send to user email
            
            
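            module load fastqc
            
            # fastqc does not create the output directory, so make sure it exists
            mkdir -p /nobackup/tomato_genome/30-804059537-KP/fastQC_dataQuality
            
            for i in /nobackup/tomato_genome/30-804059537-KP/*.fastq.gz
            do
              	fastqc -o /nobackup/tomato_genome/30-804059537-KP/fastQC_dataQuality "$i"
            done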
            

            Directory: /nobackup/tomato_genome/30-804059537-KP/fastQC_dataQuality

            For quick reference here are two of the fastqc reports to check data quality:
            [^Heinz-Ovary-Rep1-0hr-25C-unpol_R1_001_fastqc.html]
            [^Heinz-Ovary-Rep2-0hr-25C-unpol_R1_001_fastqc.html]

            Let me know what you think, Nowlan Freese.

            Mdavis4290 Molly Davis made changes -
            Attachment Heinz-Ovary-Rep1-0hr-25C-unpol_R1_001_fastqc.html [ 17689 ]
            Mdavis4290 Molly Davis made changes -
            Attachment Heinz-Ovary-Rep2-0hr-25C-unpol_R1_001_fastqc.html [ 17690 ]
            Mdavis4290 Molly Davis added a comment - - edited

            Notes about High Duplication:

            • High duplication is either going to be the result of technical duplication (too many PCR cycles), or over-sequencing (very high fold coverage).
            • How many PCR cycles were done during the protocol? Also, how many reads were there in total?
            • Dimer contamination?
            • Should we just let it through for downstream analysis?

            Source: https://wiki.bits.vib.be/index.php/Quality_control_of_NGS_data
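One quick way to put numbers on the questions above (total reads, and how much of a library is duplicated) is sketched below. It is a rough proxy, not FastQC's duplication algorithm: reads are counted as lines divided by four, and duplication is estimated as the fraction of the first N sequences that exactly repeat an earlier sequence. The function names and the default N = 100000 are assumptions for illustration.

```shell
#!/usr/bin/env bash
# count_reads FILE.fastq.gz: number of FASTQ records (4 lines each).
count_reads() {
  gzip -dc "$1" | awk 'END { printf "%.0f\n", NR / 4 }'
}

# dup_fraction FILE.fastq.gz [N]: fraction of the first N reads (default
# 100000) whose sequence exactly repeats an earlier read in that subset.
dup_fraction() {
  local n=${2:-100000}
  gzip -dc "$1" |
    awk -v n="$n" '
      NR % 4 == 2 {                  # line 2 of each record is the sequence
        total++
        if (seen[$0]++) dup++        # seen before: count as a duplicate
        if (total >= n) exit
      }
      END { if (total > 0) printf "%.4f\n", dup / total; else print "NA" }'
}
```

Comparing `dup_fraction` across samples (for example, the samples with the most "Unmapped: too short" reads against the others) could show whether high duplication tracks the poor alignment results; a value far above the rest would point toward PCR over-amplification or dimers rather than over-sequencing.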

            Mdavis4290 Molly Davis added a comment - - edited

            Validating mate pair files

            Script:

            #!/bin/bash
            
            
            #SBATCH --job-name=validate_script      #job name after submission
            #SBATCH -p Orion                        #partition being used
            #SBATCH -N 1                            #number of nodes to use
            #SBATCH --ntasks-per-node=8             #max number of tasks per node
            #SBATCH --mem=60gb                      #memory required per node
            #SBATCH -t 0-50:00                      #time (D-HH:MM)
            #SBATCH -o validate.%j.out              #standard output file
            #SBATCH -e validate.%j.err              #standard error file
            #SBATCH --mail-type=END,FAIL            #Notifications for job complete/failure
            #SBATCH --mail-user=mdavi258@uncc.edu   #Send to user email
            #SBATCH --array=1-63
            
            
            #setting up where to grab files from
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p"  /nobackup/tomato_genome/30-804059537-KP/kp_runlist.txt)
            
            #The command to validate each pair:
            cd /nobackup/tomato_genome/30-804059537-KP
            
            perl /projects/tomato_genome/scripts/validateHiseqPairs.pl ${file}_R1_001.fastq.gz ${file}_R2_001.fastq.gz
            
            
            echo "Done"
            

            Error: mate-pair files are not ordered

            Ann's note: We need to understand exactly what this script is doing and what it assesses.

            ann.loraine Ann Loraine added a comment -

            [~RobertReid] : MD5 checking shows that the fastq files were not corrupted by the transfer.
            Nowlan Freese: suggests comparing the first few lines of each file (via the "head" command) to see whether the mates are paired
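The "head" comparison can be done directly on the compressed files by printing the first few read IDs from each mate file side by side. In properly ordered files each row shows the same ID, apart from an optional /1 or /2 suffix. This sketch assumes gzip-compressed FASTQ with headers whose first whitespace-delimited token is shared by both mates; `peek_pairs` and its default of 5 records are illustrative, not an existing script.

```shell
#!/usr/bin/env bash
# peek_pairs R1.fastq.gz R2.fastq.gz [N]
# Prints the first N read IDs (first token of each header) from both files.
peek_pairs() {
  local n=${3:-5}
  paste <(gzip -dc "$1" | awk -v n="$n" 'NR % 4 == 1 { print $1; if (++k >= n) exit }') \
        <(gzip -dc "$2" | awk -v n="$n" 'NR % 4 == 1 { print $1; if (++k >= n) exit }')
}
```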

            ann.loraine Ann Loraine added a comment -

            The problem: according to the mate-pair validation script above, the mate-pair records in the "1" and "2" fastq files for each sample are not in the same order in the two files.

            robofjoy Robert Reid added a comment -

            MD5 Check:

            All the MD5 checksums match.

            Details on this are saved to Google Drive in Kelsie's experiment folder.
            https://drive.google.com/drive/folders/1TxUDhJHr9mrXOVysrcceS9YGyTXFHRmo?usp=share_link

            ann.loraine Ann Loraine added a comment -

            Please add validateMatePairs.pl to the "src" directory in the repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            ann.loraine Ann Loraine added a comment -

            NF suggestion: Run FastQC on the output of Trim Galore to look for a new, and possibly aberrant, read-length distribution.

            Mdavis4290 Molly Davis added a comment - - edited

            validateMatePairs.pl code:

            #!/usr/bin/perl
            
            use strict;
            use warnings;
            
            open FH1,$ARGV[0] or die "\n can not open file $ARGV[0]\n";  ## first pair
            open FH2,$ARGV[1] or die "\n can not open file $ARGV[1]\n";  ## second pair
            my($str1,$str2,$tempStr);
            my($n1,$n2);
            $n1 = 0;
            $n2 = 0;
            print " Validating $ARGV[0] and $ARGV[1] \n";
            my(@a1,@a2);
            
            	while($str1 = <FH1>){
                    $tempStr = <FH1>;
                    $tempStr = <FH1>;
                    $tempStr = <FH1>;
                    ++$n1;
            
                    $str2 = <FH2>;
                    $tempStr = <FH2>;
                    $tempStr = <FH2>;
                    $tempStr = <FH2>;
                    ++$n2;
            
                    $str1 =~ s/\n//;
                    $str1 =~ s/\r//;
                    $str2 =~ s/\n//;
                    $str2 =~ s/\r//;
            
            
                    @a1 = split(/\s+/,$str1);
                    @a2 = split(/\s+/,$str2);
            
                    $str1 = $a1[0];
                    $str2 = $a2[0];
            
                    $str1 =~ s/(\/\d)$//;
                    $str2 =~ s/(\/\d)$//;
            
                            if($str1 ne $str2){
                            die "Read pairs not found for $str1 and $str2, mate-pair files are not ordered\n";
                            }
                    }  ## while(<FH1>) ends
            close FH1;
            close FH2;
            
            print "Total validated mates: $n1 and $n2: Read-pairs are properly ordered\n";
            

            Next step: Run the code with unzipped fastq files.

            gzip -d *.gz
            

            Comment: script finished running and output files say Read-pairs are properly ordered. Decompressing the files helped fix the validate script error.
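The Perl script opens its arguments as plain text, so given .fastq.gz files it compares compressed bytes rather than read names, which explains why it only passed after decompression. As an alternative to decompressing every file on disk, the same check can stream the reads through `gzip -dc`. This bash sketch compares the first whitespace-delimited token of each header (with an optional /1 or /2 suffix removed), record by record, and also fails when one file has more records than the other. `check_pairs` is a hypothetical name, not a script in the repository.

```shell
#!/usr/bin/env bash
# check_pairs R1.fastq.gz R2.fastq.gz
# Exit status 0 if every record pair has the same read ID, 1 otherwise.
check_pairs() {
  paste <(gzip -dc "$1" | awk 'NR % 4 == 1 { print $1 }') \
        <(gzip -dc "$2" | awk 'NR % 4 == 1 { print $1 }') |
    awk -F '\t' '
      {
        a = $1; b = $2
        sub(/\/[12]$/, "", a); sub(/\/[12]$/, "", b)   # drop /1, /2 mate suffixes
        if (a != b) {                                  # also catches a missing mate
          printf "Mismatch at pair %d: \"%s\" vs \"%s\"\n", NR, $1, $2
          bad = 1
          exit 1
        }
      }
      END { if (!bad) printf "%d read pairs are properly ordered\n", NR }'
}
```

In the SLURM array script, `check_pairs ${file}_R1_001.fastq.gz ${file}_R2_001.fastq.gz` could replace the Perl call. The existing Perl script should also accept streamed input via process substitution, e.g. `perl validateMatePairs.pl <(gzip -dc ${file}_R1_001.fastq.gz) <(gzip -dc ${file}_R2_001.fastq.gz)`, since it simply opens the `/dev/fd/...` path as a file.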

            robofjoy Robert Reid added a comment -

            I ran trimmomatic on the raw data to see what would happen.

            Resulting files are located here:
            /nobackup/tomato_genome/30-804059537-KP/trimmoTest

            Script to run it is here:
            /projects/tomato_genome/scripts/rob/trimmomatic-temp.slurm

            The script validates pairs afterwards (using the -validatePairs option).
            It appears that 99.95% of all reads are good.
            About 2-8 reads per file pair are bad.
            Those reads are saved in the unpaired.fastq files.

            We could now run nextflow with these read files instead.
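To check a retention figure like the 99.95% above for any sample, one can compare the number of reads in a paired output file with the number in the corresponding input file. A sketch, assuming both are gzip-compressed FASTQ; `pct_kept` is a hypothetical name, and the actual Trimmomatic output file names are whatever trimmomatic-temp.slurm sets.

```shell
#!/usr/bin/env bash
# pct_kept KEPT.fastq.gz INPUT.fastq.gz
# Prints the percentage of INPUT's records that are present in KEPT,
# based on record counts only (4 lines per FASTQ record).
pct_kept() {
  local kept total
  kept=$(gzip -dc "$1" | awk 'END { printf "%.0f\n", NR / 4 }')
  total=$(gzip -dc "$2" | awk 'END { printf "%.0f\n", NR / 4 }')
  awk -v k="$kept" -v t="$total" \
    'BEGIN { if (t > 0) printf "%.2f%%\n", 100 * k / t; else print "NA" }'
}
```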

            ann.loraine Ann Loraine added a comment -

            [~RobertReid] suggests counting number of aligned versus unaligned reads, for each BAM file we have.

            robofjoy Robert Reid added a comment -

            Using samtools to count aligned and unaligned reads:

            Reads mapped:
            module load samtools
            samtools view -c -F 4 nagcarlang-sorted.bam

            Reads unmapped:
            samtools view -c -f 4 nagcarlang-sorted.bam

            Let's calculate coverage:
            samtools depth nagcarlang-sorted.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'

            Put these lines into a SLURM script. It should run very quickly.

            Mdavis4290 Molly Davis added a comment -

            Created pull request to add src folder and perl validation file to bitbucket:

            https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/1

            [~aloraine]

            Mdavis4290 Molly Davis added a comment - - edited

            Created a script that uses samtools to count aligned and unaligned reads:

            #!/bin/bash
            
            
            #SBATCH --job-name=samtools_view        #job name after submission
            #SBATCH -p Orion                        #partition being used
            #SBATCH -N 1                            #number of nodes to use
            #SBATCH --ntasks-per-node=8             #max number of tasks per node
            #SBATCH --mem=900gb                     #memory required per node
            #SBATCH -t 14-00:00                     #time (D-HH:MM)
            #SBATCH -o samtools_view.%j.out         #standard output file
            #SBATCH --mail-type=END,FAIL            #Notifications for job complete/failure
            #SBATCH --mail-user=mdavi258@uncc.edu   #Send to user email
            #SBATCH --array=1-63
            
            
            file=$(sed -n -e "${SLURM_ARRAY_TASK_ID}p"  /nobackup/tomato_genome/30-804059537-KP/kp_runlist.txt)
            
            module load samtools
            echo "Mapped:" ${file}
            samtools view -c -F 4 ${file}.bam
            echo  "Unmapped:" ${file}
            samtools view -c -f 4 ${file}.bam
            
            echo "Calculate Coverage" 
            samtools depth ${file}.bam | awk '{sum+=$3} END { print "Average = ",sum/NR}'
            
            echo "done"
            echo "---------------------------------------------------------"
            

            Directory: /nobackup/tomato_genome/30-804059537-KP/results/star_salmon

            Combined output files into one:

            cat *.out > ./mergedsamtoolsOut.txt
            

            Output File:

            mergedsamtoolsOut.txt
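The merged file interleaves labels and numbers, one block per sample. The sketch below turns it into one tab-separated row per sample. It assumes each block has exactly the layout printed by the script above (a "Mapped:" line followed by a count, an "Unmapped:" line followed by a count, an "Average =" line, then the dashed separator); other lines, such as "Calculate Coverage" or "done", are ignored. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# summarize_samtools < mergedsamtoolsOut.txt
# Prints: sample<TAB>mapped<TAB>unmapped<TAB>average_depth
summarize_samtools() {
  awk '
    BEGIN        { print "sample\tmapped\tunmapped\taverage_depth" }
    /^Mapped:/   { sample = $2; getline; mapped = $1 }     # next line is the count
    /^Unmapped:/ { getline; unmapped = $1 }
    /^Average =/ { avg = $NF }
    /^-+$/       { print sample "\t" mapped "\t" unmapped "\t" avg
                   sample = mapped = unmapped = avg = "" }  # reset for next block
  '
}
```

For example, `summarize_samtools < mergedsamtoolsOut.txt > samtools_summary.tsv` gives a table that can be sorted or opened in a spreadsheet to compare samples side by side.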

            Mdavis4290 Molly Davis made changes -
            Attachment mergedsamtoolsOut.txt [ 17707 ]
            Mdavis4290 Molly Davis made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Description For this task, process new data set from the Palanivelu lab. These data are from Kelsey

            RR has downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.
            Please do the data processing in this directory:

            * /nobackup/tomato_genome/30-804059537-KP (kp for "Kelsey Pryze")

            Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            To-do:

            * Run nf-core/rnaseq pipeline with SL5/2022 target genome assembly using "reverse" strandedness parameter.
            * Check the multi-qc report (attach to this ticket). Re-run processing as necessary.
            * Rename BAM files to not include "sorted" in the name.
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization
            * Create annots.xml metadata file for visualization

            Attached:
            * Azenta (sequencing provider) data report, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            For this task, process new data set from the Palanivelu lab. These data are from Kelsey

            RR has downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.
            Please do the data processing in this directory:

            * /nobackup/tomato_genome/30-804059537-KP (kp for "Kelsey Pryze")

            Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            To-do:

            * Run nf-core/rnaseq pipeline with SL5/2022 target genome assembly using "reverse" strandedness parameter.
            * Check the multi-qc report (attach to this ticket). Re-run processing as necessary.
            * Rename BAM files to not include "sorted" in the name.
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters.

            Attached:
            * Azenta (sequencing provider) data report, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Link This issue relates to IGBF-3261 [ IGBF-3261 ]
            nfreese Nowlan Freese made changes -
            Sprint Spring 3 2023 Feb 1 [ 163 ] Spring 3 2023 Feb 1, Spring 4 2023 Feb 21 [ 163, 164 ]
            nfreese Nowlan Freese made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Story Points 2 4
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine added a comment - - edited

            Ann's comments:

            Based on output above:

            • The bam files do not contain any unmapped reads, only mapped reads
            • The samtools "depth" command computes the number of alignments per base pair position - see http://www.htslib.org/doc/samtools-depth.html
            • For transcriptome data, the "depth" command does not make a lot of sense because the depth of read alignments at any given position depends on whether or not that position is inside an exon, and also on the level of expression of that exon
            • I don't know what "NR" means and where this is coming from in the "sum/NR" statement at the end of the script

            Conclusion: The output of this script does not explain the QC result.

            We do not know why some of the samples did not perform well. Let's proceed with the pipeline and visualize the data in a genome browser as this visualization step may reveal more information about the problematic samples.
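On the "NR" question: in awk, NR is the built-in count of input records read so far. In the END block of `awk '{sum+=$3} END { print "Average = ",sum/NR}'` it is therefore the number of lines that `samtools depth` printed, one per reported position. Because `samtools depth` omits zero-depth positions unless `-a` is given, the result is the mean depth over covered positions only, not over the whole genome. A sketch that makes this explicit; `mean_depth` is a hypothetical name.

```shell
#!/usr/bin/env bash
# mean_depth: reads `samtools depth` output (chrom, pos, depth; tab-separated)
# on stdin and reports the mean of the depth column.
# NR = number of input lines = number of positions samtools reported.
mean_depth() {
  awk -F '\t' '
    { sum += $3 }
    END {
      if (NR > 0) printf "Average = %.2f over %d positions\n", sum / NR, NR
      else print "No positions reported"
    }'
}
```

Usage: `samtools depth nagcarlang-sorted.bam | mean_depth`. Printing the position count alongside the average makes it visible how much of the genome the average is actually taken over.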

            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 2023 Feb 1, Spring 4 2023 Feb 21 [ 163, 164 ] Spring 3 2023 Feb 1, Spring 5 2023 Mar 6 [ 163, 165 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 2023 Feb 1, Spring 5 2023 Mar 6 [ 163, 165 ] Spring 3 2023 Feb 1, Spring 6 2023 Mar 20 [ 163, 166 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 2023 Feb 1, Spring 6 2023 Mar 20 [ 163, 166 ] Spring 3 2023 Feb 1, Spring 7 2023 Apr 10 [ 163, 167 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 2023 Feb 1, Spring 7 2023 Apr 10 [ 163, 167 ] Spring 3 2023 Feb 1 [ 163 ]
            Mdavis4290 Molly Davis made changes -
            Issue Type Task [ 3 ] Epic [ 10000 ]
            Sprint Spring 3 2023 Feb 1 [ 163 ] Spring 3 2023 Feb 1, Spring 8 2023 Apr 24 [ 163, 168 ]
            Mdavis4290 Molly Davis made changes -
            Epic Link IGBF-2993 [ 21429 ]
            Mdavis4290 Molly Davis made changes -
            Epic Name Process Kelsey's Palanivelu Lab data
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3323 [ 22345 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3324 [ 22346 ]
            Mdavis4290 Molly Davis made changes -
            Epic Color ghx-label-6
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3325 [ 22347 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3326 [ 22348 ]
            Mdavis4290 Molly Davis made changes -
            Comment [ Update:
            * Created sample sheet:
             [^KP_samples.csv]

            * Started Nextflow pipeline. ]
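The nf-core/rnaseq pipeline reads its inputs from a CSV sample sheet with the columns `sample`, `fastq_1`, `fastq_2` and `strandedness`. The contents of KP_samples.csv are not reproduced here, so the sketch below shows one hypothetical way to build such a sheet. It assumes Azenta-style file names of the form `<sample>_R1_001.fastq.gz` / `<sample>_R2_001.fastq.gz` (inferred from the FastQC report names attached to this ticket) and the "reverse" strandedness requested in the to-do list. `build_samplesheet` is an illustrative helper, not part of the pipeline.

```python
import csv
import io
import tempfile
from pathlib import Path
from typing import TextIO

R1_SUFFIX = "_R1_001.fastq.gz"  # assumed Azenta naming for read 1
R2_SUFFIX = "_R2_001.fastq.gz"  # assumed Azenta naming for read 2


def build_samplesheet(fastq_dir: Path, out: TextIO, strandedness: str = "reverse") -> int:
    """Write a sample,fastq_1,fastq_2,strandedness sheet; return the number of samples."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    n = 0
    for r1 in sorted(fastq_dir.glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if not r2.exists():
            # Fail loudly rather than silently running a paired-end sample as single-end.
            raise FileNotFoundError(f"missing read 2 file for {sample}: {r2}")
        writer.writerow([sample, str(r1.resolve()), str(r2.resolve()), strandedness])
        n += 1
    return n


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Heinz-Ovary-Rep1-0hr-25C-unpol", "Heinz-Ovary-Rep2-0hr-25C-unpol"):
        for read in ("R1", "R2"):
            (Path(tmp) / f"{name}_{read}_001.fastq.gz").touch()
    buf = io.StringIO()
    n_samples = build_samplesheet(Path(tmp), buf)
    print(buf.getvalue())
```

Absolute paths are written so that the sheet works regardless of the directory Nextflow is launched from.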
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3328 [ 22350 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3328 [ IGBF-3328 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3325 [ IGBF-3325 ]
            Mdavis4290 Molly Davis made changes -
            Link This issue relates to IGBF-3326 [ IGBF-3326 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 2023 Feb 1, Spring 8 2023 Apr 24 [ 163, 168 ] Spring 3 2023 Feb 1, Spring 8 2023 Apr 24, Spring 9 2023 May 8 [ 163, 168, 169 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            Mdavis4290 Molly Davis made changes -
            Attachment Heinz-Ovary-Rep1-0hr-25C-unpol_R1_001_fastqc.html [ 17689 ]
            Mdavis4290 Molly Davis made changes -
            Attachment Heinz-Ovary-Rep2-0hr-25C-unpol_R1_001_fastqc.html [ 17690 ]
            ann.loraine Ann Loraine made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3338 [ 22360 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3345 [ 22367 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3366 [ 22389 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3367 [ 22390 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3375 [ 22398 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3376 [ 22399 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3377 [ 22400 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3389 [ 22412 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3390 [ 22413 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3391 [ 22414 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3392 [ 22415 ]
            Mdavis4290 Molly Davis made changes -
            Summary Process Kelsey's Palanivelu Lab data Run Nextflow with Kelsey's Palanivelu Lab 2023 data
            ann.loraine Ann Loraine made changes -
            Epic Name Process Kelsey's Palanivelu Lab data Process and deploy Palanivelu Lab data
            ann.loraine Ann Loraine made changes -
            Description For this task, process a new data set from the Palanivelu lab. These data are from Kelsey Pryze.

            RR has downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.
            Please do the data processing in this directory:

            * /nobackup/tomato_genome/30-804059537-KP (kp for "Kelsey Pryze")

            Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            To-do:

            * Run the nf-core/rnaseq pipeline with the SL5/2022 target genome assembly, using the "reverse" strandedness parameter.
            * Check the MultiQC report (attach it to this ticket). Re-run processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters.
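A minimal sketch of the renaming step. It assumes the nf-core/rnaseq STAR/Salmon route writes files named like `<sample>.markdup.sorted.bam` with a matching `.bam.bai` index; check the actual output folder before running it. The hypothetical `rename_bams` helper drops the `.sorted` token from both the BAM and its index so that each index keeps matching its BAM, refuses to overwrite existing files, and defaults to a dry run that only reports what it would rename.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def new_name(name: str) -> str:
    """Drop the '.sorted' token, e.g. 'X.markdup.sorted.bam.bai' -> 'X.markdup.bam.bai'."""
    return name.replace(".sorted.", ".")


def rename_bams(bam_dir: Path, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Rename *.sorted.bam and *.sorted.bam.bai files in bam_dir; return (old, new) name pairs."""
    renames = []
    for path in sorted(bam_dir.iterdir()):
        if not path.name.endswith((".sorted.bam", ".sorted.bam.bai")):
            continue
        target = path.with_name(new_name(path.name))
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        renames.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return renames


# Demo with empty placeholder files (hypothetical sample name).
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for f in ("Heinz-Ovary-Rep1.markdup.sorted.bam", "Heinz-Ovary-Rep1.markdup.sorted.bam.bai"):
        (d / f).touch()
    planned = rename_bams(d)               # dry run: only reports the plan
    done = rename_bams(d, dry_run=False)   # performs the renames
    remaining = sorted(p.name for p in d.iterdir())
    print(remaining)
```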

            Attached:
            * Azenta (sequencing provider) data report, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            For this task, process new and old experimental data sets from the Palanivelu lab.

            First process data from 2023. These data were generated by Kelsey Pryze, a graduate student in the RP Lab.

            RR downloaded these 2023 data from Kelsey onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/30-804059537-kelsie. Please process this data set in this directory:

            * /nobackup/tomato_genome/30-804059537-KP (kp for "Kelsey Pryze")

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline with the SL5/2022 target genome assembly, using the "reverse" strandedness parameter.
            * Check the MultiQC report (attach it to this ticket). Re-run processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters.



            Repo:
            * Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Comment [ *Next steps:*

            * Generate scaled coverage graphs (Molly)
            * Generate junction files (Molly - ask Ann to explain how to do this)
            * Copy data to IGB Quickload host for visualization in IGB (Ann)
            * Create samples Excel spreadsheet needed to create annots.xml for Quickload site (Molly - first draft & Ann - final draft) ]
            ann.loraine Ann Loraine made changes -
            Assignee Molly Davis [ molly ]
            Mdavis4290 Molly Davis made changes -
            Attachment Screen Shot 2023-02-09 at 9.40.46 AM.png [ 17678 ]
            Mdavis4290 Molly Davis made changes -
            Attachment KP_samples.csv [ 17686 ]
            Mdavis4290 Molly Davis made changes -
            Attachment KP_multiqc_report.html [ 17687 ]
            ann.loraine Ann Loraine made changes -
            Description For this task, process new and old experimental data sets from the Palanivelu lab.

            First process data from 2023. These data were generated by Kelsey Pryze, a graduate student in the RP Lab.

            RR downloaded these 2023 data from Kelsey onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/30-804059537-kelsie. Please process this data set in this directory:

            * /nobackup/tomato_genome/30-804059537-KP (kp for "Kelsey Pryze")

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline with the SL5/2022 target genome assembly, using the "reverse" strandedness parameter.
            * Check the MultiQC report (attach it to this ticket). Re-run processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters.



            Repo:
            * Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent links to the resulting sequence data files to the RP lab, who downloaded them. We also got them and put them on the Charlotte HPC file system. We then proceeded to process them to generate files for visualization in IGB and "counts" files for processing using statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2022 Kelsie Pryze's unpollinated pistil experiment:

            We first obtained and processed these data in 2023. These data were generated by Kelsey Pryze, a graduate student in the RP Lab. You can identify these samples by looking at their Azenta identifier. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them: the "KP 2023" dataset.

            These data are from an experiment done by KP in which she tested the effects of heat stress on un-pollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates per sample type, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated.

            The data files also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the fastq files from the unpollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to separate these to make it clearer that the sample generation was done separately from the unpollinated pistils.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the repository.

            RR downloaded these 2023 data from Kelsey onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.

            As of August 2023, we are storing the original data files in a folder named for the Azenta identifier: 30-804059537

            Please process this data set in the following directory, where the processed data are stored:
            * /nobackup/tomato_genome/30-804059537-KP (KP for "Kelsey Pryze")

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline with the SL5/2022 target genome assembly, using the "reverse" strandedness parameter.
            * Check the MultiQC report (attach it to this ticket). Re-run processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters.



            Repo:
            * Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Summary Run Nextflow with Kelsey's Palanivelu Lab 2023 data Process and deploy Palanivelu Lab data
            ann.loraine Ann Loraine made changes -
            Description For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent links to the resulting sequence data files to the RP lab, who downloaded them. We also got them and put them on the Charlotte HPC file system. We then proceeded to process them to generate files for visualization in IGB and "counts" files for processing using statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2022 Kelsie Pryze's unpollinated pistil experiment:

            We first obtained and processed these data in 2023. These data were generated by Kelsey Pryze, a graduate student in the RP Lab. You can identify these samples by looking at their Azenta identifier. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them: the "KP 2023" dataset.

            These data are from an experiment done by KP in which she tested the effects of heat stress on un-pollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates per sample type, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated.

            The data files also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the fastq files from the unpollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to separate these to make it clearer that the sample generation was done separately from the unpollinated pistils.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the repository.

            RR downloaded these 2023 data from Kelsey onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.

            As of August 2023, we are storing the original data files in a folder named for the Azenta identifier: 30-804059537

            Please process this data set in the following directory, where the processed data are stored:
            * /nobackup/tomato_genome/30-804059537-KP (KP for "Kelsey Pryze")

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline with the SL5/2022 target genome assembly, using the "reverse" strandedness parameter.
            * Check the MultiQC report (attach it to this ticket). Re-run processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters.



            Repo:
            * Bitbucket repo: https://bitbucket.org/hotpollen/pistil-rna-seq

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent links to the resulting sequence data files to the RP lab, who downloaded them. We also got them and put them on the Charlotte HPC file system. We then proceeded to process them to generate files for visualization in IGB and "counts" files for processing using statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2022 Kelsie Pryze's unpollinated pistil heat stress experiment:

            These data are from an experiment done by Kelsey Pryze in which she tested the effects of heat stress on un-pollinated tomato pistils dissected from emasculated flowers from four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates per sample type, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated.

            The data files also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the fastq files from the un-pollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to present them in ways that will make it super clear that the sample generation was done separately from the un-pollinated pistils.

            Rob Reid downloaded these 2023 data onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.

            We then began processing these data in spring of 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier: 30-804059537. Also, in our various pipelines and Jira records, we have been referring to these data by the year we started working with them: the "KP 2023" dataset.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the repository. However, as you will see from the documentation in the repository, the samples themselves were generated during the summer and early spring of 2022.

            When we deploy these data to the genome browser for visualization, we will probably use a different name that better describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            2) self-pollinated pistil and style heat stress experiment

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas

            There were three replicates per sample type. The zero-hour samples, however, included only 25 degrees C samples (three replicates per variety) and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition (25 degrees C) * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            **More information to be added**

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment

            This experiment, performed by Kelsie Pryze, involved creating libraries for sequencing using RNAs from pollinated and unpollinated samples from Tamaulipas.

            **More information to be added**

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline with both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters for each dataset
            * Add the "counts" data files to the repository for statistical analysis

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Closed [ 6 ] To-Do [ 10305 ]
            ann.loraine Ann Loraine made changes -
            Description For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent links to the resulting sequence data files to the RP lab, who downloaded them. We also got them and put them on the Charlotte HPC file system. We then proceeded to process them to generate files for visualization in IGB and "counts" files for processing using statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2022 Kelsie Pryze's unpollinated pistil heat stress experiment:

            These data are from an experiment done by Kelsey Pryze in which she tested the effects of heat stress on un-pollinated tomato pistils dissected from emasculated flowers from four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates per sample type, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated.

            The data files also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the fastq files from the un-pollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to present them in ways that will make it super clear that the sample generation was done separately from the un-pollinated pistils.

            Rob Reid downloaded these 2023 data onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/30-804059537-kelsie.

            We then began processing these data in spring of 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier: 30-804059537. Also, in our various pipelines and Jira records, we have been referring to these data by the year we started working with them: the "KP 2023" dataset.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the repository. However, as you will see from the documentation in the repository, the samples themselves were generated during the summer and early spring of 2022.

            When we deploy these data to the genome browser for visualization, we will probably use a different name that better describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            2) self-pollinated pistil and style heat stress experiment

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas

            There were three replicates per sample type. The zero-hour samples, however, included only 25 degrees C samples (three replicates per variety) and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition (25 degrees C) * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            **More information to be added**

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment

            This experiment, performed by Kelsie Pryze, involved creating libraries for sequencing using RNAs from pollinated and unpollinated samples from Tamaulipas.

            **More information to be added**

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline with both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters for each dataset
            * Add the "counts" data files to the repository for statistical analysis

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent links to the resulting sequence data files to the RP lab, who downloaded them. We also got them and put them on the Charlotte HPC file system. We then proceeded to process them to generate files for visualization in IGB and "counts" files for processing using statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, azenta id: 30-681594536

            These data are from an experiment done by Kelsey Pryze in which she tested the effects of heat stress on un-pollinated tomato pistils dissected from emasculated flowers from four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates per sample type, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated. The samples were created in 2021, in the early spring and summer.

            The data files also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the fastq files from the un-pollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to present them in ways that will make it super clear that the sample generation was done separately from the un-pollinated pistils.

            Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We then began processing these data in 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier - 30-681594536. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them from the RP Lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the repository. However, as you will see from the documentation in the repository, the samples themselves were generated during the summer and early spring of 2021.

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            2) self-pollinated pistil and style heat stress experiment

            We have been referring to this experimental data set as "KP-2023", referring to the experimenter Kelsie Pryze and the date when we obtained the experimental data sequence files.

            The original sequence files are downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas

            There were three replicates per sample type. The zero-hour samples however included three 25 degrees C samples and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            **More information to be added**

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment

            In this experiment, Kelsie Pryze created libraries for sequencing using RNA from self-pollinated and unpollinated Tamaulipas samples.

            **More information to be added**

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 genome assemblies, with the strandedness parameter set to "reverse".
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename the BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create an annots.xml metadata file with visualization parameters for each data set.
            * Add the "counts" data files to the repository for statistical analysis
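For the first to-do item, nf-core/rnaseq takes its input as a CSV samplesheet with the columns `sample`, `fastq_1`, `fastq_2`, and `strandedness`; in recent pipeline versions the "reverse" setting is given per sample in that last column. The sketch below builds such a samplesheet from a list of paired FASTQ file names. It assumes Azenta-style names ending in `_R1_001.fastq.gz` and `_R2_001.fastq.gz`; that pattern and the function name `build_samplesheet` are hypothetical, so check them against the actual delivery before use.

```python
import csv
import io
from pathlib import Path

# Hypothetical Azenta-style suffixes; confirm against the real file names.
READ1_SUFFIX = "_R1_001.fastq.gz"
READ2_SUFFIX = "_R2_001.fastq.gz"
COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def build_samplesheet(fastq_names, fastq_dir, strandedness="reverse"):
    """Return nf-core/rnaseq samplesheet CSV text, one row per read pair."""
    names = set(fastq_names)
    rows = []
    for name in sorted(names):
        if not name.endswith(READ1_SUFFIX):
            continue  # start from each read-1 file and look up its mate
        sample = name[: -len(READ1_SUFFIX)]
        mate = sample + READ2_SUFFIX
        if mate not in names:
            raise ValueError(f"no read-2 file found for sample {sample}")
        rows.append({
            "sample": sample,
            "fastq_1": str(Path(fastq_dir) / name),
            "fastq_2": str(Path(fastq_dir) / mate),
            "strandedness": strandedness,
        })
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    demo = ["S1_R1_001.fastq.gz", "S1_R2_001.fastq.gz"]
    print(build_samplesheet(demo, "/path/to/fastq"), end="")
```

The resulting file would then be passed to the pipeline with `--input`, together with `--outdir` and the genome options for SL5/2022 or SL4/2019; the exact genome parameters should be taken from the documentation for the pipeline release in use.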

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are three collections of sequencing data that we received from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for the samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent the RP lab links to the resulting sequence data files, which the RP lab downloaded. We also obtained the files and placed them on the UNC Charlotte HPC file system. We then processed them to generate files for visualization in IGB and "counts" files for analysis with statistical libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta ID: 30-681594536

            These data are from an experiment in which Kelsie Pryze tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. Each sample type has three replicates, except the Tamaulipas sample types, which have two. KP provided a detailed description of exactly how the samples were generated. The samples were created in early spring and summer of 2021.

            The delivery also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the FASTQ files from the unpollinated pistils because all of them were sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. For visualization, however, we will probably want to present them in a way that makes it very clear that these samples were generated separately from the unpollinated pistil samples.

            Rob Reid downloaded the data onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We began processing these data in 2023, using the high-performance computing cluster at UNC Charlotte. You can identify these samples on our file system by their Azenta identifier, 30-681594536. In our various pipelines and Jira records, we have also been referring to these data by the year we received them from the RP lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples, which we will place in a "Documentation" folder in the repository. As that documentation shows, the samples themselves were generated in early spring and summer of 2021, not in 2022.

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            2) self-pollinated pistil and style heat stress experiment

            We have been referring to this experimental data set as "KP-2023", after the experimenter, Kelsie Pryze, and the year we obtained the sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas

            There were three replicates per sample type. However, the zero-hour samples included only 25 degrees C (control) samples, three per variety, and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment

            In this experiment, Kelsie Pryze created libraries for sequencing using RNA from self-pollinated and unpollinated Tamaulipas samples.

            To-do for each experimental data set:

            * Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 genome assemblies, with the strandedness parameter set to "reverse".
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename the BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create an annots.xml metadata file with visualization parameters for each data set.
            * Add the "counts" data files to the repository for statistical analysis
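The BAM renaming step can be scripted. The sketch below assumes the BAM and index files end in `.sorted.bam` and `.sorted.bam.bai` (a hypothetical pattern; confirm it against the real output directory) and drops the `.sorted` component from both. It defaults to a dry run that only reports the planned renames and refuses to overwrite an existing file. Renaming the index together with its BAM matters because a viewer would likely fail to find an index whose name no longer matches the BAM.

```python
from pathlib import Path

# Hypothetical suffixes; the longer one is checked first.
SORTED_SUFFIXES = (".sorted.bam.bai", ".sorted.bam")


def new_name(filename):
    """Return filename with '.sorted' removed, or None if it does not match."""
    for ext in SORTED_SUFFIXES:
        if filename.endswith(ext):
            return filename[: -len(ext)] + ext.replace(".sorted", "", 1)
    return None


def rename_sorted_bams(directory, dry_run=True):
    """Rename matching files in directory; return the (old, new) name pairs."""
    plan = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        target = new_name(path.name)
        if target is None:
            continue
        dest = path.with_name(target)
        if dest.exists():
            raise FileExistsError(dest)  # never overwrite an existing file
        plan.append((path.name, target))
        if not dry_run:
            path.rename(dest)
    return plan
```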

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are three collections of sequencing data that we received from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for the samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent the RP lab links to the resulting sequence data files, which the RP lab downloaded. We also obtained the files and placed them on the UNC Charlotte HPC file system. We then processed them to generate files for visualization in IGB and "counts" files for analysis with statistical libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta ID: 30-681594536

            These data are from an experiment in which Kelsie Pryze tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. Each sample type has three replicates, except the Tamaulipas sample types, which have two. KP provided a detailed description of exactly how the samples were generated. The samples were created in early spring and summer of 2021.

            The delivery also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the FASTQ files from the unpollinated pistils because all of them were sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. For visualization, however, we will probably want to present them in a way that makes it very clear that these samples were generated separately from the unpollinated pistil samples.

            Rob Reid downloaded the data onto the UNCC cluster and saved them here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We began processing these data in 2023, using the high-performance computing cluster at UNC Charlotte. You can identify these samples on our file system by their Azenta identifier, 30-681594536. In our various pipelines and Jira records, we have also been referring to these data by the year we received them from the RP lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples, which we will place in a "Documentation" folder in the repository. As that documentation shows, the samples themselves were generated in early spring and summer of 2021, not in 2022.

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            2) self-pollinated stigma+style heat stress experiment

            We have been referring to this experimental data set as "KP-2023", after the experimenter, Kelsie Pryze, and the year we obtained the sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
            * tissue type: dissected stigma and style tissue from self-pollinated flowers

            There were three replicates per sample type. However, the zero-hour samples included only 25 degrees C (control) samples, three per variety, and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples
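The sample count can be checked by enumerating the design directly. This sketch encodes the factors listed above, with the zero-hour time point at 25 degrees C only, and yields 20 sample types and 60 samples. The tuple layout (variety, temperature, hours, replicate) is an illustrative choice, not the lab's sample-naming scheme.

```python
from itertools import product

VARIETIES = ["Heinz", "Malintka", "Nagcarlang", "Tamaulipas"]
REPLICATES = (1, 2, 3)


def sample_types():
    """Yield (variety, temperature_C, hours) for every sample type."""
    for variety in VARIETIES:
        # Zero-hour time point: control temperature (25 C) only.
        yield (variety, 25, 0)
        # 3 h and 8 h time points at both temperatures.
        for temperature, hours in product((25, 37), (3, 8)):
            yield (variety, temperature, hours)


def samples():
    """Expand each sample type into its three replicates."""
    return [(*stype, rep) for stype in sample_types() for rep in REPLICATES]


if __name__ == "__main__":
    print(len(list(sample_types())), "sample types,", len(samples()), "samples")
```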

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment

            In this experiment, Kelsie Pryze created libraries for sequencing using RNA from self-pollinated and unpollinated Tamaulipas samples.

            The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas

            The Azenta code for this sequencing experiment was: ????

            *To-do for each experimental data set:*

            * Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 genome assemblies, with the strandedness parameter set to "reverse".
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename the BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create an annots.xml metadata file with visualization parameters for each data set.
            * Add the "counts" data files to the repository for statistical analysis
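For the annots.xml step, an IGB QuickLoad site describes each data file with a `<file>` element inside a root `<files>` element, using at least a `name` (the file's path or URL) and a `title` (the label shown in IGB). The sketch below generates such a file with Python's standard `xml.etree.ElementTree`. It covers only `name`, `title`, and an optional `description`; visualization attributes such as track colors would be added the same way, but their exact attribute names should be taken from the IGB QuickLoad documentation. The function name `annots_xml` and any file names passed to it are hypothetical.

```python
import xml.etree.ElementTree as ET


def annots_xml(tracks):
    """Return annots.xml text for an iterable of track dicts.

    Each dict needs 'name' (file path or URL) and 'title' (label shown
    in IGB); 'description' is optional. Other attributes are omitted here.
    """
    root = ET.Element("files")
    for track in tracks:
        attrs = {"name": track["name"], "title": track["title"]}
        if "description" in track:
            attrs["description"] = track["description"]
        ET.SubElement(root, "file", attrs)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annots_xml([{"name": "example.bw", "title": "Example/coverage"}]))
```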

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            Notes:

            As of summer 2023, there are three collections of sequencing data that we received from the Ravi Palanivelu lab. These collections correspond to Azenta sequencing orders. The RP lab created the biological material for the samples, extracted RNA, and then sent the RNA boxes to Azenta, the sequencing company. The company then sent the RP lab links to the resulting sequence data files, which the RP lab downloaded. We also obtained the files and placed them on the UNC Charlotte HPC file system. We then processed them to generate files for visualization in IGB and "counts" files for analysis with statistical libraries developed for RNA-Seq data.

            The three data collections are:

            1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta ID: 30-681594536

            These data are from an experiment in which Kelsie Pryze tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. Each sample type has three replicates, except the Tamaulipas sample types, which have two. KP provided a detailed description of exactly how the samples were generated. The samples were created in early spring and summer of 2021.

            The data files also included three data files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the fastq files from the un-pollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to present them in ways that will make it super clear that the sample generation was done separately from the un-pollinated pistils.

            Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We then began processing these data in 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier - 30-681594536. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them from the RP Lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples, which we will place in a "Documentation" folder in the repository. As that documentation shows, the samples themselves were generated in the early spring and summer of 2021, even though we refer to the dataset as "Ravi 2022".

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            2) self-pollinated stigma+style heat stress experiment

            We have been referring to this experimental data set as "KP-2023", after the experimenter, Kelsie Pryze, and the year in which we obtained the sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
            * tissue type: dissected stigma and style tissue from self-pollinated flowers

            There were three replicates per sample type. The zero-hour time point, however, included only 25 degrees C (control) samples. There were no 37 degrees C samples at zero hours.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples
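            As a sanity check on the arithmetic above, the sketch below enumerates the KP-2023 design. It crosses the two temperatures with the 3- and 8-hour durations and adds a 0-hour time point that has only the 25 degrees C control. That set of conditions is then crossed with the four varieties and three replicates. The tuple layout is a hypothetical convenience, not the sample naming used in the Azenta files.

```python
"""Enumerate the KP-2023 sample design (hypothetical tuple layout)."""
from itertools import product

VARIETIES = ["Heinz", "Malintka", "Nagcarlang", "Tamaulipas"]
REPLICATES = [1, 2, 3]
# (temperature in degrees C, treatment duration in hours).
# The 0-hour time point has only the 25 degrees C control.
CONDITIONS = [(25, 0)] + [(temp, hours) for temp in (25, 37) for hours in (3, 8)]


def sample_design():
    """Return one (variety, temperature, hours, replicate) tuple per sample."""
    return [
        (variety, temp, hours, rep)
        for (temp, hours), variety, rep in product(CONDITIONS, VARIETIES, REPLICATES)
    ]


if __name__ == "__main__":
    design = sample_design()
    print(len(design))  # 5 conditions * 4 varieties * 3 replicates = 60
    print(design[0])    # ('Heinz', 25, 0, 1)
```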

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment

            In this experiment, Kelsie Pryze created sequencing libraries from RNA extracted from pollinated and unpollinated Tamaulipas samples.

            The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas
            Rob downloaded these data from the sequencing provider on or around December 15, 2021. (This is the date that Rob created a Google Doc describing the files available from the sequencing provider's data transfer ftp site.)

            The azenta code for this sequencing experiment was: 30-681594536

            *To-do for each experimental data set:*

            * Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create an annots.xml metadata file with visualization parameters for each dataset.
            * Add the "counts" data files to the repository for statistical analysis
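            For the first to-do item, nf-core/rnaseq reads its inputs from a CSV samplesheet with sample, fastq_1, fastq_2, and strandedness columns, where strandedness can be set to reverse. The sketch below builds that sheet for paired-end files. It assumes file names of the form <sample>_R1_001.fastq.gz and <sample>_R2_001.fastq.gz. That pattern is an assumption and should be checked against the actual files in each collection's directory. The script name make_samplesheet.py is also hypothetical.

```python
"""Build an nf-core/rnaseq samplesheet for reverse-stranded, paired-end data.

Usage: python make_samplesheet.py <fastq_dir> > samplesheet.csv
Assumes <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz naming (hypothetical).
"""
import csv
import re
import sys
from pathlib import Path
from typing import Dict, List

# Hypothetical naming pattern for read 1 / read 2 FASTQ files.
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(paths: List[Path]) -> Dict[str, Dict[str, Path]]:
    """Group FASTQ paths by sample name and read number ('1' or '2')."""
    pairs: Dict[str, Dict[str, Path]] = {}
    for path in paths:
        match = FASTQ_PATTERN.match(path.name)
        if match:
            pairs.setdefault(match["sample"], {})[match["read"]] = path
    return pairs


def samplesheet_rows(paths: List[Path], strandedness: str = "reverse") -> List[List[str]]:
    """Return a header row plus one row per complete read pair, sorted by sample."""
    rows = [["sample", "fastq_1", "fastq_2", "strandedness"]]
    for sample, reads in sorted(pair_fastqs(paths).items()):
        if "1" not in reads or "2" not in reads:
            raise ValueError(f"missing mate for sample {sample}")
        rows.append([sample, str(reads["1"]), str(reads["2"]), strandedness])
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    fastq_dir = Path(sys.argv[1])
    csv.writer(sys.stdout).writerows(samplesheet_rows(sorted(fastq_dir.glob("*.fastq.gz"))))
```

            The sheet would then be passed to the pipeline once per target assembly (SL5/2022 and SL4/2019). A typical command is nextflow run nf-core/rnaseq --input samplesheet.csv --outdir <results_dir> --fasta <genome.fa> --gtf <annotation.gtf> -profile <profile>. Check the documentation for the pipeline version in use for the exact parameters.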

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Description
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            *About the data*:

            As of summer 2023, we have received three collections of sequencing data from the Ravi Palanivelu lab. These collections correspond to "batches" of samples that were sent to GeneWiz/Azenta for sequencing.

            For two of these batches, the RP lab created the biological material for the samples, extracted RNA, and then sent the RNA boxes to Azenta (formerly GeneWiz), the sequencing company, which synthesized the libraries. For the third batch, the so-called "library synthesis pilot," the RP lab synthesized the libraries themselves and then sent the libraries to Azenta.

            Once sequencing was complete, the company sent the RP lab links to an FTP site containing the data files. The RP lab either downloaded the files or asked us to download them. We then obtained the sequence data and copied them to the Charlotte HPC file system for the next step, data processing, in which we generate files for visualization in IGB as well as "counts" files for use with statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            *1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta id: 30-681594536*

            These data are from an experiment in which Kelsie Pryze tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. Each sample type has three replicates, except for the Tamaulipas sample types, which have two. KP provided a detailed description of exactly how the samples were generated. The samples were created in the early spring and summer of 2021.

            The data also included three files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these files alongside the FASTQ files from the unpollinated pistils because all of them were sequenced at the same time, in the same batch of RNA samples sent to the sequencing provider. For visualization, however, we will probably want to present them in a way that makes it very clear that these samples were generated separately from the unpollinated pistils.

            Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We then began processing these data in 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier - 30-681594536. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them from the RP Lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples, which we will place in a "Documentation" folder in the repository. As that documentation shows, the samples themselves were generated in the early spring and summer of 2021, even though we refer to the dataset as "Ravi 2022".

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.
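            IGB reads track metadata for a QuickLoad data source from an annots.xml file, and that is where a study name like this would appear. The sketch below generates entries for that file. It assumes the common layout of a <files> root with one <file> element per data file, each carrying name, title, and description attributes. Several details are hypothetical placeholders: the study name, the file names, and the choice to prefix each title with the study name so that a study's tracks group together. The IGB QuickLoad documentation lists the other display attributes.

```python
"""Sketch of generating IGB QuickLoad annots.xml entries (hypothetical names)."""
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_annots(study: str, tracks: Iterable[Tuple[str, str]]) -> ET.Element:
    """Return a <files> element with one <file> child per (file_name, label) pair."""
    root = ET.Element("files")
    for file_name, label in tracks:
        ET.SubElement(
            root,
            "file",
            name=file_name,                  # path relative to the QuickLoad genome folder
            title=f"{study}/{label}",        # study-name prefix groups related tracks
            description=f"{study}: {label}",
        )
    return root


if __name__ == "__main__":
    annots = build_annots(
        "Unpollinated pistil heat stress 2021",  # hypothetical study name
        [
            ("H_37C_rep1.bam", "Heinz 37C rep1 reads"),
            ("H_37C_rep1.scaled.bedgraph.gz", "Heinz 37C rep1 coverage"),
        ],
    )
    if hasattr(ET, "indent"):  # ET.indent is available in Python 3.9+
        ET.indent(annots)
    print(ET.tostring(annots, encoding="unicode"))
```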

            2) self-pollinated stigma+style heat stress experiment

            We have been referring to this experimental data set as "KP-2023", after the experimenter, Kelsie Pryze, and the year in which we obtained the sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
            * tissue type: dissected stigma and style tissue from self-pollinated flowers

            There were three replicates per sample type. The zero-hour time point, however, included only 25 degrees C (control) samples. There were no 37 degrees C samples at zero hours.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment, Azenta id 30-605730043

            In this experiment, Kelsie Pryze created sequencing libraries from RNA extracted from pollinated and unpollinated Tamaulipas samples.

            The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas
            Rob downloaded these data from the sequencing provider on or around December 15, 2021. (This is the date that Rob created a Google Doc describing the files available from the sequencing provider's data transfer ftp site.)

            Note: We need to confirm whether the sequences obtained from the unpollinated pistils came from the same experiment as (1) above. If so, which "replicate" were they? The answer will influence how we present the data in IGB.

            *To-do for each experimental data set:*

            * Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create an annots.xml metadata file with visualization parameters for each dataset.
            * Add the "counts" data files to the repository for statistical analysis
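            For the renaming step, the sketch below drops a ".sorted." component from BAM file names. It renames any matching .bam.bai index at the same time, so that each index keeps matching its BAM file. It assumes pipeline output names such as S1.markdup.sorted.bam, which should be checked against the actual output. The script name is hypothetical. By default it only prints the planned renames, and it refuses to overwrite existing files.

```python
"""Drop ".sorted" from BAM and BAM index file names (hypothetical helper).

Usage: python rename_bams.py <bam_dir> [--apply]
Without --apply, the script only prints the planned renames.
"""
import sys
from pathlib import Path
from typing import List, Tuple


def new_name(name: str) -> str:
    """Remove one '.sorted.' component, e.g. 'S1.markdup.sorted.bam' -> 'S1.markdup.bam'."""
    return name.replace(".sorted.", ".", 1)


def rename_bams(directory: Path, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Plan, and unless dry_run is set perform, renames of *.bam and *.bam.bai files."""
    planned = []
    for path in sorted(directory.iterdir()):
        if path.name.endswith((".bam", ".bam.bai")) and ".sorted." in path.name:
            target = path.with_name(new_name(path.name))
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            planned.append((path, target))
    if not dry_run:
        for source, target in planned:
            source.rename(target)
    return planned


if __name__ == "__main__" and len(sys.argv) > 1:
    for source, target in rename_bams(Path(sys.argv[1]), dry_run="--apply" not in sys.argv):
        print(f"{source.name} -> {target.name}")
```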

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Description
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            *About the data*:

            As of summer 2023, we have received three collections of sequencing data from the Ravi Palanivelu lab. These collections correspond to "batches" of samples that were sent to GeneWiz/Azenta for sequencing.

            For two of these batches, the RP lab created the biological material for the samples, extracted RNA, and then sent the RNA boxes to Azenta (formerly GeneWiz), the sequencing company, which synthesized the libraries. For the third batch, the so-called "library synthesis pilot," the RP lab synthesized the libraries themselves and then sent the libraries to Azenta.

            Once sequencing was complete, the company sent the RP lab links to an FTP site containing the data files. The RP lab either downloaded the files or asked us to download them. We then obtained the sequence data and copied them to the Charlotte HPC file system for the next step, data processing, in which we generate files for visualization in IGB as well as "counts" files for use with statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            *1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta id: 30-681594536*

            These data are from an experiment in which Kelsie Pryze tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. Each sample type has three replicates, except for the Tamaulipas sample types, which have two. KP provided a detailed description of exactly how the samples were generated. The samples were created in the early spring and summer of 2021.

            The data also included three files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these files alongside the FASTQ files from the unpollinated pistils because all of them were sequenced at the same time, in the same batch of RNA samples sent to the sequencing provider. For visualization, however, we will probably want to present them in a way that makes it very clear that these samples were generated separately from the unpollinated pistils.

            Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We then began processing these data in 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier - 30-681594536. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them from the RP Lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples, which we will place in a "Documentation" folder in the repository. As that documentation shows, the samples themselves were generated in the early spring and summer of 2021, even though we refer to the dataset as "Ravi 2022".

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            *2) self-pollinated stigma+style heat stress experiment, Azenta id 30-804059537*

            We have been referring to this experimental data set as "KP-2023", after the experimenter, Kelsie Pryze, and the year in which we obtained the sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
            * tissue type: dissected stigma and style tissue from self-pollinated flowers

            There were three replicates per sample type. The zero-hour time point, however, included only 25 degrees C (control) samples. There were no 37 degrees C samples at zero hours.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            *3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment, Azenta id 30-605730043*

            In this experiment, Kelsie Pryze created sequencing libraries from RNA extracted from pollinated and unpollinated Tamaulipas samples.

            The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas
            Rob downloaded these data from the sequencing provider on or around December 15, 2021. (This is the date that Rob created a Google Doc describing the files available from the sequencing provider's data transfer ftp site.)

            Note: We need to confirm whether the sequences obtained from the unpollinated pistils came from the same experiment as (1) above. If so, which "replicate" were they? The answer will influence how we present the data in IGB.

            *To-do for each experimental data set:*

            * Run the nf-core/rnaseq pipeline against both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an online location for IGB visualization.
            * Create an annots.xml metadata file with visualization parameters for each dataset.
            * Add the "counts" data files to the repository for statistical analysis
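            The ticket does not say which tool makes the junction files. The sketch below shows only the core idea. Each N operation in a read's CIGAR string marks an intron. In a reverse-stranded (dUTP-style) paired-end library, the transcript strand is opposite to the alignment strand of read 1 and the same as that of read 2. Alignments are given here as plain tuples, which is an assumption made to keep the example self-contained. All function names are hypothetical.

```python
"""Count splice junctions from CIGAR strings of reverse-stranded paired-end reads.

Sketch only: alignments are (chrom, pos, cigar, is_read1, read_reverse) tuples,
with pos being SAM's 1-based leftmost mapping position.
"""
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def introns(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of the N operations in a read."""
    spans = []
    ref = pos - 1  # convert the 1-based SAM position to a 0-based coordinate
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            spans.append((ref, ref + n))
        if op in REF_CONSUMING:
            ref += n
    return spans


def transcript_strand(is_read1: bool, read_reverse: bool) -> str:
    """Infer the transcript strand for a reverse-stranded library.

    Read 1 is antisense to the transcript; read 2 is on the transcript's strand.
    """
    if is_read1:
        return "+" if read_reverse else "-"
    return "-" if read_reverse else "+"


def count_junctions(alignments: Iterable[Tuple[str, int, str, bool, bool]]) -> Counter:
    """Count junctions keyed by (chrom, start, end, strand)."""
    counts: Counter = Counter()
    for chrom, pos, cigar, is_read1, read_reverse in alignments:
        strand = transcript_strand(is_read1, read_reverse)
        for start, end in introns(pos, cigar):
            counts[(chrom, start, end, strand)] += 1
    return counts
```

            With real data the tuples would come from a BAM reader such as pysam. Note that this sketch counts reads, so a junction spanned by both mates of one fragment is counted twice.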

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            ann.loraine Ann Loraine made changes -
            Description For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            *About the data*:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to "batches" of RNA samples that were sent to GeneWiz/Azenta for library synthesis and subsequent sequencing.

            For two of these batches, the RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta (formerly GeneWiz), the sequencing company. For one of these batches, the so-called "library synthesis pilot," the RP lab synthesized the libraries themselves and then sent the libraries to Azenta.

            Once the sequencing data were complete, the company sent the RP lab links to an FTP site containing the data files, and the RP lab either downloaded them or asked us to download them. We then deployed the sequence data to the UNC Charlotte HPC file system for the next step, data processing, in which we generate files for visualization in IGB as well as "counts" files for use with statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            *1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta id: 30-681594536*

            These data are from an experiment done by Kelsie Pryze in which she tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated. The samples were created in the early spring and summer of 2021.

            The data files also included three files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the FASTQ files from the unpollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to present them in a way that makes it very clear that these samples were generated separately from the unpollinated pistils.

            Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We then began processing these data in 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier - 30-681594536. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them from the RP Lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the repository. Note that, despite the "Ravi 2022" name, this documentation shows that the samples themselves were generated during the early spring and summer of 2021.

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            *2) self-pollinated stigma+style heat stress experiment, Azenta id 30-804059537*

            We have been referring to this experimental data set as "KP-2023", referring to the experimenter Kelsie Pryze and the date when we obtained the experimental data sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
            * tissue type: dissected stigma and style tissue from self-pollinated flowers

            There were three replicates per sample type. However, the zero-hour time point included only 25 degrees C (control) samples, three per variety, and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples

            *3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment, Azenta id 30-605730043*

            This experiment, performed by Kelsie Pryze, involved creating libraries for sequencing using RNAs from pollinated and unpollinated samples from Tamaulipas.

            The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas
            Rob downloaded these data from the sequencing provider on or around December 15, 2021. (This is the date that Rob created a Google Doc describing the files available from the sequencing provider's data transfer ftp site.)

            Note: We need to confirm whether the sequences obtained from the unpollinated pistils came from the same experiment as (1) above. If so, which "replicate" were they? This will influence how we present the data in IGB.

            *To-do for each experimental data set:*

            * Run the nf-core/rnaseq pipeline with both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters for each dataset
            * Add the "counts" data files to the repository for statistical analysis

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            For this task,

            * process new and old experimental data sets from the Ravi Palanivelu lab
            * confer with Palanivelu lab personnel to understand and document the samples
            * track code and key data files using this repository: https://bitbucket.org/hotpollen/pistil-rna-seq/src/main/

            *About the data*:

            As of summer 2023, there are now three collections of sequencing data that we got from the Ravi Palanivelu lab. These collections correspond to "batches" of RNA samples that were sent to GeneWiz/Azenta for library synthesis and subsequent sequencing.

            For two of these batches, the RP lab created the biological material for samples, extracted RNA, and then sent the RNA boxes to Azenta (formerly GeneWiz), the sequencing company. For one of these batches, the so-called "library synthesis pilot," the RP lab synthesized the libraries themselves and then sent the libraries to Azenta.

            Once the sequencing data were complete, the company sent the RP lab links to an FTP site containing the data files, and the RP lab either downloaded them or asked us to download them. We then deployed the sequence data to the UNC Charlotte HPC file system for the next step, data processing, in which we generate files for visualization in IGB as well as "counts" files for use with statistical analysis libraries developed for RNA-Seq data.

            The three data collections are:

            *1) 2021 Kelsie Pryze's unpollinated pistil heat stress experiment, Azenta id: 30-681594536*

            These data are from an experiment done by Kelsie Pryze in which she tested the effects of heat stress on unpollinated tomato pistils dissected from emasculated flowers of four tomato varieties: Heinz, Malintka, Tamaulipas, and Nagcarlang. All sample types have three replicates represented in the sequencing data, except for Tamaulipas, which has two. KP provided a detailed description of exactly how the samples were generated. The biological material was created in the early spring and summer of 2021.

            The data files also included three files from a different experiment investigating the transcriptome of dissected, unpollinated ovary tissue. We processed these data alongside the FASTQ files from the unpollinated pistils because they were all sequenced at the same time, in the same lot of RNAs sent to the sequencing provider. However, for visualization, we will probably want to present them in a way that makes it very clear that this biological material was created separately from the unpollinated pistils.

            Rob Reid downloaded the data onto the UNCC cluster and saved it here: /projects/tomato_genome/rnaseq/ravi-2022-fullrun/30-681594536.

            We then began processing these data in 2023, using the high performance computing cluster at UNC Charlotte. You can identify these samples on our file system by looking for their Azenta identifier - 30-681594536. Also, in our various pipelines and Jira records, we have been referring to these data by the date we got them from the RP Lab: the "Ravi 2022" dataset.

            KP provided documentation describing these samples. We will place these files in a "Documentation" folder in the git repository. Note that, despite the "Ravi 2022" name, this documentation shows that the samples themselves were generated during the early spring and summer of 2021.

            When we deploy these data to the genome browser for visualization, we will probably use a study name that describes the data and makes it easy for RP Lab personnel and others to recognize them in the browser or other settings.

            *2) self-pollinated stigma+style heat stress experiment, Azenta id 30-804059537*

            We have been referring to this experimental data set as "KP-2023", referring to the experimenter Kelsie Pryze and the date when we obtained the experimental data sequence files.

            The original sequence files were downloaded to this location on the UNC Charlotte cluster computing system: /projects/tomato_genome/rnaseq/30-804059537-kelsie
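            The first to-do item for each collection, running nf-core/rnaseq with the "reverse" strandedness setting, starts from an input samplesheet with the columns sample, fastq_1, fastq_2, and strandedness. The following is a minimal sketch of building that sheet from a delivery folder such as the one above. It assumes, hypothetically, that paired read files end in _R1_001.fastq.gz and _R2_001.fastq.gz; the helper names pair_fastqs and write_samplesheet are illustrative, not part of the project repository.

```python
import csv
from pathlib import Path

# Assumed (hypothetical) Illumina-style file-name suffixes; check the actual
# names in the Azenta delivery folder and adjust before use.
R1_SUFFIX = "_R1_001.fastq.gz"
R2_SUFFIX = "_R2_001.fastq.gz"


def pair_fastqs(fastq_dir):
    """Map each sample name to its (read 1, read 2) FASTQ paths.

    Searches fastq_dir recursively and raises an error if a read-1 file
    has no matching read-2 file or if a sample name occurs twice.
    """
    pairs = {}
    for r1 in sorted(Path(fastq_dir).rglob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError(f"no read-2 file for {r1}")
        if sample in pairs:
            raise ValueError(f"duplicate sample name: {sample}")
        pairs[sample] = (r1.resolve(), r2.resolve())
    return pairs


def write_samplesheet(pairs, out_path, strandedness="reverse"):
    """Write a sample,fastq_1,fastq_2,strandedness CSV for nf-core/rnaseq."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
        for sample, (r1, r2) in sorted(pairs.items()):
            writer.writerow([sample, str(r1), str(r2), strandedness])
```

            For example, write_samplesheet(pair_fastqs("/projects/tomato_genome/rnaseq/30-804059537-kelsie"), "samplesheet.csv") would produce a sheet that can be passed to the pipeline with its --input parameter. Failing loudly on a missing mate or duplicated sample name is deliberate: record-keeping errors are easier to fix before the pipeline runs.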

            This experiment included sample types testing two temperature conditions, three treatment durations, four varieties, and one tissue type. These were:

            * temperature conditions: 37 degrees C (heat stress) and 25 degrees C (control)
            * treatment durations: 0 hours (no heat stress applied), 3 hours, and 8 hours
            * four varieties: Heinz, Malintka, Nagcarlang, Tamaulipas
            * tissue type: dissected stigma and style tissue from self-pollinated flowers

            There were three replicates per sample type. However, the zero-hour time point included only 25 degrees C (control) samples, three per variety, and no 37 degrees C samples.

            Number of samples: (2 conditions * 4 varieties * 2 treatment durations (3 and 8 hours) * 3 replicates) + (1 condition * 4 varieties * 1 treatment duration (0 hours) * 3 replicates) = 48 + 12 = 60 samples
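            The sample count above can be checked by enumerating the design explicitly, which also gives a list to compare against the Azenta sample report during data checking. This is a minimal sketch; the (variety, temperature_C, hours, replicate) tuple layout and the function name kp2023_design are illustrative assumptions, not names used in the project repository.

```python
from itertools import product

VARIETIES = ["Heinz", "Malintka", "Nagcarlang", "Tamaulipas"]
REPLICATES = [1, 2, 3]


def kp2023_design():
    """Enumerate (variety, temperature_C, hours, replicate) for the KP-2023 design."""
    samples = []
    # 3 h and 8 h time points: both 25 C control and 37 C heat stress
    for variety, temp, hours, rep in product(VARIETIES, [25, 37], [3, 8], REPLICATES):
        samples.append((variety, temp, hours, rep))
    # 0 h time point: 25 C control only, no heat stress samples
    for variety, rep in product(VARIETIES, REPLICATES):
        samples.append((variety, 25, 0, rep))
    return samples


design = kp2023_design()
print(len(design))  # 60
```

            Any sample in the sequencing report that does not match one of these tuples, or any tuple with no corresponding FASTQ pair, points to a record-keeping issue worth raising with the RP lab.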

            *3) self-pollinated and unpollinated Tamaulipas library preparation pilot experiment, Azenta id 30-605730043*

            This experiment, performed by Kelsie Pryze, involved creating libraries for sequencing using RNAs from pollinated and unpollinated samples from Tamaulipas plants.

            The data from this experiment are stored on the UNC Charlotte cluster in: /projects/tomato_genome/rnaseq/ravi-tamaulipas
            Rob downloaded these data from the sequencing provider on or around December 15, 2021. (This is the date that Rob created a Google Doc describing the files available from the sequencing provider's data transfer ftp site.)

            Note: We need to confirm whether the sequences obtained from the unpollinated pistils came from the same experiment as (1) above. If so, which "replicate" were they? This will influence how we label the data in IGB.

            *To-do for each experimental data set:*

            * Run the nf-core/rnaseq pipeline with both the SL5/2022 and SL4/2019 target genome assemblies, using the "reverse" strandedness parameter.
            * Check the MultiQC report. Re-run the processing as necessary.
            * Rename BAM files so that their names do not include "sorted".
            * Create scaled coverage graphs.
            * Create junction files.
            * Migrate data to an on-line location for IGB visualization.
            * Create annots.xml metadata file with visualization parameters for each dataset; add the data collection to the makeAnnotsXml.py script
            * Add the "counts" data files to the repository for statistical analysis
            * Add documentation for each sequence collection to the git repository
            * Perform data checking to catch any record-keeping errors that may have occurred
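            The BAM renaming step in the list above can be done safely with a dry run first. This is a minimal sketch that assumes nf-core/rnaseq-style output names such as S1.markdup.sorted.bam with an S1.markdup.sorted.bam.bai index (an assumption; check the actual output folder). The helpers drop_sorted and rename_bams are hypothetical names, not existing project scripts.

```python
from pathlib import Path


def drop_sorted(name):
    """Remove a 'sorted' component from a dot-separated file name.

    'S1.markdup.sorted.bam' -> 'S1.markdup.bam'
    """
    return ".".join(part for part in name.split(".") if part != "sorted")


def rename_bams(bam_dir, dry_run=True):
    """Plan (and optionally apply) renames of BAM/BAI files in bam_dir.

    Returns a list of (old_path, new_path) pairs. With dry_run=True the
    files are left untouched so the plan can be reviewed first.
    """
    planned, targets = [], set()
    for path in sorted(Path(bam_dir).iterdir()):
        if not path.name.endswith((".bam", ".bai")):
            continue
        new_path = path.with_name(drop_sorted(path.name))
        if new_path == path:
            continue
        # Never overwrite an existing file or map two inputs to one name
        if new_path.exists() or new_path in targets:
            raise FileExistsError(f"refusing to overwrite {new_path}")
        targets.add(new_path)
        planned.append((path, new_path))
    if not dry_run:
        for old, new in planned:
            old.rename(new)
    return planned
```

            Renaming each index together with its BAM keeps the pair consistent; calling rename_bams(folder) first and inspecting the returned list before calling rename_bams(folder, dry_run=False) guards against surprises.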

            Attached:
            * Azenta (sequencing provider) data report for KP 2023 data, with numbers of sequences produced
            * Quote from Azenta indicating strand-specific RNA-Seq, 2x150 bp paired end sequencing

            Contact:
            * Kelsey Pryze - kelseypryze@email.arizona.edu
            robofjoy Robert Reid added a comment - I am including the Google links that Kelsie provided here for documentation purposes:
            * Tomato Pistil Tissue Collection Protocol: https://docs.google.com/document/d/1g8GJBEzxUC-QjfMXk0Eq5mT8e31bGirjXYM3-Sv4u7Q/edit?usp=sharing
            * Experimental Design for Solavar: https://docs.google.com/document/d/1BXVq-0oop3Ch3Qzbr2nkhQczG1cZH2yBqQRMmBXKGyo/edit?usp=sharing
            * Sequenced Samples: https://docs.google.com/spreadsheets/d/1WwPzifPzbACmgS3uR_V92cYIGN3qS1yWPGHBu7DY_-I/edit?usp=sharing
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-3420 [ 22443 ]
            Mdavis4290 Molly Davis made changes -
            Epic Child IGBF-3434 [ 22536 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-3466 [ 22627 ]
            ann.loraine Ann Loraine made changes -
            Epic Child IGBF-3471 [ 22632 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated: