IGB / IGBF-3346

Prepping data and entries for SRA for Muday time course

    Details

    • Type: Task
    • Status: To-Do
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 2
    • Sprint:
      Summer 2 2023 May 29, Summer 3 2023 June 12, Summer 6 2023 July 24, Summer 7 2023 Aug 7

      Description

      Starting the data prepping and table prepping for the Muday time course data.

      EXTRA step will be needed due to the labelling errors!!

      A script to rename the files will be needed and this script will need review to ensure we get that step right!

      Otherwise, the SRA submission should be straightforward, similar to IGBF-3255.

            Activity

            robofjoy Robert Reid added a comment -

            And the adventure begins:

            With another biosample sheet for submission.

            Working copy can be found here:
            https://docs.google.com/spreadsheets/d/1F39JAFfvyct4hdpfV2narT4BH6T13mBgJPvfl5BkbzM/edit?usp=sharing

            Still in progress atm.

            robofjoy Robert Reid added a comment -

            The original untouched data from Azenta is located here:

            /projects/tomato_genome/rnaseq/muday144-timeSeries-checkReadMEFIRST/00_fastq

            I plan to make a working copy of this folder for the purpose of renaming some files.

            COPY TIME:
            /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences$ rsync -aP /projects/tomato_genome/rnaseq/muday144-timeSeries-checkReadMEFIRST/* ./

            Next step: CHECK MD5 SUMS to ensure the copy was correct.

            robofjoy Robert Reid added a comment -

            COPY ERROR.

            One file didn't make it, so I copied that file again.

            Rerunning the MD5 checksums:
            WHERE: /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/00_fastq

            `while read -r file; do echo "$file"; md5sum "$file" >> md5-version2.txt; done < allfiles.txt`
            `diff md5.orig md5-version2.txt`

            The checksums now match. Copy is successful.
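
As a possible alternative to regenerating the checksums and diffing, GNU `md5sum -c` can verify files directly against an existing checksum list. This is only a sketch: `verify_copy` is a hypothetical helper name, and it assumes the list (such as md5.orig) is in standard md5sum format with file names relative to the directory being checked.

```shell
# Sketch only: verify_copy is a hypothetical helper, not one of the ticket's scripts.
# Usage: verify_copy <checksum_list> <directory>
# Assumes <checksum_list> holds standard md5sum output ("<hash>  <name>")
# with names relative to <directory>. Requires GNU coreutils.
verify_copy() {
    local list dir
    list=$(realpath "$1") || return 1
    dir=$2
    # --quiet prints only the files that fail; exit status is non-zero on any mismatch.
    if (cd "$dir" && md5sum -c --quiet "$list"); then
        echo "Copy is successful."
    else
        echo "Checksum mismatch or missing file; recopy and rerun." >&2
        return 1
    fi
}
```

For example, `verify_copy md5.orig /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/00_fastq` would name any file whose checksum differs.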



            Playing in the sandbox
            To test the renaming scheme that accounts for the errors made by Azenta, I am making a sandbox area. Rather than using full sequences, I take the first 20 lines of each sequence file with head and store them in the sandbox.

            The sandbox is here:
            /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone

            To grab the 20 lines and make new files in the above location:
            `while read -r file; do echo "$file"; zcat "../00_fastq/${file}.gz" | head -n 20 > "$file"; gzip "$file"; done < files.txt`

            (We are unzipping via zcat, grabbing the first 20 lines, and gzipping again.) Success.

            Check that each file is 20 lines long:
            `for file in *gz; do zcat "$file" | wc -l; done`

            Toy data ready to be played with!

            robofjoy Robert Reid added a comment -

            Which files need renaming??

            We know samples were mislabeled at Azenta. Details are in this ticket: https://jira.bioviz.org/browse/IGBF-3290
            From there:

            A.34.15.8 is actually F.34.15.8 and vice versa
            A.28.30.8 is actually V.28.30.8 and vice versa
            A.34.45.8 is actually F.34.45.8 and vice versa
            A.28.75.8 is actually V.28.75.8 and vice versa

            The Excel sheet that conveys the changes is this one: https://jira.bioviz.org/secure/attachment/17863/Muday%20lab%20RNA%20samples%20for%20sample%20name%20conversion.xls
            (or see ticket IGBF-3290 above)

            So we need to swap using a temp file!
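
A minimal sketch of a temp-file swap, under the assumption that both files sit on the same filesystem. `swap_files` is a hypothetical helper name, not the script used in this ticket.

```shell
# Sketch only: swap_files is a hypothetical helper name.
# Swaps the names of two files by parking the first under a temporary name,
# so neither file is overwritten partway through the swap.
swap_files() {
    local a b tmp
    a=$1
    b=$2
    [ -e "$a" ] && [ -e "$b" ] || { echo "both files must exist" >&2; return 1; }
    tmp=$(mktemp "${a}.swap.XXXXXX") || return 1   # unique placeholder next to $a
    mv -f "$a" "$tmp" &&
    mv "$b" "$a" &&
    mv "$tmp" "$b"
}
```

A call such as `swap_files A.34.15.8_R1.fastq.gz F.34.15.8_R1.fastq.gz` would need repeating for the R2 file. When renames form longer chains rather than pairs, copying each file under its new name into a separate output directory avoids temporary names altogether.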

            And at the same time, let's also rename the files to the 4-code format.
            Example:

            A.28.15.9 is genotype ARE, 28 degrees C temperature, 15 minutes time point, and replicate 9.

            So 9-VF36-75-min-28C_R1_001.fastq.gz would become V.28.75.9_R1.fastq.gz . (NEED TO MAINTAIN THE R1 and R2 pairs!!)

            "F.28.15.7" "F.34.15.7" "F.28.30.7" "F.34.30.7"
            [6] "F.28.45.7" "F.34.45.7" "F.28.75.7" "F.34.75.7" "V.28.15.7"
            [11] "V.34.15.7" "V.28.30.7" "V.34.30.7" "V.28.45.7" "V.34.45.7"
            [16] "V.28.75.7" "V.34.75.7" "A.28.15.7" "A.34.15.7" "A.28.30.7"
            [21] "A.34.30.7" "A.28.45.7" "A.34.45.7" "A.28.75.7" "A.34.75.7"
            [26] "F.28.15.8" "F.34.15.8" "F.28.30.8" "F.34.30.8" "F.28.45.8"
            [31] "F.34.45.8" "F.28.75.8" "F.34.75.8" "V.28.15.8" "V.34.15.8"
            [36] "V.28.30.8" "V.34.30.8" "V.28.45.8" "V.34.45.8" "V.28.75.8"
            [41] "V.34.75.8" "A.28.15.8" "A.34.15.8" "A.28.30.8" "A.34.30.8"
            [46] "A.28.45.8" "A.34.45.8" "A.28.75.8" "A.34.75.8" "F.28.15.9"
            [51] "F.34.15.9" "F.28.30.9" "F.34.30.9" "F.28.45.9" "F.34.45.9"
            [56] "F.28.75.9" "F.34.75.9" "V.28.15.9" "V.34.15.9" "V.28.30.9"
            [61] "V.34.30.9" "V.28.45.9" "V.34.45.9" "V.28.75.9" "V.34.75.9"
            [66] "A.28.15.9" "A.34.15.9" "A.28.30.9" "A.34.30.9" "A.28.45.9"
            [71] "A.34.45.9" "A.28.75.9" "A.34.75.9"
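
The file-name conversion described before the sample list could be sketched roughly as follows. This is not the ticket's renameANDrelabel.bash: `to_code_name` is a hypothetical helper, the input pattern is inferred from the single example 9-VF36-75-min-28C_R1_001.fastq.gz, and the output field order (genotype.temperature.time.replicate) follows the sample list, e.g. A.34.15.8. Only the VF36 and ARE tokens appear in the ticket, so the token for genotype F is left for the reader to add.

```shell
# Sketch only: to_code_name is a hypothetical helper name.
# Converts e.g. 9-VF36-75-min-28C_R1_001.fastq.gz into V.28.75.9_R1.fastq.gz
# (genotype.temperature.time.replicate, matching the sample list).
to_code_name() {
    local name parsed rep token minutes temp pair g
    name=$1
    parsed=$(printf '%s\n' "$name" | sed -n -E \
        's/^([0-9]+)-([A-Za-z0-9]+)-([0-9]+)-min-([0-9]+)C_(R[12])_001\.fastq\.gz$/\1 \2 \3 \4 \5/p')
    if [ -z "$parsed" ]; then
        echo "unrecognized file name: $name" >&2
        return 1
    fi
    # Split the five captured fields into named variables.
    set -- $parsed
    rep=$1; token=$2; minutes=$3; temp=$4; pair=$5
    case $token in
        VF36) g=V ;;
        ARE)  g=A ;;
        # The file-name token for genotype F is not shown in the ticket; add it here.
        *) echo "unknown genotype token: $token" >&2; return 1 ;;
    esac
    # Keep the R1/R2 suffix so read pairs stay together.
    echo "${g}.${temp}.${minutes}.${rep}_${pair}.fastq.gz"
}
```

This only computes the reformatted name; correcting the mislabeled samples is a separate lookup against the renaming table.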
            ann.loraine Ann Loraine added a comment -

            Just a quick comment in case this is useful for Robert Reid: Muday lab sent us a spreadsheet that documents which samples need to be renamed. There were a lot more than we initially thought.

            The spreadsheet is in this folder on Bitbucket: https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/Documentation/

            robofjoy Robert Reid added a comment -

            Ha, yes! I found the sheet and was surprised that 2/3 of the samples were wrong!

            I assumed that the red cells were wrong and did a MATCH check to confirm.

            robofjoy Robert Reid added a comment -

            Ann generated a useful table during the re-analysis:

            https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/results/sample_renaming_summary.txt

            Its contents:

            original new changed
            F.28.15.7 F.28.15.7 FALSE
            F.34.15.7 F.34.15.7 FALSE
            F.28.30.7 F.28.30.7 FALSE
            F.34.30.7 F.34.30.7 FALSE
            F.28.45.7 F.28.45.7 FALSE
            F.34.45.7 F.34.45.7 FALSE
            F.28.75.7 F.28.75.7 FALSE
            F.34.75.7 F.34.75.7 FALSE
            V.28.15.7 V.28.15.7 FALSE
            V.34.15.7 V.34.15.7 FALSE
            V.28.30.7 V.28.30.7 FALSE
            V.34.30.7 V.34.30.7 FALSE
            V.28.45.7 V.28.45.7 FALSE
            V.34.45.7 V.34.45.7 FALSE
            V.28.75.7 V.28.75.7 FALSE
            V.34.75.7 V.34.75.7 FALSE
            A.28.15.7 A.28.15.7 FALSE
            A.34.15.7 A.34.15.7 FALSE
            A.28.30.7 A.28.30.7 FALSE
            A.34.30.7 A.34.30.7 FALSE
            A.28.45.7 A.28.45.7 FALSE
            A.34.45.7 A.34.45.7 FALSE
            A.28.75.7 A.28.75.7 FALSE
            A.34.75.7 A.34.75.7 FALSE
            F.28.15.8 V.34.15.8 TRUE
            F.34.15.8 A.34.30.8 TRUE
            F.28.30.8 F.34.30.8 TRUE
            F.34.30.8 V.34.30.8 TRUE
            F.28.45.8 F.28.75.8 TRUE
            F.34.45.8 A.28.45.8 TRUE
            F.28.75.8 V.28.45.8 TRUE
            F.34.75.8 F.28.45.8 TRUE
            V.28.15.8 F.28.15.8 TRUE
            V.34.15.8 V.28.15.8 TRUE
            V.28.30.8 A.28.15.8 TRUE
            V.34.30.8 F.28.30.8 TRUE
            V.28.45.8 V.34.75.8 TRUE
            V.34.45.8 F.34.75.8 TRUE
            V.28.75.8 A.34.75.8 TRUE
            V.34.75.8 V.34.45.8 TRUE
            A.28.15.8 A.28.30.8 TRUE
            A.34.15.8 F.34.15.8 TRUE
            A.28.30.8 V.28.30.8 TRUE
            A.34.30.8 A.34.15.8 TRUE
            A.28.45.8 A.34.45.8 TRUE
            A.34.45.8 V.28.75.8 TRUE
            A.28.75.8 F.34.45.8 TRUE
            A.34.75.8 A.28.75.8 TRUE
            F.28.15.9 V.34.30.9 TRUE
            F.34.15.9 V.28.30.9 TRUE
            F.28.30.9 V.34.15.9 TRUE
            F.34.30.9 V.28.15.9 TRUE
            F.28.45.9 V.34.75.9 TRUE
            F.34.45.9 V.28.75.9 TRUE
            F.28.75.9 V.34.45.9 TRUE
            F.34.75.9 V.28.45.9 TRUE
            V.28.15.9 F.34.30.9 TRUE
            V.34.15.9 F.28.30.9 TRUE
            V.28.30.9 F.34.15.9 TRUE
            V.34.30.9 F.28.15.9 TRUE
            V.28.45.9 F.34.75.9 TRUE
            V.34.45.9 F.28.75.9 TRUE
            V.28.75.9 F.34.45.9 TRUE
            V.34.75.9 F.28.45.9 TRUE
            A.28.15.9 A.34.30.9 TRUE
            A.34.15.9 A.28.30.9 TRUE
            A.28.30.9 A.34.15.9 TRUE
            A.34.30.9 A.28.15.9 TRUE
            A.28.45.9 A.34.75.9 TRUE
            A.34.45.9 A.28.75.9 TRUE
            A.28.75.9 A.34.45.9 TRUE
            A.34.75.9 A.28.45.9 TRUE
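
A small sketch for sanity-checking a table like the one above, assuming it is saved as whitespace-separated text with the header row `original new changed`. Both function names are hypothetical. The first counts renamed samples; the second checks that the `new` column is a rearrangement of the `original` column, so that no name is used twice or dropped.

```shell
# Sketch only: count_changed and is_permutation are hypothetical helper names.
# Both expect a whitespace-separated table with a header row: original new changed

# Report how many samples have changed == TRUE.
count_changed() {
    awk 'NR > 1 { total++; if ($3 == "TRUE") changed++ }
         END { printf "%d of %d samples renamed\n", changed, total }' "$1"
}

# Succeed only if every original name appears exactly once in the new column.
is_permutation() {
    local orig new
    orig=$(awk 'NR > 1 { print $1 }' "$1" | sort)
    new=$(awk 'NR > 1 { print $2 }' "$1" | sort)
    [ "$orig" = "$new" ]
}
```

For the rows pasted above, `count_changed` should report 48 of 72 samples renamed, which is the two-thirds figure mentioned earlier in the thread.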

            robofjoy Robert Reid added a comment - - edited

            A simple copy script exists!

            1. It converts each sequence file into the coded single letter format.
            2. It renames the files to match the Excel table above.

            Testing area is on HPC here:

            /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone

            Next: this needs review and testing by someone else.

            robofjoy Robert Reid added a comment -

            To be reviewed:

            • User needs to log into the HPC cluster.
            • Navigate to /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone
            • Spot check the script to see that each line is picking the correct NEW sample name. To do that, one will need to look at the Excel sheet mentioned above (in https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/Documentation/)
            • Run the following command: bash renameANDrelabel.bash
            • Check that there are 144 new fastq.gz files, then check that each file is 20 lines long (zcat file | wc -l).
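
The last check in the list above could be automated along these lines. This is a sketch: `check_toy_files` is a hypothetical helper, and the glob used to pick out the renamed files is an assumption about how the new names look.

```shell
# Sketch only: check_toy_files is a hypothetical helper name.
# Usage: check_toy_files <directory> <name_glob> <expected_count> <expected_lines>
check_toy_files() {
    local dir pattern want_count want_lines count bad
    dir=$1; pattern=$2; want_count=$3; want_lines=$4
    count=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l)
    # Collect the names of files whose decompressed line count is wrong.
    bad=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | while IFS= read -r f; do
        n=$(gzip -dc "$f" | wc -l)
        [ "$n" -eq "$want_lines" ] || echo "$f ($n lines)"
    done)
    if [ "$count" -ne "$want_count" ]; then
        echo "expected $want_count files, found $count" >&2
        return 1
    fi
    if [ -n "$bad" ]; then
        echo "files with unexpected length:" >&2
        echo "$bad" >&2
        return 1
    fi
    echo "OK: $count files, $want_lines lines each"
}
```

In the sandbox this might be called as `check_toy_files . '[AFV].*_R[12].fastq.gz' 144 20`, assuming the renamed files start with the single-letter genotype code.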
            robofjoy Robert Reid added a comment -

            NCBI Submission ID: SUB13519532

            robofjoy Robert Reid added a comment -

            Two more tables to review as well!!

            These are the BioSample and SRA tables.

            They are located in the Google drive at this location:

            https://drive.google.com/drive/folders/1EaCt42IuxWd--1kKZW931PWw9N5OWpw4?usp=drive_link

            Like previous tables, need to check everything aligns and is correct.

            We can't truly confirm the SRA table until the renaming step is reviewed and signed off on.

            robofjoy Robert Reid added a comment -

            Need a design Description for SRA sheet.
            The Azenta summary is this:

            The RNA sample received was quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and RNA integrity was checked using TapeStation (Agilent Technologies, Palo Alto, CA, USA). The RNA sequencing library was prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina using manufacturer’s instructions (NEB, Ipswich, MA, USA). mRNAs were initially enriched with Oligod(T) beads. Enriched mRNAs were fragmented for 15 minutes at 94 °C. First strand and second strand cDNA were subsequently synthesized. cDNA fragments were end repaired and adenylated at 3’ends, and universal adapters were ligated to cDNA fragments, followed by index addition and library enrichment by PCR with limited cycles. The sequencing library was validated on the Agilent TapeStation (Agilent Technologies, Palo Alto, CA, USA), and quantified by using Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA) as well as by quantitative PCR (KAPA Biosystems, Wilmington, MA, USA). The sequencing library was clustered on one lane of a flowcell. After clustering, the flowcell was loaded on the Illumina HiSeq instrument (4000) according to manufacturer’s instructions. The sample was sequenced using a 2x150bp Paired End (PE) configuration. Image analysis and base calling were conducted by the HiSeq Control Software (HCS). Raw sequence
            data (.bcl files) generated from Illumina HiSeq was converted into fastq files and de-multiplexed
            using Illumina's bcl2fastq 2.17 software. One mismatch was allowed for index sequence
            identification.

            nfreese Nowlan Freese added a comment -

            Reviewing:

            /testingRenameZone

            • I checked line by line the script renameANDrelabel.bash compared to the Muday-lab-RNA-samples-for-sample-name-conversion.xlsx file. Everything matched correctly. Files that were to be renamed matched the corrected name, and the conversion to the 4 code format was correct.
            • I was not able to run the script as I did not have permission, but I don't know that it matters since I have reviewed the script itself.

            SRA_metadata_Muday144.xlsx

            • I compared the sample_name, library_ID, and title and they all matched.
            • I compared the filename to the sample_name. The filenames appear to be using the mislabeled names, but with the updated 4 code format. I'm not sure what the expectation here is, but these may need to be double-checked.
            • There is only a single design_description that is split across multiple lines (I assume this is a placeholder).
              • Oligod(T) -> Oligo d(T)

            _SRABiosampleForm-muday144.xlsx

            • Compared sample_name, sample_title, cultivar, temp, treatment, description, and Replicate Code. Everything appears to match.
            ann.loraine Ann Loraine added a comment - - edited

            Nowlan Freese: suggests Molly also look at it and try to run the script in the "sandbox" space with smaller versions of the files.

            robofjoy Robert Reid added a comment -

            The design description was messed up, as Nowlan pointed out: carriage returns in the text created havoc when pasting into an Excel sheet.

            Reworked the description to be just one line (below). It has now been updated in the SRA Excel sheet and pasted in properly.

            design description:

            The RNA sample received was quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and RNA integrity was checked using TapeStation (Agilent Technologies, Palo Alto, CA, USA). The RNA sequencing library was prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina using manufacturer’s instructions (NEB, Ipswich, MA, USA). mRNAs were initially enriched with Oligod(T) beads. Enriched mRNAs were fragmented for 15 minutes at 94 °C. First strand and second strand cDNA were subsequently synthesized. cDNA fragments were end repaired and adenylated at 3’ends, and universal adapters were ligated to cDNA fragments, followed by index addition and library enrichment by PCR with limited cycles. The sequencing library was validated on the Agilent TapeStation (Agilent Technologies, Palo Alto, CA, USA), and quantified by using Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA) as well as by quantitative PCR (KAPA Biosystems, Wilmington, MA, USA). The sequencing library was clustered on one lane of a flowcell. After clustering, the flowcell was loaded on the Illumina HiSeq instrument (4000) according to manufacturer’s instructions. The sample was sequenced using a 2x150bp Paired End (PE) configuration. Image analysis and base calling were conducted by the HiSeq Control Software (HCS). Raw sequence data (.bcl files) generated from Illumina HiSeq was converted into fastq files and de-multiplexed using Illumina's bcl2fastq 2.17 software.

            robofjoy Robert Reid added a comment -

            Corrected the description to no longer refer to Oligods!!!! (It is now Oligo D(T).)

            I do like the term Oligod though; it sounds like a deity we can pray to to get sequencing projects to run smoothly.

            robofjoy Robert Reid added a comment -

            Round 2 of review: This time Molly.

            2 parts to this review:

            #1 To be reviewed:

            • Log into the HPC cluster.
            • Navigate to /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone
            • Spot check the script to see that each line is picking the correct NEW sample name. To do that, one will need to look at the Excel sheet mentioned above (in https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/Documentation/)
            • Run the following command (just check the script for logic): bash renameANDrelabel.bash
            • Check that there are 144 new fastq.gz files, then check that each file is 20 lines long (zcat file | wc -l).

            #2 Checking the Biosample and SRA sheets:

            These are the BioSample and SRA tables.

            They are located in the Google drive at this location:

            https://drive.google.com/drive/folders/1EaCt42IuxWd--1kKZW931PWw9N5OWpw4?usp=drive_link

            Like previous tables, need to check everything aligns and is correct.

            robofjoy Robert Reid added a comment -

            I need to walk through all of this again as a final check.
            Then I will go through the NCBI submission portal and ensure all of those pieces are intact and correct.

            I also need to confirm that the data on the FTP site is ready to go.

            But once that is checked, I think we are ready to submit.

            ann.loraine Ann Loraine added a comment -

            Nowlan Freese says to check file name concordance.

            robofjoy Robert Reid added a comment -

            Sorting folly on my part!

            I have corrected the file names so that they are now in line with the ID and names in the SRA tables.
            Great catch Nowlan!

            When I listed and copied the file names, they were sorted by modification time (ls -lrt) rather than by name, which altered the list order.
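A name-sorted listing avoids this problem. The sketch below is an illustration only: list_by_name is a hypothetical helper, and it sorts in plain byte order (LC_ALL=C), which may not match the spreadsheet's own sort order exactly.

```shell
# Hypothetical helper (not from the ticket): print the file names in a
# directory one per line, sorted by name rather than by modification time.
# ls -lrt orders by modification time (oldest first); this does not.
list_by_name() {
    ls -1 "$1" | LC_ALL=C sort
}
```

The output of list_by_name on the data folder can be pasted into the table, provided the table rows are sorted the same way.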

            The files all line up correctly now in the table.
            It would be good for Nowlan to double check this however!!
            I will bump to review and assign.

            File to review is this one:
            https://docs.google.com/spreadsheets/d/1n4nsE4E8lykivizPtQyf17XJnL7FRELR/edit?usp=sharing&ouid=100714234126361751017&rtpof=true&sd=true

            Also, the NCBI FTP environment was updated in June 2023.
            I need to upload the Muday data again, and that process has begun. 72 files corresponding to the Google sheet above.
            R

            robofjoy Robert Reid added a comment - - edited

            This is an FTP note for Rob:

            /uploads/rreid2_uncc.edu_eIUyy48y/muday144

            It's the location where this data is going short term at NCBI.

            72 pairs of files. 144 total sequence files.

            142 transferred successfully. 2 did not because of a server disconnection. Resending the final 2.
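One way to find which files still need resending is to compare a list of local file names against a list of what arrived on the server. This is only a sketch: missing_uploads is a hypothetical helper, and it assumes both lists have already been saved as text files with one file name per line (how the remote listing is obtained is not covered here).

```shell
# Hypothetical helper (not from the ticket): print every name that appears
# in LOCAL_LIST but not in REMOTE_LIST. Both are plain text files with one
# file name per line. Names are compared as exact, whole-line fixed strings.
# Usage: missing_uploads LOCAL_LIST REMOTE_LIST
missing_uploads() {
    grep -F -x -v -f "$2" "$1"
}
```

With 144 local names and 142 remote names, the output would be the 2 files to resend.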

            nfreese Nowlan Freese added a comment -

            Testing SRA metadata:

            [~RobertReid] - I think the file names in lines 50-57 and 58-65 are mismatched/swapped.

            robofjoy Robert Reid added a comment -

            Corrected these file names in the table.

            Ready for another Nowlan inspection.

            nfreese Nowlan Freese added a comment -

            SRA metadata table looks good, I couldn't find any issues.

            robofjoy Robert Reid added a comment -

            Dear Robert Reid,

            Your submission SUB13519532 has failed with the following error:

            1. Similar projects already exist: PRJNA980666

            That is a new complaint!!!!
            I think I will try to change the title and description to highlight that these are varieties related to flavonoid production.
            I changed the title and the description.
            After that, I will need to reach out and chat with the SRA people.

            robofjoy Robert Reid added a comment -

            This produces an outright rejection, with no ability to edit.

            So I will reach out to the SRA people and provide some changes that hopefully make them happy.
            Or I will have them add this SRA under the BioProject of Mark's recent submission. Maybe that will appease them!

            robofjoy Robert Reid added a comment -

            I have reached out to the SRA with the following email:

            Hi I made this submission and it has been rejected due to the Bioproject being too similar to
            PRJNA980666.

            Is it possible to add this to the previous BioProject? The data came from 2 different labs (Brown University versus Wake Forest) and are completely different tomato varieties.
            But they are all the same species and are a similar time course experiment.
            And they are all the same tissue, pollen tube.

            Rob Reid

            robofjoy Robert Reid added a comment -

            The SRA have responded favorably!!

            SRA submission SUB13519532 is now re-processing with BioProject PRJNA980666.
            Best,
            Rick Lapoint
            SRA Curator

            robofjoy Robert Reid added a comment -

            SUCCESS !!

            Dear Robert Reid,

            This is an automatic acknowledgment that your recent submission to the SRA database has been successfully processed and will be released on the date specified.

            Please reference PRJNA980666 in your publication. This BioProject accession number is provided instead of SRP and should be used in your publication as it will allow better searching in Entrez.

            Accession to cite for these SRA data: PRJNA980666
            Temporary Submission ID: SUB13519532
            Release date: 2023-08-02

            Your SRA records will be accessible with the following link after the indicated release date:
            https://www.ncbi.nlm.nih.gov/sra/PRJNA980666

            Send questions and update requests to sra@ncbi.nlm.nih.gov; include the citation accession PRJNA980666 in any correspondence.

            Regards,

            NCBI SRA Submissions Staff
            Bethesda, Maryland USA

            robofjoy Robert Reid added a comment -

            We need to add all of the Accession IDs before we close this out!

            Stay tuned.

            robofjoy Robert Reid added a comment - - edited

            SRR Accessions:
            SRR25478240
            SRR25478241
            SRR25478242
            SRR25478243
            SRR25478244
            SRR25478245
            SRR25478246
            SRR25478247
            SRR25478248
            SRR25478249
            SRR25478250
            SRR25478251
            SRR25478252
            SRR25478253
            SRR25478254
            SRR25478255
            SRR25478256
            SRR25478257
            SRR25478258
            SRR25478260
            SRR25478261
            SRR25478262
            SRR25478263
            SRR25478264
            SRR25478265
            SRR25478266
            SRR25478267
            SRR25478268
            SRR25478269
            SRR25478270
            SRR25478272
            SRR25478273
            SRR25478275
            SRR25478276
            SRR25478277
            SRR25478278
            SRR25478279
            SRR25478280
            SRR25478281
            SRR25478282
            SRR25478283
            SRR25478284
            SRR25478285
            SRR25478286
            SRR25478287
            SRR25478289
            SRR25478290
            SRR25478291
            SRR25478292
            SRR25478293
            SRR25478294
            SRR25478295
            SRR25478296
            SRR25478297
            SRR25478298
            SRR25478299
            SRR25478300
            SRR25478301
            SRR25478302
            SRR25478304
            SRR25478305
            SRR25478306
            SRR25478307
            SRR25478308
            SRR25478309
            SRR25478310
            SRR25478311
            SRR25478259
            SRR25478271
            SRR25478274
            SRR25478303
            SRR25478288

            Mdavis4290 Molly Davis added a comment -

            [~RobertReid] Hi Dr. Reid, I was wondering why the SRR names are the same as in the seedling and mature pollen submission IGBF-3347. Is that correct? If so, I would only need to rerun the data once for one ticket. Let me know!

            robofjoy Robert Reid added a comment -

            For the SRA dataset SRP441343, both sets of data are contained therein. (Muday's and the mature pollen/seedling)

            https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP441343

            NCBI forced me to combine them due to the similarities.
            There are 126 total. 72 of those are Muday's time course.

            Is this what you were looking for?

            Mdavis4290 Molly Davis added a comment - - edited

            Ok! I have the 72 Muday SRR names, but you pasted the same 72 in the seedling and mature pollen experiment IGBF-3347, and I was wondering what those names are actually supposed to be. Thank you for your help! [~RobertReid]


              People

              • Assignee:
                robofjoy Robert Reid
                Reporter:
                robofjoy Robert Reid

                Dates

                • Created:
                  Updated: