[IGBF-3346] Prepping data and entries for SRA for Muday time course - JIRA UNCC

Robert Reid added a comment - 02/Jun/23 8:16 AM

And the adventure begins:

With another biosample sheet for submission.

Working copy can be found here:
https://docs.google.com/spreadsheets/d/1F39JAFfvyct4hdpfV2narT4BH6T13mBgJPvfl5BkbzM/edit?usp=sharing

Still in progress atm.

Robert Reid added a comment - 07/Jun/23 3:35 PM

The original untouched data from Azenta is located here:

/projects/tomato_genome/rnaseq/muday144-timeSeries-checkReadMEFIRST/00_fastq

I plan to make a working copy of this folder so that I can rename some files.

COPY TIME (run from /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences):
`rsync -aP /projects/tomato_genome/rnaseq/muday144-timeSeries-checkReadMEFIRST/* ./`

Next step: CHECK MD5 SUMS to ensure the copy is correct.
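A minimal sketch of that MD5 check, assuming GNU coreutils `md5sum` on the cluster: it checksums every file in the source tree, then verifies those sums inside the copy with `md5sum -c`, so a file that is missing or truncated in the copy shows up as a failure. The function name `verify_copy` is hypothetical, not an existing script.

```bash
# verify_copy SRC_DIR DST_DIR  (hypothetical helper, GNU coreutils)
# Checksums every file under SRC_DIR, then re-checks those sums inside DST_DIR.
# Returns 0 if every file matches, 1 if any file differs or is missing.
verify_copy() {
  local src="$1" dst="$2" sums
  sums="$(mktemp)"
  # Relative paths (./00_fastq/...) so the same list is valid in both trees.
  # -r stops xargs from running md5sum at all when no files are found.
  (cd "$src" && find . -type f -print0 | sort -z | xargs -0 -r md5sum) > "$sums"
  # --quiet hides the per-file "OK" lines; failures are still printed
  # and make md5sum exit non-zero.
  if (cd "$dst" && md5sum -c --quiet "$sums"); then
    echo "OK: $(wc -l < "$sums") files match"
    rm -f "$sums"
    return 0
  fi
  echo "MISMATCH: checksum list kept at $sums" >&2
  return 1
}
```

Usage: `verify_copy /projects/tomato_genome/rnaseq/muday144-timeSeries-checkReadMEFIRST /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences`. Checking against sums computed from the source, rather than only diffing two listings, means a missing file is reported explicitly.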

Robert Reid added a comment - 09/Jun/23 9:39 AM

~~COPY ERROR.~~

One file didn't make it, so I copied that file again.

Rerunning the MD5 checksums:
WHERE: /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/00_fastq

`while read -r file; do echo "$file"; md5sum "$file" >> md5-version2.txt; done < allfiles.txt`
`diff md5.orig md5-version2.txt`

The checksums now match, so the copy was successful.

Playing in the sandbox
To test a renaming scheme that corrects the errors made by Azenta, I am making a sandbox area. Rather than copying full sequence files, I use head to grab the first 20 lines of each file and store those in the sandbox.

The sandbox is here:
/projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone

To grab the 20 lines and make new files in the above location:
`while read -r file; do echo "$file"; zcat ../00_fastq/"${file}".gz | head -n 20 > "$file"; gzip "$file"; done < files.txt`

(we are unzipping via zcat, grabbing the 20 lines and gzipping again). Success.

Check that each file is 20 lines long.
`for file in *gz; do zcat $file | wc -l; done`

Toy data ready to be played with!
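A small variation on that line-count check, assuming bash and GNU `zcat`: instead of printing every count, it reports only the files whose line count differs from the expected value, which is easier to scan across 144 files. Twenty lines is five FASTQ records, since each record takes four lines. `check_line_counts` is a hypothetical name.

```bash
# check_line_counts EXPECTED FILE...  (hypothetical helper)
# Prints "<file> <count>" for each gzipped file whose line count differs
# from EXPECTED. Returns 1 if any file was flagged, 0 otherwise.
check_line_counts() {
  local expected="$1"; shift
  local f n bad=0
  for f in "$@"; do
    n=$(zcat "$f" | wc -l)
    if [ "$n" -ne "$expected" ]; then
      echo "$f $n"
      bad=1
    fi
  done
  return "$bad"
}
```

Usage: `check_line_counts 20 *.gz && echo "all files are 20 lines"`.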

Robert Reid added a comment - 09/Jun/23 12:05 PM

Which files need renaming??

We know samples were mislabeled at Azenta. For a reminder, more details are in this ticket: https://jira.bioviz.org/browse/IGBF-3290
From there:

A.34.15.8 is actually F.34.15.8 and vice versa
A.28.30.8 is actually V.28.30.8 and vice versa
A.34.45.8 is actually F.34.45.8 and vice versa
A.28.75.8 is actually V.28.75.8 and vice versa

The Excel sheet that conveys the changes is this one: https://jira.bioviz.org/secure/attachment/17863/Muday%20lab%20RNA%20samples%20for%20sample%20name%20conversion.xls
(or see ticket IGBF-3290 above)

So we need to swap using a temp file!

At the same time, let's also rename the files to the 4-code format.
Example:

A.28.15.9 is replicate 9, genotype ARE, 28 degrees C temperature, and 15 minute time point (genotype.temperature.time.replicate, the same order as the sample list below).

So 9-VF36-75-min-28C_R1_001.fastq.gz would become V.28.75.9_R1.fastq.gz. (NEED TO MAINTAIN THE R1 and R2 pairs!!)
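A minimal sketch of the name conversion, assuming the Azenta names follow the pattern of the example above (`REP-GENOTYPE-TIME-min-TEMPC_READ_001.fastq.gz`) and that the genotype code is the first letter of the genotype field, uppercased. That rule is only confirmed here for VF36 to V; the raw Azenta labels for the other two genotypes are not shown, and a genotype field containing extra hyphens is rejected rather than guessed. The output follows the genotype.temperature.time.replicate order used in the sample list and in Ann's table, and the `_R1`/`_R2` suffix is carried through so pairs stay together. `to_code` is a hypothetical helper, not the contents of the actual renaming script.

```bash
# to_code AZENTA_NAME  (hypothetical helper; needs bash 4+ for ${var^^})
# Prints the 4-code file name, or returns 1 if the name does not match the
# assumed Azenta pattern, so unexpected names are never silently renamed.
to_code() {
  local re='^([0-9]+)-([A-Za-z0-9]+)-([0-9]+)-min-([0-9]+)C_(R[12])_001\.fastq\.gz$'
  [[ "$1" =~ $re ]] || return 1
  local rep="${BASH_REMATCH[1]}" geno="${BASH_REMATCH[2]}"
  local minutes="${BASH_REMATCH[3]}" temp_c="${BASH_REMATCH[4]}" mate="${BASH_REMATCH[5]}"
  local code="${geno:0:1}"
  code="${code^^}"   # assumed rule: first letter, uppercased (VF36 -> V)
  # genotype.temperature.time.replicate, keeping the R1/R2 mate suffix
  printf '%s.%s.%s.%s_%s.fastq.gz\n' "$code" "$temp_c" "$minutes" "$rep" "$mate"
}
```

Usage: `to_code 9-VF36-75-min-28C_R1_001.fastq.gz` prints `V.28.75.9_R1.fastq.gz`. This only builds the 4-code name from the label Azenta used; the mislabel corrections still have to be applied on top of it.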

"F.28.15.7" "F.34.15.7" "F.28.30.7" "F.34.30.7"
 [6] "F.28.45.7" "F.34.45.7" "F.28.75.7" "F.34.75.7" "V.28.15.7"
[11] "V.34.15.7" "V.28.30.7" "V.34.30.7" "V.28.45.7" "V.34.45.7"
[16] "V.28.75.7" "V.34.75.7" "A.28.15.7" "A.34.15.7" "A.28.30.7"
[21] "A.34.30.7" "A.28.45.7" "A.34.45.7" "A.28.75.7" "A.34.75.7"
[26] "F.28.15.8" "F.34.15.8" "F.28.30.8" "F.34.30.8" "F.28.45.8"
[31] "F.34.45.8" "F.28.75.8" "F.34.75.8" "V.28.15.8" "V.34.15.8"
[36] "V.28.30.8" "V.34.30.8" "V.28.45.8" "V.34.45.8" "V.28.75.8"
[41] "V.34.75.8" "A.28.15.8" "A.34.15.8" "A.28.30.8" "A.34.30.8"
[46] "A.28.45.8" "A.34.45.8" "A.28.75.8" "A.34.75.8" "F.28.15.9"
[51] "F.34.15.9" "F.28.30.9" "F.34.30.9" "F.28.45.9" "F.34.45.9"
[56] "F.28.75.9" "F.34.75.9" "V.28.15.9" "V.34.15.9" "V.28.30.9"
[61] "V.34.30.9" "V.28.45.9" "V.34.45.9" "V.28.75.9" "V.34.75.9"
[66] "A.28.15.9" "A.34.15.9" "A.28.30.9" "A.34.30.9" "A.28.45.9"
[71] "A.34.45.9" "A.28.75.9" "A.34.75.9"

Ann Loraine added a comment - 09/Jun/23 12:16 PM

Just a quick comment in case this is useful for [~RobertReid]: the Muday lab sent us a spreadsheet that documents which samples need to be renamed. There were many more than we initially thought.

The spreadsheet is in this folder on Bitbucket: https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/Documentation/

Robert Reid added a comment - 09/Jun/23 1:08 PM

Ha, yes! I found the sheet and was surprised that 2/3 of the samples were wrong!

I assumed that the red cells were wrong and did a MATCH check to confirm.
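For anyone repeating that MATCH check outside Excel, a minimal awk sketch, assuming a whitespace-separated file with one header row and the original and new names in the first two columns (the layout of Ann's sample_renaming_summary.txt): it counts data rows where the two names differ. On that table it should report 48 of 72, i.e. the 2/3 noted here. `count_changed` is a hypothetical name.

```bash
# count_changed MAP_FILE  (hypothetical helper)
# MAP_FILE: whitespace-separated, one header row, columns "original new ...".
# Prints how many data rows have original != new, out of all data rows.
count_changed() {
  awk 'NR > 1 && NF >= 2 {
         total++
         if ($1 != $2) changed++
       }
       END { printf "%d of %d samples relabeled\n", changed, total }' "$1"
}
```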

Robert Reid added a comment - 12/Jun/23 10:07 AM

A useful table generated by Ann during the re-analysis.

https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/results/sample_renaming_summary.txt

Is this:

original new changed
F.28.15.7 F.28.15.7 FALSE
F.34.15.7 F.34.15.7 FALSE
F.28.30.7 F.28.30.7 FALSE
F.34.30.7 F.34.30.7 FALSE
F.28.45.7 F.28.45.7 FALSE
F.34.45.7 F.34.45.7 FALSE
F.28.75.7 F.28.75.7 FALSE
F.34.75.7 F.34.75.7 FALSE
V.28.15.7 V.28.15.7 FALSE
V.34.15.7 V.34.15.7 FALSE
V.28.30.7 V.28.30.7 FALSE
V.34.30.7 V.34.30.7 FALSE
V.28.45.7 V.28.45.7 FALSE
V.34.45.7 V.34.45.7 FALSE
V.28.75.7 V.28.75.7 FALSE
V.34.75.7 V.34.75.7 FALSE
A.28.15.7 A.28.15.7 FALSE
A.34.15.7 A.34.15.7 FALSE
A.28.30.7 A.28.30.7 FALSE
A.34.30.7 A.34.30.7 FALSE
A.28.45.7 A.28.45.7 FALSE
A.34.45.7 A.34.45.7 FALSE
A.28.75.7 A.28.75.7 FALSE
A.34.75.7 A.34.75.7 FALSE
F.28.15.8 V.34.15.8 TRUE
F.34.15.8 A.34.30.8 TRUE
F.28.30.8 F.34.30.8 TRUE
F.34.30.8 V.34.30.8 TRUE
F.28.45.8 F.28.75.8 TRUE
F.34.45.8 A.28.45.8 TRUE
F.28.75.8 V.28.45.8 TRUE
F.34.75.8 F.28.45.8 TRUE
V.28.15.8 F.28.15.8 TRUE
V.34.15.8 V.28.15.8 TRUE
V.28.30.8 A.28.15.8 TRUE
V.34.30.8 F.28.30.8 TRUE
V.28.45.8 V.34.75.8 TRUE
V.34.45.8 F.34.75.8 TRUE
V.28.75.8 A.34.75.8 TRUE
V.34.75.8 V.34.45.8 TRUE
A.28.15.8 A.28.30.8 TRUE
A.34.15.8 F.34.15.8 TRUE
A.28.30.8 V.28.30.8 TRUE
A.34.30.8 A.34.15.8 TRUE
A.28.45.8 A.34.45.8 TRUE
A.34.45.8 V.28.75.8 TRUE
A.28.75.8 F.34.45.8 TRUE
A.34.75.8 A.28.75.8 TRUE
F.28.15.9 V.34.30.9 TRUE
F.34.15.9 V.28.30.9 TRUE
F.28.30.9 V.34.15.9 TRUE
F.34.30.9 V.28.15.9 TRUE
F.28.45.9 V.34.75.9 TRUE
F.34.45.9 V.28.75.9 TRUE
F.28.75.9 V.34.45.9 TRUE
F.34.75.9 V.28.45.9 TRUE
V.28.15.9 F.34.30.9 TRUE
V.34.15.9 F.28.30.9 TRUE
V.28.30.9 F.34.15.9 TRUE
V.34.30.9 F.28.15.9 TRUE
V.28.45.9 F.34.75.9 TRUE
V.34.45.9 F.28.75.9 TRUE
V.28.75.9 F.34.45.9 TRUE
V.34.75.9 F.28.45.9 TRUE
A.28.15.9 A.34.30.9 TRUE
A.34.15.9 A.28.30.9 TRUE
A.28.30.9 A.34.15.9 TRUE
A.34.30.9 A.28.15.9 TRUE
A.28.45.9 A.34.75.9 TRUE
A.34.45.9 A.28.75.9 TRUE
A.28.75.9 A.34.45.9 TRUE
A.34.75.9 A.28.45.9 TRUE
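The table is not only pairwise swaps: for example F.28.15.8 becomes V.34.15.8, V.34.15.8 becomes V.28.15.8, and V.28.15.8 becomes F.28.15.8, a three-way cycle, so one temp file per swap is not enough. One general way to apply a table like this is two passes: first move every changed file to a temporary name built from its new sample name, then strip the temporary prefix. A minimal bash sketch, assuming the files are already named `<original>_R1.fastq.gz` / `<original>_R2.fastq.gz` and the table is whitespace-separated with one header row; `apply_mapping` and the `tmp__` prefix are hypothetical, and this is not the actual renameANDrelabel.bash.

```bash
# apply_mapping MAP_FILE DIR  (hypothetical helper)
# MAP_FILE columns: original new changed (header row skipped).
apply_mapping() {
  local map="$1" dir="$2" orig new changed mate src tmp f dest
  # Pass 1: park each changed file under a temp name built from its NEW name,
  # so no rename can overwrite a file that has not been processed yet.
  while read -r orig new changed || [ -n "$orig" ]; do
    [ "$changed" = "TRUE" ] || continue
    for mate in R1 R2; do              # keep the R1/R2 pair together
      src="$dir/${orig}_${mate}.fastq.gz"
      tmp="$dir/tmp__${new}_${mate}.fastq.gz"
      if [ ! -e "$src" ] || [ -e "$tmp" ]; then
        echo "refusing to move $src" >&2
        return 1
      fi
      mv "$src" "$tmp"
    done
  done < <(tail -n +2 "$map")
  # Pass 2: drop the temporary prefix to give each file its final name.
  for f in "$dir"/tmp__*.fastq.gz; do
    [ -e "$f" ] || continue
    dest="$dir/${f##*/tmp__}"
    if [ -e "$dest" ]; then
      echo "refusing to overwrite $dest" >&2
      return 1
    fi
    mv "$f" "$dest"
  done
}
```

Running it on the 20-line sandbox copies first, as done in testingRenameZone, is the safer order; rows marked FALSE are left untouched.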

Robert Reid added a comment - 13/Jun/23 9:43 AM - edited

A simple copy script exists!

It converts each sequence file into the coded single letter format.
It renames the files to match the Excel table above.

Testing area is on HPC here:

/projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone

It now needs a review and test by someone else.

Robert Reid added a comment - 13/Jun/23 10:14 AM

To be reviewed:

User needs to log into the HPC cluster.
Navigate to /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone
Spot check the script to see that each line picks the correct NEW sample name. To do that, compare it against the Excel sheet mentioned above (in https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/Documentation/)
Run the following command:
`bash renameANDrelabel.bash`
Check that there are 144 new fastq.gz files, and that each file is 20 lines long (`zcat file | wc -l`).
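A minimal sketch for that last step, assuming the renamed files end in `_R1.fastq.gz` / `_R2.fastq.gz` and sit directly in one directory: it checks the total count (144 files, i.e. 72 pairs) and that every R1 has its R2 and vice versa, which is the pairing the renaming must preserve. `check_pairs` is a hypothetical helper.

```bash
# check_pairs DIR EXPECTED_TOTAL  (hypothetical helper)
# Reports a wrong file count and any R1 without R2 (or R2 without R1).
# Returns 1 if anything was reported, 0 otherwise.
check_pairs() {
  local dir="$1" expected="$2" n f mate bad=0
  n=$(find "$dir" -maxdepth 1 -name '*.fastq.gz' | wc -l)
  if [ "$n" -ne "$expected" ]; then
    echo "expected $expected files, found $n"
    bad=1
  fi
  for f in "$dir"/*_R1.fastq.gz; do
    [ -e "$f" ] || continue
    mate="${f%_R1.fastq.gz}_R2.fastq.gz"
    [ -e "$mate" ] || { echo "no R2 for ${f##*/}"; bad=1; }
  done
  for f in "$dir"/*_R2.fastq.gz; do
    [ -e "$f" ] || continue
    mate="${f%_R2.fastq.gz}_R1.fastq.gz"
    [ -e "$mate" ] || { echo "no R1 for ${f##*/}"; bad=1; }
  done
  return "$bad"
}
```

Usage: `check_pairs . 144`.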

Robert Reid added a comment - 13/Jun/23 12:19 PM

NCBI Submission ID: SUB13519532

Robert Reid added a comment - 14/Jun/23 9:56 AM

Two more tables to review as well!!

These are the BioSample and SRA tables.

They are in this Google Drive folder:

https://drive.google.com/drive/folders/1EaCt42IuxWd--1kKZW931PWw9N5OWpw4?usp=drive_link

As with previous tables, we need to check that everything aligns and is correct.

We can't fully confirm the SRA table until the renaming step is reviewed and signed off.

Robert Reid added a comment - 14/Jun/23 2:38 PM

We need a design description for the SRA sheet.
The Azenta summary is this:

The RNA sample received was quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and RNA integrity was checked using TapeStation (Agilent Technologies, Palo Alto, CA, USA). The RNA sequencing library was prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina using manufacturer’s instructions (NEB, Ipswich, MA, USA). mRNAs were initially enriched with Oligod(T) beads. Enriched mRNAs were fragmented for 15 minutes at 94 °C. First strand and second strand cDNA were subsequently synthesized. cDNA fragments were end repaired and adenylated at 3’ends, and universal adapters were ligated to cDNA fragments, followed by index addition and library enrichment by PCR with limited cycles. The sequencing library was validated on the Agilent TapeStation (Agilent Technologies, Palo Alto, CA, USA), and quantified by using Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA) as well as by quantitative PCR (KAPA Biosystems, Wilmington, MA, USA). The sequencing library was clustered on one lane of a flowcell. After clustering, the flowcell was loaded on the Illumina HiSeq instrument (4000) according to manufacturer’s instructions. The sample was sequenced using a 2x150bp Paired End (PE) configuration. Image analysis and base calling were conducted by the HiSeq Control Software (HCS). Raw sequence data (.bcl files) generated from Illumina HiSeq was converted into fastq files and de-multiplexed using Illumina's bcl2fastq 2.17 software. One mismatch was allowed for index sequence identification.

Nowlan Freese added a comment - 27/Jun/23 3:03 PM

Reviewing:

/testingRenameZone

I checked line by line the script renameANDrelabel.bash compared to the Muday-lab-RNA-samples-for-sample-name-conversion.xlsx file. Everything matched correctly. Files that were to be renamed matched the corrected name, and the conversion to the 4 code format was correct.
I was not able to run the script because I did not have permission, but I don't think it matters since I have reviewed the script itself.

SRA_metadata_Muday144.xlsx

I compared the sample_name, library_ID, and title and they all matched.
I compared the filename to the sample_name. The filenames appear to be using the mislabeled names, but with the updated 4 code format. I'm not sure what the expectation here is, but these may need to be double-checked.
There is only a single design_description that is split across multiple lines (I assume this is a placeholder).
- Oligod(T) -> Oligo d(T)

_SRABiosampleForm-muday144.xlsx

Compared sample_name, sample_title, cultivar, temp, treatment, description, and Replicate Code. Everything appears to match.

Ann Loraine added a comment - 28/Jun/23 10:03 AM - edited

Nowlan Freese: suggests Molly also look at it and try to run the script in the "sandbox" space with smaller versions of the files.

Robert Reid added a comment - 30/Jun/23 9:35 AM

As Nowlan pointed out, the design description was messed up: carriage returns in the text created havoc when pasting into an Excel sheet.

I reworked the description into a single line (below). It has now been updated in the SRA Excel sheet and pasted in properly.

design description:

The RNA sample received was quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and RNA integrity was checked using TapeStation (Agilent Technologies, Palo Alto, CA, USA). The RNA sequencing library was prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina using manufacturer’s instructions (NEB, Ipswich, MA, USA). mRNAs were initially enriched with Oligod(T) beads. Enriched mRNAs were fragmented for 15 minutes at 94 °C. First strand and second strand cDNA were subsequently synthesized. cDNA fragments were end repaired and adenylated at 3’ends, and universal adapters were ligated to cDNA fragments, followed by index addition and library enrichment by PCR with limited cycles. The sequencing library was validated on the Agilent TapeStation (Agilent Technologies, Palo Alto, CA, USA), and quantified by using Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA) as well as by quantitative PCR (KAPA Biosystems, Wilmington, MA, USA). The sequencing library was clustered on one lane of a flowcell. After clustering, the flowcell was loaded on the Illumina HiSeq instrument (4000) according to manufacturer’s instructions. The sample was sequenced using a 2x150bp Paired End (PE) configuration. Image analysis and base calling were conducted by the HiSeq Control Software (HCS). Raw sequence data (.bcl files) generated from Illumina HiSeq was converted into fastq files and de-multiplexed using Illumina's bcl2fastq 2.17 software.
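A minimal sketch of flattening text like this before pasting it into a single Excel cell, assuming the description is saved in a plain-text file: it turns every carriage return and line feed into a space, squeezes repeated spaces, and trims the ends. `one_line` and the file name in the usage line are hypothetical.

```bash
# one_line FILE  (hypothetical helper)
# Replaces CR and LF with spaces, squeezes runs of spaces, trims both ends.
one_line() {
  tr '\r\n' '  ' < "$1" | tr -s ' ' | sed 's/^ //; s/ $//'
}
```

Usage: `one_line design_description.txt`. Note that `tr -s ' '` also collapses any intentional double spaces in the text.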

Robert Reid added a comment - 30/Jun/23 9:41 AM

Corrected the description so it no longer refers to Oligods!!!! (It is now Oligo D(T).)

I do like the term Oligod though; it sounds like a deity we can pray to to get sequencing projects to run smoothly.

Robert Reid added a comment - 30/Jun/23 9:44 AM

Round 2 of review: This time Molly.

2 parts to this review:

#1
To be reviewed:

User needs to log into the HPC cluster.
Navigate to /projects/tomato_genome/rnaseq/renamed_MudayTimeCourseSequences/testingRenameZone
Spot check the script to see that each line picks the correct NEW sample name. To do that, compare it against the Excel sheet mentioned above (in https://bitbucket.org/hotpollen/flavonoid-rnaseq/src/main/72_F3H_PollenTube/Documentation/)
~~Run the following command~~ (just check the script for logic)
`bash renameANDrelabel.bash`
Check that there are 144 new fastq.gz files, and that each file is 20 lines long (`zcat file | wc -l`).

#2 Checking the Biosample and SRA sheets:

These are the BioSample and SRA tables.

They are in this Google Drive folder:

https://drive.google.com/drive/folders/1EaCt42IuxWd--1kKZW931PWw9N5OWpw4?usp=drive_link

As with previous tables, we need to check that everything aligns and is correct.

Robert Reid added a comment - 25/Jul/23 9:47 AM

I need to walk through all of this again as a final check, and then go through the NCBI submission portal to ensure all of those pieces are intact and correct.

I also need to check that the data on the FTP site is ready to go.

Once that is checked, I think we are ready to submit.

Ann Loraine added a comment - 25/Jul/23 10:18 AM

Nowlan Freese: says check file name concordance

Robert Reid added a comment - 27/Jul/23 1:24 PM

Sorting folly on my part!

I have corrected the file names so that they are now in line with the ID and names in the SRA tables.
Great catch Nowlan!

When I listed and copied the file names, they were sorted by modification time (`ls -lrt`) rather than by name, which changed the list order.
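A minimal sketch of listing the pairs by name rather than by modification time, assuming files named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` in one directory: it prints one tab-separated row per sample with both file names, sorted with `LC_ALL=C` so the order does not depend on the user's locale. Matching rows on the sample name (for example with a lookup in the sheet) would avoid relying on row order at all. `pair_table` is a hypothetical name.

```bash
# pair_table DIR  (hypothetical helper)
# Prints "sample<TAB>R1 file<TAB>R2 file" per sample, sorted by sample name.
# The R2 name is derived from the R1 name, not checked for existence.
pair_table() {
  local f base
  for f in "$1"/*_R1.fastq.gz; do
    [ -e "$f" ] || continue
    base="${f##*/}"
    base="${base%_R1.fastq.gz}"
    printf '%s\t%s\t%s\n' "$base" "${base}_R1.fastq.gz" "${base}_R2.fastq.gz"
  done | LC_ALL=C sort -k1,1
}
```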

The files all line up correctly now in the table.
It would be good for Nowlan to double check this however!!
I will bump to review and assign.

File to review is this one:
https://docs.google.com/spreadsheets/d/1n4nsE4E8lykivizPtQyf17XJnL7FRELR/edit?usp=sharing&ouid=100714234126361751017&rtpof=true&sd=true

Also, the NCBI FTP environment was updated in June 2023, so I need to upload the Muday data again. That process has begun: 72 files corresponding to the Google sheet above.
R

Robert Reid added a comment - 27/Jul/23 3:05 PM - edited

This is an FTP note for Rob:

/uploads/rreid2_uncc.edu_eIUyy48y/muday144

This is the short-term location for this data at NCBI.

72 pairs of files, 144 sequence files in total.

142 transferred successfully; 2 did not because of a server disconnection. Resending the final 2.
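A minimal sketch for finding which files still need to be resent, assuming the remote directory listing has been saved to a text file with one file name per line (how to capture it depends on the FTP client): it prints local `.fastq.gz` names that are absent from that listing. A name check alone will not catch a file that was only partly transferred before a disconnection, so comparing sizes as well is worth doing. `missing_uploads` is a hypothetical helper.

```bash
# missing_uploads LOCAL_DIR REMOTE_LISTING  (hypothetical helper, GNU find)
# Prints local .fastq.gz file names that do not appear in REMOTE_LISTING.
missing_uploads() {
  # comm needs both inputs sorted the same way, so force the C locale.
  LC_ALL=C comm -23 \
    <(find "$1" -maxdepth 1 -name '*.fastq.gz' -printf '%f\n' | LC_ALL=C sort) \
    <(LC_ALL=C sort "$2")
}
```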

Nowlan Freese added a comment - 28/Jul/23 11:11 AM

Testing SRA metadata:

[~RobertReid] - I think the lines 50-57 and 58-65 for the file names are mismatched/swapped.
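A minimal sketch of the filename/sample_name concordance check, assuming the SRA metadata sheet is exported as tab-separated text with a header row containing `sample_name`, `filename` and `filename2`, and that R1/R2 files are named `<sample_name>_R1...` / `<sample_name>_R2...`. It prints every row whose file names do not start with that row's sample name, together with its line number in the export (line 1 is the header), so a block of swapped rows stands out. `check_concordance` is a hypothetical name.

```bash
# check_concordance TSV  (hypothetical helper)
# Flags rows where filename/filename2 don't start with sample_name + _R1/_R2.
# Returns 1 if any row was flagged or a required column is missing.
check_concordance() {
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample_name" in col) || !("filename" in col) || !("filename2" in col)) {
        print "missing expected column"; bad = 1; exit
      }
      next
    }
    {
      s = $(col["sample_name"])
      if (index($(col["filename"]), s "_R1") != 1 || index($(col["filename2"]), s "_R2") != 1) {
        printf "line %d: %s -> %s, %s\n", NR, s, $(col["filename"]), $(col["filename2"])
        bad = 1
      }
    }
    END { exit bad }' "$1"
}
```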

Robert Reid added a comment - 31/Jul/23 1:40 PM

Corrected these file names in the table.

Ready for another Nowlan inspection.

Nowlan Freese added a comment - 31/Jul/23 2:24 PM

The SRA metadata table looks good; I couldn't find any issues.

Robert Reid added a comment - 01/Aug/23 10:39 AM

Dear Robert Reid,

Your submission SUB13519532 has failed with the following error:

1. Similar projects already exist: PRJNA980666

That is a new complaint!!!!
I think I will try to change the title and description to highlight that these are varieties related to flavonoid production.
I changed the title and the description.
After that, I will need to reach out and chat with the SRA people.

Robert Reid added a comment - 01/Aug/23 10:43 AM

This produces an outright rejection, with no ability to edit.

So I will reach out to the SRA people and propose some changes that will hopefully make them happy.
Or I will have them add this SRA under the BioProject of Mark's recent submission. Maybe that will appease them!

Robert Reid added a comment - 01/Aug/23 11:53 AM

I have reached out to the SRA with the following email:

Hi, I made this submission and it has been rejected because the BioProject is too similar to PRJNA980666.

Is it possible to add this to the previous BioProject? The data came from 2 different labs (Brown University versus Wake Forest) and are completely different tomato varieties.
But they are all the same species and are a similar time course experiment.
And they are all the same tissue, pollen tube.

Rob Reid

Robert Reid added a comment - 01/Aug/23 3:51 PM

The SRA have responded favorably!!

SRA submission SUB13519532 is now re-processing with BioProject PRJNA980666.
Best,
Rick Lapoint
SRA Curator

Robert Reid added a comment - 02/Aug/23 10:25 AM

SUCCESS !!

Dear Robert Reid,

This is an automatic acknowledgment that your recent submission to the SRA database has been successfully processed and will be released on the date specified.

Please reference PRJNA980666 in your publication. This BioProject accession number is provided instead of SRP and should be used in your publication as it will allow better searching in Entrez.

Accession to cite for these SRA data: PRJNA980666
Temporary Submission ID: SUB13519532
Release date: 2023-08-02

Your SRA records will be accessible with the following link after the indicated release date:
https://www.ncbi.nlm.nih.gov/sra/PRJNA980666

Send questions and update requests to sra@ncbi.nlm.nih.gov; include the citation accession PRJNA980666 in any correspondence.

Regards,

NCBI SRA Submissions Staff
Bethesda, Maryland USA

Robert Reid added a comment - 02/Aug/23 10:26 AM

We need to add all of the Accession IDs before we close this out!

Stay tuned.

Robert Reid added a comment - 07/Aug/23 1:08 PM - edited

SRR Accessions:
SRR25478240
SRR25478241
SRR25478242
SRR25478243
SRR25478244
SRR25478245
SRR25478246
SRR25478247
SRR25478248
SRR25478249
SRR25478250
SRR25478251
SRR25478252
SRR25478253
SRR25478254
SRR25478255
SRR25478256
SRR25478257
SRR25478258
SRR25478260
SRR25478261
SRR25478262
SRR25478263
SRR25478264
SRR25478265
SRR25478266
SRR25478267
SRR25478268
SRR25478269
SRR25478270
SRR25478272
SRR25478273
SRR25478275
SRR25478276
SRR25478277
SRR25478278
SRR25478279
SRR25478280
SRR25478281
SRR25478282
SRR25478283
SRR25478284
SRR25478285
SRR25478286
SRR25478287
SRR25478289
SRR25478290
SRR25478291
SRR25478292
SRR25478293
SRR25478294
SRR25478295
SRR25478296
SRR25478297
SRR25478298
SRR25478299
SRR25478300
SRR25478301
SRR25478302
SRR25478304
SRR25478305
SRR25478306
SRR25478307
SRR25478308
SRR25478309
SRR25478310
SRR25478311
SRR25478259
SRR25478271
SRR25478274
SRR25478303
SRR25478288
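
Because the last five accessions were appended out of order, it is worth confirming the list is complete before closing the ticket. A minimal sketch (a hypothetical helper, not an NCBI tool): it reads one accession per line in any order and reports the line count, the number of unique accessions, the lowest and highest accession, and the size of the numeric range between them. If all three numbers agree, the list is a complete, duplicate-free run.

```shell
#!/bin/sh
# Sketch: summarise a list of SRR accessions (one per line, any order).
# Reports total lines, unique accessions, first/last accession and the
# span last - first + 1. total == unique == span means no gaps or duplicates.
check_srr_list() {
  # Drop spaces/carriage returns, keep only the digits after "SRR", sort numerically.
  nums=$(tr -d ' \r' < "$1" | sed -n 's/^SRR\([0-9][0-9]*\)$/\1/p' | sort -n)
  if [ -z "$nums" ]; then
    echo "no SRR accessions found" >&2
    return 1
  fi
  total=$(printf '%s\n' "$nums" | grep -c .)
  n_unique=$(printf '%s\n' "$nums" | uniq | grep -c .)
  first=$(printf '%s\n' "$nums" | head -n 1)
  last=$(printf '%s\n' "$nums" | tail -n 1)
  echo "total=$total unique=$n_unique first=SRR$first last=SRR$last span=$((last - first + 1))"
}
```

Run on the 72 accessions listed above, this should print `total=72 unique=72 first=SRR25478240 last=SRR25478311 span=72`.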

Molly Davis added a comment - 17/Aug/23 10:27 AM

[~RobertReid] Hi Dr. Reid, I was wondering why the SRR names are the same as those for the seedling and mature pollen submission (IGBF-3347). Is that correct? If so, I would only need to rerun the data once, for one ticket. Let me know!

Robert Reid added a comment - 17/Aug/23 10:51 AM

The SRA dataset SRP441343 contains both sets of data (Muday's and the mature pollen/seedling).

https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP441343

NCBI forced me to combine them due to the similarities.
There are 126 total. 72 of those are Muday's time course.

Is this what you were looking for?
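
Since both datasets share SRP441343, the runs for the seedling/mature pollen ticket are the set difference: every run in the study minus the 72 Muday accessions. A minimal sketch, assuming two hypothetical one-per-line files: `all_runs.txt` (for example, an accession list exported from the SRA Run Selector for SRP441343) and `muday_runs.txt` (the 72 accessions posted on this ticket). It also warns about any Muday accession that is not in the study list, which would point to a copy/paste error.

```shell
#!/bin/sh
# Sketch: split the runs of a study into "not Muday" (stdout) and
# "Muday accession missing from the study" (stderr warnings).
# $1 = all_runs.txt   (hypothetical): every run accession in SRP441343
# $2 = muday_runs.txt (hypothetical): the 72 Muday run accessions
split_runs() {
  tmpdir=$(mktemp -d) || return 1
  # comm needs both inputs sorted with the same collation.
  LC_ALL=C sort -u "$1" > "$tmpdir/all"
  LC_ALL=C sort -u "$2" > "$tmpdir/muday"
  # comm -23: lines only in the study list -> the non-Muday runs.
  LC_ALL=C comm -23 "$tmpdir/all" "$tmpdir/muday"
  # comm -13: Muday accessions absent from the study list (should be none).
  LC_ALL=C comm -13 "$tmpdir/all" "$tmpdir/muday" | sed 's/^/not in study: /' >&2
  rm -rf "$tmpdir"
}
```

With 126 runs in the study and 72 of them Muday's, stdout should hold 126 - 72 = 54 accessions (`split_runs all_runs.txt muday_runs.txt | wc -l`), and nothing should be printed to stderr.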

Molly Davis added a comment - 18/Aug/23 11:23 AM - edited

Ok! I have the 72 Muday SRR names, but you pasted the same 72 into the seedling and mature pollen experiment ticket (IGBF-3347), and I was wondering what those names are actually supposed to be. Thank you for your help! [~RobertReid]
