Uploaded image for project: 'IGB'
  1. IGB
  2. IGBF-3420

Process 30-804059537 (KP 2023) data using S_lycopersicum_Sep_2019 (SL4) genome assembly and annotations

    Details

    • Type: Task
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      My apologies if this has already been done and I lost track of the data somehow!

      For this task, we need to align and deploy data from Azenta run 30-804059537 onto the SL4 / S_lycopersicum_Sep_2019 genome, using the S_lycopersicum_Sep_2019 assembly and accompanying gene annotations.

      To-do:

      • run the nextflow pipeline as per usual. Please use location /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019 on the UNCC cluster
      • create a branch in the code repo as usual
      • commit to the branch the following files:

      30-804059537_SL4_multiqc_report.html (renamed from multiqc_report.html)
      30-804059537_SL4_salmon.merged.gene_counts.tsv (renamed from salmon.merged.gene_counts.tsv)

      • create coverage graphs and junction files, as per usual
      • move just the coverage graphs, bam files, and junction files to this location: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537 (We'll use this as the source directory for an rsync command that will copy these data over the data.bioviz.org host for visualization in IGB)

        Attachments

          Issue Links

            Activity

            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Directory: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019

            ./doIt.sh Kelsey_samples_2023.csv /projects/tomato_genome/fnb/nfcore_rnaseq/S_lycopersicum_Sep_2019.fa /projects/tomato_genome/fnb/nfcore_rnaseq/S_lycopersicum_Sep_2019.gtf /projects/tomato_genome/fnb/nfcore_rnaseq/S_lycopersicum_Sep_2019.bed tomato.config 1> out.0.txt 2> err.0.txt
            
            Show
            Mdavis4290 Molly Davis added a comment - - edited Directory: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019 ./doIt.sh Kelsey_samples_2023.csv /projects/tomato_genome/fnb/nfcore_rnaseq/S_lycopersicum_Sep_2019.fa /projects/tomato_genome/fnb/nfcore_rnaseq/S_lycopersicum_Sep_2019.gtf /projects/tomato_genome/fnb/nfcore_rnaseq/S_lycopersicum_Sep_2019.bed tomato.config 1> out.0.txt 2> err.0.txt
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Launch renameBams.sh script:
            ./renameBams.sh
            Launch Scaled Coverage graphs script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch Junction files script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err

            Reviewer:

            Check that files have reasonable sizes (no "zero" size files, for example)
            Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file
            Check that every bam file has a corresponding "FJ.bed.gz" file
            Check that every bam file has a corresponding "scaled.bedgraph.gz" file
            Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi"

            Reviewer: [~RobertReid]

            Next Steps after review:

            • move just the coverage graphs, bam files, and junction files to this location: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537
            • Make sure permissions are granted for copying
            Show
            Mdavis4290 Molly Davis added a comment - - edited Launch renameBams.sh script : ./renameBams.sh Launch Scaled Coverage graphs script : ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err Launch Junction files script : ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err Reviewer : Check that files have reasonable sizes (no "zero" size files, for example) Check that every "FJ.bed.gz" file has a corresponding "FJ.bed.gz.tbi" index file Check that every bam file has a corresponding "FJ.bed.gz" file Check that every bam file has a corresponding "scaled.bedgraph.gz" file Check that every "scaled.bedgraph.gz" has a corresponding "scaled.bedgraph.gz.tbi" Reviewer: [~RobertReid] Next Steps after review : move just the coverage graphs, bam files, and junction files to this location: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537 Make sure permissions are granted for copying
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Branch: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420
            Includes 30-804059537_SL4_multiqc_report.html & 30-804059537_SL4_salmon.merged.gene_counts.tsv

            Reviewer: [~aloraine]

            Show
            Mdavis4290 Molly Davis added a comment - - edited Branch : https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420 Includes 30-804059537_SL4_multiqc_report.html & 30-804059537_SL4_salmon.merged.gene_counts.tsv Reviewer: [~aloraine]
            Hide
            robofjoy Robert Reid added a comment -

            Checking out files in :
            /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019/results/star_salmon

            .err files
            All files here are 0 in size.
            Except one!

            rw-rr- 1 mdavi258 tomato_genome 370 Aug 28 13:56 Malintka-R1-0hr-25C-self.err

            The error is

            Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249)
            Malintka-R1-0hr-25C-self.err lines 1-4/4 (END)

            Bam and bai
            63 bam files
            63 bai files

            rw-r---- 1 mdavi258 tomato_genome 590M Aug 28 11:58 Malintka-R1-8hr-25C-self.bam
            Sue enough this file is 590M while the rest range from 950M to 3GB. Broken!

            Show
            robofjoy Robert Reid added a comment - Checking out files in : /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019/results/star_salmon .err files All files here are 0 in size. Except one! rw-r r - 1 mdavi258 tomato_genome 370 Aug 28 13:56 Malintka-R1-0hr-25C-self.err The error is Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705) at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191) at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332) at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249) Malintka-R1-0hr-25C-self.err lines 1-4/4 (END) Bam and bai 63 bam files 63 bai files rw-r ---- 1 mdavi258 tomato_genome 590M Aug 28 11:58 Malintka-R1-8hr-25C-self.bam Sue enough this file is 590M while the rest range from 950M to 3GB. Broken!
            Hide
            ann.loraine Ann Loraine added a comment -

            Moving this back to "to-do" as the first level review found an issue. MD will look into it, as per her comment during today's morning meeting. Please feel free to move this somewhere different as you see fit, of course.

            attn: [~molly]

            Show
            ann.loraine Ann Loraine added a comment - Moving this back to "to-do" as the first level review found an issue. MD will look into it, as per her comment during today's morning meeting. Please feel free to move this somewhere different as you see fit, of course. attn: [~molly]
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Nextflow Pipeline ran successfully with SL4 genome
            Directory: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019
            MultiQC report notes: No errors or warnings were present in the report. The output file is named '30-804059537_SL4_multiqc_report.html'.

            Show
            Mdavis4290 Molly Davis added a comment - - edited Nextflow Pipeline ran successfully with SL4 genome Directory: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019 MultiQC report notes: No errors or warnings were present in the report. The output file is named '30-804059537_SL4_multiqc_report.html'.
            Hide
            Mdavis4290 Molly Davis added a comment -

            Next steps:
            Commit CSV and multiqc report to Pistil repo on bitbucket
            Change sorted bam names
            Create junction files
            Create Coverage graphs

            Show
            Mdavis4290 Molly Davis added a comment - Next steps: Commit CSV and multiqc report to Pistil repo on bitbucket Change sorted bam names Create junction files Create Coverage graphs
            Hide
            Mdavis4290 Molly Davis added a comment -

            Launch renameBams.sh script:
            ./renameBams.sh
            Launch Scaled Coverage graphs script:
            ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err
            Launch Junction files script:
            ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err

            Show
            Mdavis4290 Molly Davis added a comment - Launch renameBams.sh script : ./renameBams.sh Launch Scaled Coverage graphs script : ./sbatch-doIt.sh .bam bamCoverage.sh >jobs.out 2>jobs.err Launch Junction files script : ./sbatch-doIt.sh .bam find_junctions.sh >jobs.out 2>jobs.err
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Directory: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019/results/star_salmon

            Still running into same error from before when running find_junctions.sh script: Malintka-R1-0hr-25C-self.err

            Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191)
            at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332)
            at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249)

            Next Step: Ask Nowlan Freese for help with java jar file (find-junctions-1.0.0-jar-with-dependencies.jar)(S_lycopersicum_Sep_2019.2bit).

            Show
            Mdavis4290 Molly Davis added a comment - - edited Directory : /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019/results/star_salmon Still running into same error from before when running find_junctions.sh script: Malintka-R1-0hr-25C-self.err Exception in thread "main" java.lang.RuntimeException: Postion is too high (more than 64792705) at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.setCurrentSequencePosition(TwoBitParser.java:191) at org.biojava.nbio.genome.parsers.twobit.TwoBitParser.loadFragment(TwoBitParser.java:332) at org.lorainelab.findjunctions.FindJunctions.main(FindJunctions.java:249) Next Step: Ask Nowlan Freese for help with java jar file (find-junctions-1.0.0-jar-with-dependencies.jar)(S_lycopersicum_Sep_2019.2bit).
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Meeting notes:
            Issue: Looked at the bam file which looks fine but there are reads that expand upon the chromosome. They have been soft clipped but find_junctions isn't taking that into consideration and won't process the file entirely to create the .FJ.bed.gz file.

            Extra notes: There is a read 64792,798 of the actual chromosome that is past the end of the SL4 genome. The reader should have clipped the read but didn't do this and now is not fitting correctly at the ends. The alignment didn't clip the ends to match the SL4 genome correctly just for this one sample Malintka-R1-0hr-25C-self. Junctions confused because what was soft clipped did pass but find_junctions for this bam file is unusable. Find_junctions isn't obeying that the files have been soft clipped. Aligner should hard-clip but instead is soft-clipping.

            Code used:

            SL4.0ch10:64782127-64791136 > samtoolsOutput.bam
            java -jar find-junctions-1.0.0-jar-with-dependencies.jar -u -f 5 -b S_lycopersicum_Sep_2019.2bit -o myShortFileoutput.bed samtoolsOutput.bamsamtools view -b Malintka-R1-0hr-25C-self.bam 
            

            Possible solution: Change code to find_junctions to allow soft-clipping if over the distance required. But this is for only one file so would it be worth changing the code?

            Could also review the multiqc report to maybe discover clipping issue: Per Base Sequence Content has failed and usually means there are contaminates in the library. Aligners should be able to deal with bases that don't align and soft-clip them. But in this case for this one sample it did not do this correctly?

            Show
            Mdavis4290 Molly Davis added a comment - - edited Meeting notes : Issue : Looked at the bam file which looks fine but there are reads that expand upon the chromosome. They have been soft clipped but find_junctions isn't taking that into consideration and won't process the file entirely to create the .FJ.bed.gz file. Extra notes : There is a read 64792,798 of the actual chromosome that is past the end of the SL4 genome. The reader should have clipped the read but didn't do this and now is not fitting correctly at the ends. The alignment didn't clip the ends to match the SL4 genome correctly just for this one sample Malintka-R1-0hr-25C-self. Junctions confused because what was soft clipped did pass but find_junctions for this bam file is unusable. Find_junctions isn't obeying that the files have been soft clipped. Aligner should hard-clip but instead is soft-clipping. Code used: SL4.0ch10:64782127-64791136 > samtoolsOutput.bam java -jar find-junctions-1.0.0-jar-with-dependencies.jar -u -f 5 -b S_lycopersicum_Sep_2019.2bit -o myShortFileoutput.bed samtoolsOutput.bamsamtools view -b Malintka-R1-0hr-25C-self.bam Possible solution: Change code to find_junctions to allow soft-clipping if over the distance required. But this is for only one file so would it be worth changing the code? Could also review the multiqc report to maybe discover clipping issue: Per Base Sequence Content has failed and usually means there are contaminates in the library. Aligners should be able to deal with bases that don't align and soft-clip them. But in this case for this one sample it did not do this correctly?
            Hide
            ann.loraine Ann Loraine added a comment - - edited

            Ann comment: Identify the aligned region and visualize it in IGB. Document and capture the image for presentations.

            From meeting: Re-run / double-check on Malintka-R1-8hr-25C-self (not the same sample that messed up the find junctions program.)

            Show
            ann.loraine Ann Loraine added a comment - - edited Ann comment: Identify the aligned region and visualize it in IGB. Document and capture the image for presentations. From meeting: Re-run / double-check on Malintka-R1-8hr-25C-self (not the same sample that messed up the find junctions program.)
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            New ticket was made for junctions java error: IGBF-3434

            Next step for this ticket: Review Malintka-R1-8hr-25C-self.bam file size issue and see if the problem is repetitive after nextflow rerun

            Show
            Mdavis4290 Molly Davis added a comment - - edited New ticket was made for junctions java error : IGBF-3434 Next step for this ticket : Review Malintka-R1-8hr-25C-self.bam file size issue and see if the problem is repetitive after nextflow rerun
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Malintka-R1-8hr-25C-self.bam

            • The files does seem to be very small (618061348M) compared to other bam files on the cluster. I checked the multiQC report and this sample does have the worst alignment score from unmapped reads due to them being too short. This means that this isn't a pipeline issue but more of a sample issue.

            Next Step: Please review my conclusions and move ticket to done if there are no issues.

            Show
            Mdavis4290 Molly Davis added a comment - - edited Malintka-R1-8hr-25C-self.bam The files does seem to be very small (618061348M) compared to other bam files on the cluster. I checked the multiQC report and this sample does have the worst alignment score from unmapped reads due to them being too short. This means that this isn't a pipeline issue but more of a sample issue. Next Step: Please review my conclusions and move ticket to done if there are no issues.
            Hide
            robofjoy Robert Reid added a comment -

            I took a peak at the number of reads that align for each expt.

            Total reads that align:
            Total Read Count
            Heinz-Ovary-R1-0hr-25C-unpol.bam 50979464
            Heinz-Ovary-R2-0hr-25C-unpol.bam 39862409
            Heinz-Ovary-R3-0hr-25C-unpol.bam 52051210
            Heinz-R1-0hr-25C-self.bam 40438938
            Heinz-R1-3hr-25C-self.bam 49881450
            Heinz-R1-3hr-37C-self.bam 51660872
            Heinz-R1-8hr-25C-self.bam 22196798
            Heinz-R1-8hr-37C-self.bam 50926978
            Heinz-R2-0hr-25C-self.bam 49858292
            Heinz-R2-3hr-25C-self.bam 50700025
            Heinz-R2-3hr-37C-self.bam 54923225
            Heinz-R2-8hr-25C-self.bam 35292739
            Heinz-R2-8hr-37C-self.bam 51480404
            Heinz-R3-0hr-25C-self.bam 51577031
            Heinz-R3-3hr-25C-self.bam 45984816
            Heinz-R3-3hr-37C-self.bam 50612275
            Heinz-R3-8hr-25C-self.bam 48707431
            Heinz-R3-8hr-37C-self.bam 51695156
            Malintka-R1-0hr-25C-self.bam 50644328
            Malintka-R1-3hr-25C-self.bam 28575876
            Malintka-R1-3hr-37C-self.bam 50707327
            Malintka-R1-8hr-25C-self.bam 11997289
            Malintka-R1-8hr-37C-self.bam 49495351
            Malintka-R2-0hr-25C-self.bam 52432853
            Malintka-R2-3hr-25C-self.bam 45473568
            Malintka-R2-3hr-37C-self.bam 44168728
            Malintka-R2-8hr-25C-self.bam 46900467
            Malintka-R2-8hr-37C-self.bam 51370963
            Malintka-R3-0hr-25C-self.bam 47086317
            Malintka-R3-3hr-25C-self.bam 51085380
            Malintka-R3-3hr-37C-self.bam 29231411
            Malintka-R3-8hr-25C-self.bam 44275394
            Malintka-R3-8hr-37C-self.bam 55644007
            Nagcarlang-R1-0hr-25C-self.bam 40968404
            Nagcarlang-R1-3hr-25C-self.bam 46600257
            Nagcarlang-R1-3hr-37C-self.bam 32781471
            Nagcarlang-R1-8hr-25C-self.bam 46430696
            Nagcarlang-R1-8hr-37C-self.bam 49122203
            Nagcarlang-R2-0hr-25C-self.bam 41919616
            Nagcarlang-R2-3hr-25C-self.bam 25538537
            Nagcarlang-R2-3hr-37C-self.bam 48518069
            Nagcarlang-R2-8hr-25C-self.bam 44389303
            Nagcarlang-R2-8hr-37C-self.bam 55567552
            Nagcarlang-R3-0hr-25C-self.bam 35216477
            Nagcarlang-R3-3hr-25C-self.bam 53420765
            Nagcarlang-R3-3hr-37C-self.bam 55258962
            Nagcarlang-R3-8hr-25C-self.bam 50940477
            Nagcarlang-R3-8hr-37C-self.bam 51768199
            Tamaulipas-R1-0hr-25C-self.bam 19287669
            Tamaulipas-R1-3hr-25C-self.bam 54518901
            Tamaulipas-R1-3hr-37C-self.bam 47761377
            Tamaulipas-R1-8hr-25C-self.bam 48604790
            Tamaulipas-R1-8hr-37C-self.bam 50484499
            Tamaulipas-R2-0hr-25C-self.bam 46447680
            Tamaulipas-R2-3hr-25C-self.bam 48552903
            Tamaulipas-R2-3hr-37C-self.bam 46727628
            Tamaulipas-R2-8hr-25C-self.bam 56810522
            Tamaulipas-R2-8hr-37C-self.bam 48005344
            Tamaulipas-R3-0hr-25C-self.bam 40520115
            Tamaulipas-R3-3hr-25C-self.bam 42195143
            Tamaulipas-R3-3hr-37C-self.bam 49785458
            Tamaulipas-R3-8hr-25C-self.bam 45882212
            Tamaulipas-R3-8hr-37C-self.bam 44242182

            Script that does this is here:
            /projects/tomato_genome/scripts/rob/summarizeBAMAlignments.slurm

            Show
            robofjoy Robert Reid added a comment - I took a peak at the number of reads that align for each expt. Total reads that align: Total Read Count Heinz-Ovary-R1-0hr-25C-unpol.bam 50979464 Heinz-Ovary-R2-0hr-25C-unpol.bam 39862409 Heinz-Ovary-R3-0hr-25C-unpol.bam 52051210 Heinz-R1-0hr-25C-self.bam 40438938 Heinz-R1-3hr-25C-self.bam 49881450 Heinz-R1-3hr-37C-self.bam 51660872 Heinz-R1-8hr-25C-self.bam 22196798 Heinz-R1-8hr-37C-self.bam 50926978 Heinz-R2-0hr-25C-self.bam 49858292 Heinz-R2-3hr-25C-self.bam 50700025 Heinz-R2-3hr-37C-self.bam 54923225 Heinz-R2-8hr-25C-self.bam 35292739 Heinz-R2-8hr-37C-self.bam 51480404 Heinz-R3-0hr-25C-self.bam 51577031 Heinz-R3-3hr-25C-self.bam 45984816 Heinz-R3-3hr-37C-self.bam 50612275 Heinz-R3-8hr-25C-self.bam 48707431 Heinz-R3-8hr-37C-self.bam 51695156 Malintka-R1-0hr-25C-self.bam 50644328 Malintka-R1-3hr-25C-self.bam 28575876 Malintka-R1-3hr-37C-self.bam 50707327 Malintka-R1-8hr-25C-self.bam 11997289 Malintka-R1-8hr-37C-self.bam 49495351 Malintka-R2-0hr-25C-self.bam 52432853 Malintka-R2-3hr-25C-self.bam 45473568 Malintka-R2-3hr-37C-self.bam 44168728 Malintka-R2-8hr-25C-self.bam 46900467 Malintka-R2-8hr-37C-self.bam 51370963 Malintka-R3-0hr-25C-self.bam 47086317 Malintka-R3-3hr-25C-self.bam 51085380 Malintka-R3-3hr-37C-self.bam 29231411 Malintka-R3-8hr-25C-self.bam 44275394 Malintka-R3-8hr-37C-self.bam 55644007 Nagcarlang-R1-0hr-25C-self.bam 40968404 Nagcarlang-R1-3hr-25C-self.bam 46600257 Nagcarlang-R1-3hr-37C-self.bam 32781471 Nagcarlang-R1-8hr-25C-self.bam 46430696 Nagcarlang-R1-8hr-37C-self.bam 49122203 Nagcarlang-R2-0hr-25C-self.bam 41919616 Nagcarlang-R2-3hr-25C-self.bam 25538537 Nagcarlang-R2-3hr-37C-self.bam 48518069 Nagcarlang-R2-8hr-25C-self.bam 44389303 Nagcarlang-R2-8hr-37C-self.bam 55567552 Nagcarlang-R3-0hr-25C-self.bam 35216477 Nagcarlang-R3-3hr-25C-self.bam 53420765 Nagcarlang-R3-3hr-37C-self.bam 55258962 Nagcarlang-R3-8hr-25C-self.bam 50940477 Nagcarlang-R3-8hr-37C-self.bam 51768199 Tamaulipas-R1-0hr-25C-self.bam 19287669 Tamaulipas-R1-3hr-25C-self.bam 54518901 Tamaulipas-R1-3hr-37C-self.bam 47761377 Tamaulipas-R1-8hr-25C-self.bam 48604790 Tamaulipas-R1-8hr-37C-self.bam 50484499 Tamaulipas-R2-0hr-25C-self.bam 46447680 Tamaulipas-R2-3hr-25C-self.bam 48552903 Tamaulipas-R2-3hr-37C-self.bam 46727628 Tamaulipas-R2-8hr-25C-self.bam 56810522 Tamaulipas-R2-8hr-37C-self.bam 48005344 Tamaulipas-R3-0hr-25C-self.bam 40520115 Tamaulipas-R3-3hr-25C-self.bam 42195143 Tamaulipas-R3-3hr-37C-self.bam 49785458 Tamaulipas-R3-8hr-25C-self.bam 45882212 Tamaulipas-R3-8hr-37C-self.bam 44242182 Script that does this is here: /projects/tomato_genome/scripts/rob/summarizeBAMAlignments.slurm
            Hide
            robofjoy Robert Reid added a comment -

            Compared to the average number of reads aligned, the 3 expts with much lower total reads are:
            name num aligned Percent of average
            Tamaulipas-R1-0hr-25C-self.bam 19287669 -57.8
            Malintka-R1-8hr-25C-self.bam 11997289 -73.7
            Heinz-R1-8hr-25C-self.bam 22196798 -51.4

            Going to explore next if this pattern is observed throughout NEXTFLOW and post processing.

            Show
            robofjoy Robert Reid added a comment - Compared to the average number of reads aligned, the 3 expts with much lower total reads are: name num aligned Percent of average Tamaulipas-R1-0hr-25C-self.bam 19287669 -57.8 Malintka-R1-8hr-25C-self.bam 11997289 -73.7 Heinz-R1-8hr-25C-self.bam 22196798 -51.4 Going to explore next if this pattern is observed throughout NEXTFLOW and post processing.
            Hide
            robofjoy Robert Reid added a comment -

            Bam checking shows the BAMs seem intact. The temporary steps in NEXTFLOW are lost as part of the post cleanup so we can't peak at what happens mid process.

            I left a copy of the slurm script in:

            /projects/tomato_genome/scripts/rob/summarizeBAMAlignments.slurm

            So Bam files seem fine at this stage.

            Show
            robofjoy Robert Reid added a comment - Bam checking shows the BAMs seem intact. The temporary steps in NEXTFLOW are lost as part of the post cleanup so we can't peak at what happens mid process. I left a copy of the slurm script in: /projects/tomato_genome/scripts/rob/summarizeBAMAlignments.slurm So Bam files seem fine at this stage.
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Had to rebase my branch:

            git checkout IGBF-3420
            git pull --rebase upstream main
            git push --force origin IGBF-3420
            

            Branch: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420
            Pull Request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/9

            Note: Files should include SL4 Multiqc report and counts.tsv file

            Show
            Mdavis4290 Molly Davis added a comment - - edited Had to rebase my branch: git checkout IGBF-3420 git pull --rebase upstream main git push --force origin IGBF-3420 Branch : https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420 Pull Request : https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/9 Note: Files should include SL4 Multiqc report and counts.tsv file
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Just moved all coverage graphs, bam files, and junction files to this location: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537

            Permission note:
            '- rw-rw-r- -' (first rw references owner can read and write, next rw references group can read and write, next r references everyone)

            Show
            Mdavis4290 Molly Davis added a comment - - edited Just moved all coverage graphs, bam files, and junction files to this location: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537 Permission note: '- rw-rw-r- -' (first rw references owner can read and write, next rw references group can read and write, next r references everyone)
            Hide
            ann.loraine Ann Loraine added a comment -

            Suggestions for testing:

            • View the multi_qc report, compare to SL5 counterpart
            • Check that the gene counts file has the expected number of lines (same as equivalent files from other SL4 data processing)
            Show
            ann.loraine Ann Loraine added a comment - Suggestions for testing: View the multi_qc report, compare to SL5 counterpart Check that the gene counts file has the expected number of lines (same as equivalent files from other SL4 data processing)
            Hide
            Mdavis4290 Molly Davis added a comment -

            Testing:

            • Comparing SL4 and SL5 multiqc report: Both reports are pretty similar. of course mapping differs between each genome but overall there are no major differences.
            • The counts.tsv file has the correct amount of rows for genes. This was compared with muday counts.tsv SL4 output data.
            • One note is that the SL5 counts.tsv file is not present in the repo.
            Show
            Mdavis4290 Molly Davis added a comment - Testing : Comparing SL4 and SL5 multiqc report: Both reports are pretty similar. of course mapping differs between each genome but overall there are no major differences. The counts.tsv file has the correct amount of rows for genes. This was compared with muday counts.tsv SL4 output data. One note is that the SL5 counts.tsv file is not present in the repo.
            Hide
            Mdavis4290 Molly Davis added a comment -

            Branch: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420b

            Adds file: ExternalData/30-804059537-SL5-salmon.merged.gene_counts.tsv

            Show
            Mdavis4290 Molly Davis added a comment - Branch : https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420b Adds file : ExternalData/30-804059537-SL5-salmon.merged.gene_counts.tsv
            Hide
            robofjoy Robert Reid added a comment - - edited

            Beginning with review of all things in branch above.
            Took a quick glance at the .tsv salmon file in the recent branch and it appears intact and correct, i.e., # of rows expected and # of columns.

            Robust counts! Lots of things going on expression wise. This might be a very interesting dataset!

            Moving on to the cluster ( /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537) to check those bits.

            Show
            robofjoy Robert Reid added a comment - - edited Beginning with review of all things in branch above. Took a quick glance at the .tsv salmon file in the recent branch and it appears intact and correct, i.e., # of rows expected and # of columns. Robust counts! Lots of things going on expression wise. This might be a very interesting dataset! Moving on to the cluster ( /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537) to check those bits.
            Hide
            robofjoy Robert Reid added a comment -

            In /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537

            As expected:
            63 bams
            63 bais
            126 .gz files (63 bedgraphs, 63 beds)
            126 .tbi files to go with the bed files.

            The BAM files have a wide range of sizes which at this point we are thinking is just how those sequences are.
            Taking a quick peak at the raw files, /projects/tomato_genome/rnaseq/30-804059537-kelsie/00_fastq$
            The size of these fastq files are not as wide ranging as the BAM files are.
            Makes me think that either the BAM alignment had issues on some samples (less likely, but we will find out later when we rerun these as SRA's) or something in the sample prep went awry and there is contamination. To test this we could run STAR, and get all the reads that do not align. I think NEXTFLOW is deleting the unaligned reads.

            If we get our hands on unaligned reads we can blast against the NR DB and see if there is other things that got sequenced, like for example the technician!
            I'll review more soon.

            Show
            robofjoy Robert Reid added a comment - In /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537 As expected: 63 bams 63 bais 126 .gz files (63 bedgraphs, 63 beds) 126 .tbi files to go with the bed files. The BAM files have a wide range of sizes which at this point we are thinking is just how those sequences are. Taking a quick peak at the raw files, /projects/tomato_genome/rnaseq/30-804059537-kelsie/00_fastq$ The size of these fastq files are not as wide ranging as the BAM files are. Makes me think that either the BAM alignment had issues on some samples (less likely, but we will find out later when we rerun these as SRA's) or something in the sample prep went awry and there is contamination. To test this we could run STAR, and get all the reads that do not align. I think NEXTFLOW is deleting the unaligned reads. If we get our hands on unaligned reads we can blast against the NR DB and see if there is other things that got sequenced, like for example the technician! I'll review more soon.
            Hide
            robofjoy Robert Reid added a comment -

            For this:
            https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420b

            Is all that is needed is to check out the .tsv file?
            I'll assume so and bounce this ticket back to Molly!

            Show
            robofjoy Robert Reid added a comment - For this: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420b Is all that is needed is to check out the .tsv file? I'll assume so and bounce this ticket back to Molly!
            Hide
            Mdavis4290 Molly Davis added a comment - - edited

            Pull Request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/10

            Note: Adds 30-804059537-SL5-salmon.merged.gene_counts.tsv to external folder with SL4 tsv file so comparisons can be made.

            Show
            Mdavis4290 Molly Davis added a comment - - edited Pull Request : https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/10 Note : Adds 30-804059537-SL5-salmon.merged.gene_counts.tsv to external folder with SL4 tsv file so comparisons can be made.
            Hide
            ann.loraine Ann Loraine added a comment -

            Merged the PR but then changed name from 30-804059537-SL5-salmon.merged.gene_counts.tsv to 30-804059537-SL5_salmon.merged.gene_counts.tsv to match convention established for other such files throughout the project.

            Show
            ann.loraine Ann Loraine added a comment - Merged the PR but then changed name from 30-804059537-SL5-salmon.merged.gene_counts.tsv to 30-804059537-SL5_salmon.merged.gene_counts.tsv to match convention established for other such files throughout the project.

              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                ann.loraine Ann Loraine
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: