IGBF-3375

Run Nextflow with 2021 Palanivelu Lab-generated samples

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Directory: /projects/tomato_genome/rnaseq/ravi-tamaulipas

      • Copy fastq files to /nobackup/tomato_genome/... location
      • Make csv
      • Run nf-core pipeline (SL4 & SL5 runs)

      Note: Check that the configuration uses the same intron size as the other tomato experiments.
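
The "Make csv" step could be scripted along these lines. This is a minimal sketch, not the script used for this ticket: it assumes the standard nf-core/rnaseq samplesheet columns (sample, fastq_1, fastq_2, strandedness), and the function name, sample names and paths in the usage note are hypothetical.

```python
import csv
from pathlib import Path

# Column order of the nf-core/rnaseq samplesheet (assumed here).
FIELDS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def write_samplesheet(samples, out_path, strandedness="reverse"):
    """Write one samplesheet row per sample.

    `samples` maps a sample name to a (fastq_1, fastq_2) pair; pass an
    empty string as fastq_2 for single-end data.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for name, (fq1, fq2) in sorted(samples.items()):
            writer.writerow(
                {
                    "sample": name,
                    "fastq_1": fq1,
                    "fastq_2": fq2,
                    "strandedness": strandedness,
                }
            )
    return Path(out_path)
```

Calling `write_samplesheet({"sample-A": ("/path/to/sample-A.fastq.gz", "")}, "samples.csv")` would write a header line followed by one row for the hypothetical sample-A, with the same strandedness value on every row.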

            Activity

            Mdavis4290 Molly Davis added a comment -

            I will also make sure to wait for the pull request in the future! Thanks!

            Mdavis4290 Molly Davis added a comment - - edited

            I agree! I am moving forward and will inspect the two samples with warnings, 'Tamaulipas-pistils-3hr-25C-R2' & 'Tamaulipas-pistils-0hr-25C-R2', in IGB once the data are deployed. Thanks! [~aloraine]

            ann.loraine Ann Loraine added a comment - - edited

            Looking at the "General Statistics" tables in the SL4 and SL5 reports.
            Because the Web page is difficult for me to read, I imported the tables into an Excel spreadsheet named Ravi_2021_SL4_v_SL5_GeneralStatistics_multiqc_report.xlsx and added it to the repository.

            I note right off the bat that "M Reads Mapping" (millions of reads mapped) is larger for SL5 in 7 out of 10 samples. To keep track of which samples had larger or smaller "M Reads Mapping" values, I color-coded cells: pale blue if the value was larger than in the other table and pale orange if it was smaller. Thus, the SL5 table contained many more pale blue cells than the SL4 table.
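
The same larger/smaller tally could be produced by a script instead of cell colors. This is only a sketch: it assumes the "M Reads Mapping" values have already been read into two dictionaries keyed by sample name, and the function name and the values in the example are made up for illustration, not taken from the reports.

```python
def compare_mapped_reads(sl4, sl5):
    """Classify each shared sample by which run mapped more reads.

    `sl4` and `sl5` map sample name -> "M Reads Mapping" (millions of reads).
    """
    result = {"sl5_larger": [], "sl4_larger": [], "equal": []}
    for sample in sorted(set(sl4) & set(sl5)):
        if sl5[sample] > sl4[sample]:
            result["sl5_larger"].append(sample)
        elif sl5[sample] < sl4[sample]:
            result["sl4_larger"].append(sample)
        else:
            result["equal"].append(sample)
    return result


# Made-up values, for illustration only (not from the MultiQC reports).
example = compare_mapped_reads(
    {"sample-A": 20.1, "sample-B": 18.4, "sample-C": 22.0},
    {"sample-A": 20.9, "sample-B": 18.0, "sample-C": 22.0},
)
print(example)
```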

            Next, I sanity-checked the pipelines. The "% BP trimmed", "% Dups", "%GC", "Length" and "M Seqs" columns should be the same in the SL4 and SL5 pipeline results, as these steps merely process the reads themselves, independently of the genome alignment target. Thus, for these metrics, using a different reference genome assembly (SL4 versus SL5) should not matter. I scanned the values manually, by eye. No differences spotted. Sanity check passes.
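
The by-eye comparison of the read-level columns could also be automated. The sketch below assumes each table has been loaded as a list of dictionaries (for example with `csv.DictReader`), that the sample column is called "Sample Name", and that the headers match the column names quoted above; the actual header text in an exported table may differ, and the function name is hypothetical.

```python
# Read-level columns that should not depend on the reference assembly.
# Header spellings are assumed; adjust them to match the exported table.
READ_LEVEL_COLUMNS = ["% BP trimmed", "% Dups", "%GC", "Length", "M Seqs"]


def find_mismatches(sl4_rows, sl5_rows, columns=READ_LEVEL_COLUMNS, key="Sample Name"):
    """Return (sample, column, sl4_value, sl5_value) for every disagreement."""
    sl5_by_sample = {row[key]: row for row in sl5_rows}
    sl4_names = {row[key] for row in sl4_rows}
    mismatches = []
    for row in sl4_rows:
        other = sl5_by_sample.get(row[key])
        if other is None:
            # Sample appears only in the SL4 table.
            mismatches.append((row[key], None, "present", "missing"))
            continue
        for col in columns:
            if row.get(col) != other.get(col):
                mismatches.append((row[key], col, row.get(col), other.get(col)))
    for name in sl5_by_sample:
        if name not in sl4_names:
            # Sample appears only in the SL5 table.
            mismatches.append((name, None, "missing", "present"))
    return mismatches
```

An empty list from `find_mismatches` corresponds to the "sanity check passes" outcome. Values are compared exactly as loaded, so both tables should be read the same way (both as text, or both as numbers).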

            As [~molly] noted during scrum meetings, two samples triggered a "WARNING: Fail Strand Check" alert in the SL4 mapping but no such warning in the SL5 mapping. WTF? I suppose this could have something to do with the difference in the annotations used. We cannot really tell at this level what is going on, so I recommended we proceed with deploying the data for genome browser visualization and then manually inspect these samples for a possible error. We might find something cool. Who knows?

            ann.loraine Ann Loraine added a comment - - edited

            PR merged.
            Commencing review.
            In future, please hold off on the pull requests until review is completed. Please respect the process. Skipping a bunch of steps in the process makes it a lot harder for me to manage and follow the work, as there are a lot of people doing work simultaneously on different projects. Thank you for understanding.

            Mdavis4290 Molly Davis added a comment -

            Branch: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3375
            Pull Request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/6

            Reviewer: Please look over the MultiQC reports. There was a strandedness issue for the 2021 SL4 run but not for SL5. If no further investigation is needed, please merge. Thanks!

            Mdavis4290 Molly Davis added a comment - - edited

            Update: I reran SL4 with 'unstranded' as the strandedness for all samples, and the MultiQC report still had warnings. Because the SL4 MultiQC report did not change and SL5 had no problems, I am going to stick with 'reverse' as the strandedness and use the original results for SL4.

            Next Step: Commit the CSV and MultiQC reports to the repo, then move to review. Reviewer: please examine the reports and look into the strandedness issue.
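
For reference, switching every sample to 'unstranded' for a rerun like the one described above can be done by rewriting the strandedness column of the samplesheet. This is a sketch only, assuming the CSV has a column named `strandedness` as in the nf-core/rnaseq samplesheet layout; the function name and file names are hypothetical.

```python
import csv


def set_strandedness(in_path, out_path, value="unstranded"):
    """Copy a samplesheet, replacing the strandedness of every row."""
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src)
        fields = reader.fieldnames
        rows = list(reader)
    if not fields or "strandedness" not in fields:
        raise ValueError("samplesheet has no 'strandedness' column")
    for row in rows:
        row["strandedness"] = value
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)  # number of samples rewritten
```

Writing to a new file rather than editing in place keeps the original 'reverse' samplesheet available, which matches the decision above to keep the original SL4 results.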

            Mdavis4290 Molly Davis added a comment - - edited

            HPC Directory: /nobackup/tomato_genome/Ravi_tamaulipas_2021
            SL5 Directory: /nobackup/tomato_genome/Ravi_tamaulipas_2021/Ravi_2021_SL5
            SL4 Directory: /nobackup/tomato_genome/Ravi_tamaulipas_2021/Ravi_2021_SL4

            Next steps:

            • Check MultiQC report
            • Add MultiQC report to repo if no warnings
            • Add sample.csv to repo: https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/src/main/ExternalData/
            • Rename other sample.csv in that directory to 2023 date

            MultiQC report notes: Compared the SL4 and SL5 MultiQC reports. SL5 had no warnings and strandedness was reverse. The SL4 MultiQC report had a failed strandedness warning for 'Tamaulipas-pistils-3hr-25C-R2' & 'Tamaulipas-pistils-0hr-25C-R2': it reports these two samples as unstranded rather than reverse like the rest of the samples. For mapping, SL5 and SL4 are similar with no major differences, but overall SL5 has higher alignment scores.

            nf-core warnings:

            -[nf-core/rnaseq] Pipeline completed successfully-
            WARN: Process 'NFCORE_RNASEQ:RNASEQ:MULTIQC_TSV_FAIL_MAPPED' cannot be executed by 'slurm' executor -- Using 'local' executor instead
            WARN: Process 'NFCORE_RNASEQ:RNASEQ:MULTIQC_TSV_STRAND_CHECK' cannot be executed by 'slurm' executor -- Using 'local' executor instead

            Notes: I reran Nextflow and the same warnings came up for SL4, but in the MultiQC report the mapping doesn't seem wrong, and the SL5 mapping had no issues with those specific samples. I believe it is an issue with the pipeline itself, or something needs to be changed for the SL4 files. For example, I am using directory locations for the SL4 files instead of copied versions:

            ./doIt.sh Ravi_tamaulipas_2021-samples.csv /nobackup/tomato_genome/nfcore_rnaseq/S_lycopersicum_Sep_2019.fa /nobackup/tomato_genome/nfcore_rnaseq/S_lycopersicum_Sep_2019.gtf /nobackup/tomato_genome/nfcore_rnaseq/S_lycopersicum_Sep_2019.bed tomato.config 1> out.1.txt 2> err.1.txt

            It could also be that Slurm does not process the SL4 run correctly, even though it handles SL5 fine.

            Next step: rerun SL4 with the CSV listing 'unstranded' as the strandedness for every sample.
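
The executor warnings quoted in this comment can be pulled out of a captured log (such as the err.1.txt or out.1.txt written by the command above) with a small script. This is a sketch that matches only the exact warning wording shown in this ticket; the function name is hypothetical and other Nextflow versions may word the message differently.

```python
import re

# Matches the warning text exactly as quoted in this ticket.
FALLBACK = re.compile(
    r"WARN: Process '([^']+)' cannot be executed by '([^']+)' executor"
    r" -- Using '([^']+)' executor instead"
)


def executor_fallbacks(log_text):
    """Return (process, requested_executor, used_executor) for each warning."""
    return [match.groups() for match in FALLBACK.finditer(log_text)]
```

Running `executor_fallbacks` on the SL4 and SL5 logs and comparing the results would show whether both runs fell back to the 'local' executor for the same processes.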

              People

              • Assignee: Mdavis4290 Molly Davis
              • Reporter: Mdavis4290 Molly Davis