IGBF-3420

Process 30-804059537 (KP 2023) data using S_lycopersicum_Sep_2019 (SL4) genome assembly and annotations

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      My apologies if this has already been done and I lost track of the data somehow!

      For this task, we need to align and deploy data from Azenta run 30-804059537 onto the SL4 / S_lycopersicum_Sep_2019 genome, using the S_lycopersicum_Sep_2019 assembly and accompanying gene annotations.

      To-do:

      • run the nextflow pipeline as per usual. Please use location /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/S_lycopersicum_Sep_2019 on the UNCC cluster
      • create a branch in the code repo as usual
      • commit to the branch the following files:

      30-804059537_SL4_multiqc_report.html (renamed from multiqc_report.html)
      30-804059537_SL4_salmon.merged.gene_counts.tsv (renamed from salmon.merged.gene_counts.tsv)

      • create coverage graphs and junction files, as per usual
      • move just the coverage graphs, bam files, and junction files to this location: /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537 (We'll use this as the source directory for an rsync command that will copy these data over to the data.bioviz.org host for visualization in IGB)
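The renaming and staging steps above could be scripted roughly as follows. This is only a sketch, not the project's actual tooling: the function names and the file suffixes in `QUICKLOAD_SUFFIXES` are assumptions, and the directory arguments stand in for the cluster paths given in the to-do list.

```python
from pathlib import Path
from typing import List
import shutil

RUN_ID = "30-804059537"
GENOME_TAG = "SL4"

# Assumed suffixes for "coverage graphs, bam files, and junction files";
# adjust these to whatever the pipeline actually writes.
QUICKLOAD_SUFFIXES = (
    ".bam", ".bam.bai",
    ".bedgraph.gz", ".bedgraph.gz.tbi",
    ".bed.gz", ".bed.gz.tbi",
)


def prefixed_name(original: str, run_id: str = RUN_ID, tag: str = GENOME_TAG) -> str:
    """Build the committed file name, e.g. multiqc_report.html ->
    30-804059537_SL4_multiqc_report.html."""
    return f"{run_id}_{tag}_{original}"


def copy_with_prefix(src: Path, dest_dir: Path) -> Path:
    """Copy a pipeline output into dest_dir under its prefixed name."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / prefixed_name(src.name)
    shutil.copy2(src, target)
    return target


def stage_for_quickload(results_dir: Path, quickload_dir: Path) -> List[Path]:
    """Move only the visualization files into the rsync source directory."""
    quickload_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(results_dir.rglob("*")):
        if not path.is_file() or path.parent == quickload_dir:
            continue
        if path.name.endswith(QUICKLOAD_SUFFIXES):
            target = quickload_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Everything else (logs, intermediate files) stays where the pipeline left it, so the quickload directory contains only what rsync should copy.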

            Activity

            robofjoy Robert Reid added a comment - - edited

            Beginning with a review of everything in the branch above.
            Took a quick glance at the salmon .tsv file in the recent branch and it appears intact and correct, i.e., it has the expected number of rows and columns.

            Robust counts! Lots of things going on expression-wise. This might be a very interesting dataset!

            Moving on to the cluster (/projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537) to check those bits.

            robofjoy Robert Reid added a comment -

            In /projects/tomato_genome/fnb/dataprocessing/30-804059537-KP/for_quickload/S_lycopersicum_Sep_2019/30-804059537

            As expected:
            63 bams
            63 bais
            126 .gz files (63 bedgraphs, 63 beds)
            126 .tbi files to go with the .gz files.
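A count check like the one above can be automated. The sketch below is illustrative only: the expected numbers come from this comment, while the function names and the index naming (`<file>.bam.bai`, `<file>.gz.tbi`) are assumptions.

```python
from collections import Counter
from pathlib import Path

# Counts reported in this comment; the suffix spellings are assumptions.
EXPECTED = {".bam": 63, ".bai": 63, ".gz": 126, ".tbi": 126}


def count_by_suffix(directory: Path) -> Counter:
    """Count files by their final extension (e.g. '.gz', '.tbi')."""
    return Counter(p.suffix for p in directory.iterdir() if p.is_file())


def missing_indexes(directory: Path) -> list:
    """List BAM and .gz files that have no index file next to them.

    Assumes indexes are named <file>.bam.bai and <file>.gz.tbi.
    """
    names = {p.name for p in directory.iterdir() if p.is_file()}
    missing = []
    for name in sorted(names):
        if name.endswith(".bam") and name + ".bai" not in names:
            missing.append(name)
        elif name.endswith(".gz") and name + ".tbi" not in names:
            missing.append(name)
    return missing


def check(directory: Path, expected: dict = EXPECTED) -> dict:
    """Return {suffix: (found, expected)} for every suffix whose count is off."""
    counts = count_by_suffix(directory)
    return {
        suffix: (counts.get(suffix, 0), wanted)
        for suffix, wanted in expected.items()
        if counts.get(suffix, 0) != wanted
    }
```

An empty result from both `check` and `missing_indexes` would correspond to the "as expected" outcome reported here.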

            The BAM files have a wide range of sizes, which at this point we think is just how those sequences are.
            Taking a quick peek at the raw files in /projects/tomato_genome/rnaseq/30-804059537-kelsie/00_fastq:
            The sizes of these fastq files are not as wide-ranging as those of the BAM files.
            Makes me think that either the alignment had issues on some samples (less likely, but we will find out later when we rerun these as SRAs) or something in the sample prep went awry and there is contamination. To test this, we could run STAR and collect all the reads that do not align. I think Nextflow is deleting the unaligned reads.
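One way to make the size comparison concrete is to compute a BAM-to-FASTQ size ratio per sample and flag samples that sit far from the median. This is a rough sketch under stated assumptions: the function names, the 2-fold threshold, and the idea of keying both size maps by a shared sample name are all hypothetical, not something the pipeline provides.

```python
from statistics import median


def size_ratios(fastq_bytes: dict, bam_bytes: dict) -> dict:
    """BAM size divided by FASTQ size for every sample present in both maps."""
    return {
        sample: bam_bytes[sample] / fastq_bytes[sample]
        for sample in fastq_bytes
        if sample in bam_bytes and fastq_bytes[sample] > 0
    }


def flag_outliers(ratios: dict, fold: float = 2.0) -> list:
    """Samples whose ratio differs from the median ratio by more than `fold`-fold."""
    if not ratios:
        return []
    mid = median(ratios.values())
    return sorted(
        sample
        for sample, ratio in ratios.items()
        if ratio > mid * fold or ratio < mid / fold
    )
```

A sample with an unusually small ratio has far fewer aligned bytes than its raw data would suggest, which is the pattern that would be worth following up for contamination.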

            If we get our hands on the unaligned reads, we can BLAST them against the NR database and see if other things got sequenced, like, for example, the technician!
            I'll review more soon.
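If alignments that still contain unmapped records are available as SAM text (an assumption, given the suspicion above that the pipeline discards them), the unaligned reads could be pulled out and written as FASTA for a BLAST search. The sketch below relies only on the SAM FLAG bit 0x4 ("segment unmapped"); the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & UNMAPPED:
            yield fields[0], fields[9]  # QNAME, SEQ


def to_fasta(reads: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reads)
```

The resulting FASTA text could then be used as the query for the NR search mentioned above.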

            robofjoy Robert Reid added a comment -

            For this:
            https://bitbucket.org/mdavis4290/molly-pistil-rna-seq/branch/IGBF-3420b

            Is checking out the .tsv file all that is needed?
            I'll assume so and bounce this ticket back to Molly!

            Mdavis4290 Molly Davis added a comment - - edited

            Pull Request: https://bitbucket.org/hotpollen/pistil-rna-seq/pull-requests/10

            Note: Adds 30-804059537-SL5-salmon.merged.gene_counts.tsv to external folder with SL4 tsv file so comparisons can be made.

            ann.loraine Ann Loraine added a comment -

            Merged the PR, but then changed the file name from 30-804059537-SL5-salmon.merged.gene_counts.tsv to 30-804059537-SL5_salmon.merged.gene_counts.tsv to match the convention established for other such files throughout the project.


              People

              • Assignee:
                Mdavis4290 Molly Davis
                Reporter:
                ann.loraine Ann Loraine