  1. IGB
  2. IGBF-3500

Re-run mark-2022-timeseries data with data downloaded from SRA

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Re-run the mark 2022 time series data, available from SRA under the study accession SRP441343, against both the SL4 and SL5 genomes.

      For this task, we need to confirm and sanity-check the mark 2022 time series data that Rob uploaded and submitted to the Sequence Read Archive.

      If the data are good, we will replace all the existing BAM, junctions, etc. files deployed in the "hotpollen" quickload site with newly processed data.
      For this task:

      • Check SRP on NCBI and review submission
      • Download the data onto the cluster by using the SRP name
      • Run nf-core/rnaseq pipeline
      • Run our coverage graph and junctions scripts on the data

      Note that all files should now use their "SRR" names instead of the existing file names.

            Activity

            robofjoy Robert Reid added a comment -

            SL5 folder

            The NFcore file structure is as expected.

            There are 54 files of each type: bam, bai, bed.gz, and bed.gz.tbi.

            Bam files are expected.

            All .err files are empty (size 0).

            All looks good.

            Next up:
            csv files and html in bitbucket.
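The folder check described in this comment could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used: the function names, the suffix list, and the idea of scanning one output folder recursively are assumptions made for illustration. Only the expected count of 54 and the "empty .err files" criterion come from the comment.

```python
from collections import Counter
from pathlib import Path

# Suffixes taken from the comment above; matching ".bai" also covers ".bam.bai".
EXPECTED_SUFFIXES = (".bam", ".bai", ".bed.gz", ".bed.gz.tbi")


def count_by_suffix(filenames, suffixes=EXPECTED_SUFFIXES):
    """Count how many file names end with each expected suffix."""
    counts = Counter({suffix: 0 for suffix in suffixes})
    for name in filenames:
        for suffix in suffixes:
            if name.endswith(suffix):
                counts[suffix] += 1
                break
    return counts


def nonempty_err_files(folder):
    """Return the names of .err files under `folder` that are not empty."""
    return sorted(
        path.name for path in Path(folder).rglob("*.err") if path.stat().st_size > 0
    )


def check_output_folder(folder, expected=54):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    names = [path.name for path in Path(folder).rglob("*") if path.is_file()]
    counts = count_by_suffix(names)
    problems = [
        f"{suffix}: found {found}, expected {expected}"
        for suffix, found in counts.items()
        if found != expected
    ]
    problems += [f"non-empty .err file: {name}" for name in nonempty_err_files(folder)]
    return problems
```

Calling `check_output_folder("SL5")` (the folder name here is a placeholder) would return an empty list if every file type appears 54 times and every .err file is empty.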

            robofjoy Robert Reid added a comment -

            In https://bitbucket.org/mdavis4290/molly-splicing-analysis/branch/IGBF-3500

            the HTML files both exist with content.

            The .csv file looks to have the full 55 lines (54 expts plus a header) with the proper fastq in each column.

            I call this done!
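A check like the one described for the .csv file could be automated along these lines. This is a sketch under stated assumptions: the column names `sample`, `fastq_1`, and `fastq_2` follow the usual nf-core/rnaseq samplesheet layout and may not match the actual file in the branch, and `check_samplesheet` is a hypothetical helper, not part of the repository.

```python
import csv
import io
import re
from pathlib import PurePosixPath

SRR_PATTERN = re.compile(r"^SRR\d+$")


def check_samplesheet(text, expected_rows=54):
    """Return a list of problems found in samplesheet text (header plus data rows)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"found {len(rows)} data rows, expected {expected_rows}")
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(rows, start=2):
        sample = (row.get("sample") or "").strip()
        if not SRR_PATTERN.match(sample):
            problems.append(f"line {line_no}: sample {sample!r} is not an SRR accession")
            continue
        for column in ("fastq_1", "fastq_2"):
            value = (row.get(column) or "").strip()
            if not value:
                # An empty fastq_2 is allowed (single-end); an empty fastq_1 is not.
                if column == "fastq_1":
                    problems.append(f"line {line_no}: fastq_1 is empty")
                continue
            name = PurePosixPath(value).name
            # Require the accession followed by "_" or "." so SRR1 does not match SRR10.
            if not name.startswith((sample + "_", sample + ".")):
                problems.append(f"line {line_no}: {column} {name!r} does not match {sample}")
    return problems
```

With the real file, `check_samplesheet(open(path).read())` should return an empty list when the sheet has 54 data rows and every fastq file name begins with its sample's SRR accession.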

            Mdavis4290 Molly Davis added a comment -

            Thank you, Robert Reid!

            PR: https://bitbucket.org/hotpollen/splicing-analysis/pull-requests/13

            ann.loraine Ann Loraine added a comment -

            Testing suggestions:

            • Open the newly added .html files in a Web browser to check that they didn't get corrupted somehow (by mistake, of course)
            • Check that the .html files mention the expected SRA identifiers
            • Check that the SRA identifiers listed in the added csv files match up with the .html files
            • Check that the csv file SRA identifiers are repeated in the expected way in the expected columns (e.g., sample names match up with file names)
            • Make a note of any interesting (or not so interesting!) differences in results obtained for SL4 and SL5, recalling whether SL5 has more or fewer gene models and genes than SL4
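The identifier cross-check suggested above could be sketched as follows. This assumes that the run identifiers appear in both the .html reports and the .csv files as plain `SRR` accessions. The function names are hypothetical, and the accessions in the usage note are placeholders, not real runs from SRP441343.

```python
import re

# "SRR" followed by digits, not preceded by a letter or digit.
SRR_RE = re.compile(r"(?<![A-Za-z0-9])SRR\d+")


def srr_ids(text):
    """Return the set of SRR run accessions mentioned anywhere in `text`."""
    return set(SRR_RE.findall(text))


def compare_ids(html_text, csv_text):
    """Report accessions that appear in only one of the two documents."""
    html_ids = srr_ids(html_text)
    csv_ids = srr_ids(csv_text)
    return {
        "only_in_html": sorted(html_ids - csv_ids),
        "only_in_csv": sorted(csv_ids - html_ids),
    }
```

For example, comparing `"<td>SRR0000001</td>"` against a csv that lists only `SRR0000002` reports one accession on each side. Two empty lists mean the files mention the same set of runs.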
            Mdavis4290 Molly Davis added a comment -

            Testing:

            • html files open and report accurate information
            • SRA identifiers are present
            • csv SRA identifiers match the SRA identifiers in the html files
            • the fastq file SRA identifiers match the sample SRA identifiers in the csv file
            • SL5 appears to have more 'reads mapped' than SL4, but SL4 has a higher '% Proper Pairs' than SL5.

            Next step: prepare the data to be moved from the cluster to the IGB quickload site. Refer to IGBF-3499.

            Moving ticket to done!


              People

              • Assignee: Mdavis4290 Molly Davis
              • Reporter: Mdavis4290 Molly Davis