  IGB / IGBF-3721

Document Tardigrade RNA-Seq Pipeline

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      The genomic pipeline for downloading and processing tomato data is well documented, but it uses a version of Nextflow that has recently been superseded. Now that we're setting up tardigrade quickloads, we should adapt this pipeline to use the newest version of Nextflow. This pipeline will likely be shared with the Goldstein lab as part of our outreach efforts.

      Task: Document the genomic pipeline for downloading and processing tardigrade RNA-Seq data with the newest version of Nextflow. Ensure that it is easy to understand (i.e., keep the Goldstein lab in mind as a target audience).

            Activity

            pkulzer Paige Kulzer added a comment - Link to the Google Doc version of the pipeline: https://docs.google.com/document/d/1o5iAcs4Bk6hNrprGu31-JPdNrSWn1hSetQbGrz1v_SM/edit?usp=sharing
            pkulzer Paige Kulzer added a comment -

            Moving this to "Needs Review" alongside the linked ticket (IGBF-3708) so that the pipeline documentation can be reviewed at the same time. Please note that I've kept track of encountered errors and their fixes on the linked ticket, not in the pipeline documentation.

            ann.loraine Ann Loraine added a comment - - edited

            Thanks for the draft!

            Change requests:

            • Please move Table 1 to the end of the document.
            • Make a new section "Introduction" and add some filler text "explain what this document is for" in the Introduction. We'll add more later!
            • Please remove version-controlled code from the document. Instead, link to the file in bitbucket, e.g., https://bitbucket.org/lorainelab/tardigrade/src/main/src/prefetch.sh
            • Include some text explaining what the user will see in their file system after the "prefetch.sh" command has run. What should they observe after running it?
            • Instead of running "prefetch" in "fastq", run it inside a separate directory called "sra" (which you create). This ensures that the ".sra" files are downloaded into a single location where no other important files reside. Then, after the ".sra" files have been downloaded and used to create the "fastq" files, we can delete the "sra" directory and all its big files in a single command.
            • Run the "fasterq-dump" step inside the "sra" directory after running the "prefetch" step.
            • After that, use "gzip.sh" to compress all the fastq files.
            • After the "fasterq-dump" and "gzip" steps finish, move the compressed files to a new directory called "fastq" using the Unix "mv" command.
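
            The directory layout requested in the last four items can be sketched as a short shell function. This is only an illustration under stated assumptions, not the repository's scripts: it calls the sra-tools commands "prefetch" and "fasterq-dump" and plain "gzip" directly instead of the version-controlled "prefetch.sh" and "gzip.sh", and the function name "fetch_fastq" is hypothetical.

```shell
# Sketch only. Assumes "prefetch" and "fasterq-dump" (sra-tools) and "gzip"
# are on PATH. The function name fetch_fastq is hypothetical and is not part
# of the tardigrade repository.
fetch_fastq() {
    # Usage: fetch_fastq ACCESSION [ACCESSION ...]
    mkdir -p sra fastq || return 1
    (
        # Work inside "sra" so that all large downloads stay in one place.
        cd sra || exit 1
        for acc in "$@"; do
            # Download the run data for this accession.
            prefetch "$acc" || exit 1
            # Convert the download to FASTQ in the current ("sra") directory.
            fasterq-dump "$acc" || exit 1
        done
        # Compress every FASTQ file, then move the results into "fastq".
        gzip ./*.fastq || exit 1
        mv ./*.fastq.gz ../fastq/ || exit 1
    ) || return 1
    # Only reached on success: delete "sra" and its big files in one command.
    rm -rf sra
}
```

            Called from the project directory with one or more run accessions, the function should leave only compressed FASTQ files under "fastq". If any step fails, the "sra" directory is kept so the downloads can be inspected.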

            Also, a note:

            The "fasterq-dump" script is going to break when it encounters non-paired-end data. That is, if there is no read 2 file, it will fail. We need to develop a new version that can handle non-paired-end data.
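
            One possible starting point for that new version is to check which FASTQ files exist before assuming a read 2 file. The sketch below assumes the default "fasterq-dump" output naming, where paired-end runs produce "<accession>_1.fastq" and "<accession>_2.fastq" and single-end runs produce "<accession>.fastq"; the helper name "read_layout" is hypothetical.

```shell
# Sketch only. Hypothetical helper that reports the read layout of an
# accession from the FASTQ files found in a directory. Assumes the default
# fasterq-dump naming: ACC_1.fastq and ACC_2.fastq for paired-end runs,
# ACC.fastq for single-end runs.
read_layout() {
    # Usage: read_layout ACCESSION [DIRECTORY]
    acc=$1
    dir=${2:-.}
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        echo paired
    elif [ -f "$dir/${acc}.fastq" ]; then
        echo single
    else
        echo "no FASTQ files found for $acc in $dir" >&2
        return 1
    fi
}
```

            A downstream step could branch on the printed value ("paired" or "single") instead of failing when the read 2 file is missing.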

            ann.loraine Ann Loraine added a comment - - edited

            See linked ticket IGBF-3790 for the most recent trial run of the tardigrade RNA-Seq pipeline in the Loraine Lab git repository called "tardigrade."


              People

              • Assignee:
                pkulzer Paige Kulzer
                Reporter:
                pkulzer Paige Kulzer
