[IGBF-3721] Document Tardigrade RNA-Seq Pipeline - JIRA UNCC

Details

Type: Task
Status: Closed (View Workflow)
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
2
Epic Link:
Support tardigrade genome assemblies and rna-seq in IGB
Sprint:
Spring 9, Summer 2, Summer 3, Summer 4, Summer 6

Description

The genomic pipeline for downloading and processing tomato data has been well documented, but it uses a version of Nextflow that has recently been superseded. Now that we're setting up tardigrade quickloads, we should adapt this pipeline to use the newest version of Nextflow. This pipeline will likely be shared with the Goldstein lab as part of our outreach efforts.

Task: Document the genomic pipeline for downloading and processing tardigrade RNA-Seq data with the newest version of Nextflow. Ensure that it is easy to understand (i.e., keep the Goldstein lab in mind as a target audience).

Issue Links

relates to

IGBF-3735 Run the latest version of the nf-core/rnaseq pipeline with tardigrade data (Closed)

IGBF-3790 Run nf-core/rnaseq v 3.14 on SRP484252 (2024 Goldstein Lab) (Closed)

IGBF-3708 Download tardigrade RNA-Seq data onto cluster (Closed)

Activity

Paige Kulzer added a comment - 07/May/24 11:17 AM

Link to the Google Doc version of the pipeline: https://docs.google.com/document/d/1o5iAcs4Bk6hNrprGu31-JPdNrSWn1hSetQbGrz1v_SM/edit?usp=sharing

Paige Kulzer added a comment - 13/May/24 3:57 PM

Moving this to "Needs Review" alongside the linked ticket (IGBF-3708) so that the pipeline documentation can be reviewed at the same time. Please note that I've kept track of encountered errors and their fixes on the linked ticket, not in the pipeline documentation.

Ann Loraine added a comment - 16/May/24 9:27 AM - edited

Thanks for the draft!

Change requests:

Please move Table 1 to the end of the document.
Make a new section "Introduction" and add some filler text ("explain what this document is for") to it. We'll add more later!
Please remove version-controlled code from the document. Instead, link to the file in bitbucket, e.g., https://bitbucket.org/lorainelab/tardigrade/src/main/src/prefetch.sh
Include some text explaining what the user should see in their file system after "prefetch.sh" has run.
Instead of running "prefetch" in "fastq", run it inside another directory called "sra" (you make it). This ensures that the ".sra" files get downloaded into a single location where no other important files reside. Then, after the ".sra" files have been downloaded and used to create the "fastq" files, we can delete the "sra" directory and all its big files with a single command.
Run the "fasterq-dump" step inside the "sra" directory after running the "prefetch" step.
After that, use "gzip.sh" to compress all the fastq files.
After the "fasterq-dump" and "gzip" steps finish, move the compressed files to a new directory called "fastq" using the Unix "mv" command.
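
The directory layout requested above could be sketched roughly as follows. This is only an illustration, not the contents of the repository scripts: the helper name "plan_download" is hypothetical, and because "prefetch.sh" and "gzip.sh" are not shown in this ticket, the sketch assumes the SRA Toolkit commands "prefetch" and "fasterq-dump" and plain "gzip" are called directly. It prints the commands rather than running them, so the plan can be reviewed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print, one per line, the commands for the proposed
# layout (download and convert inside "sra", keep only gzipped FASTQ in "fastq").
# Assumption: run accessions are passed as arguments.
plan_download() {
    echo "mkdir -p sra fastq"
    echo "cd sra"
    local acc
    for acc in "$@"; do
        echo "prefetch $acc"        # download the .sra data for this run
        echo "fasterq-dump $acc"    # convert it to FASTQ inside sra/
    done
    echo "gzip *.fastq"             # compress every FASTQ file
    echo "mv *.fastq.gz ../fastq/"  # keep only the compressed files
    echo "cd .."
    echo "rm -r sra"                # remove all the big .sra files at once
}
```

Calling "plan_download" with one or more run accessions prints the ten-or-so commands in order; nothing is downloaded or deleted until someone runs them.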

Also, a note:

The "fasterq-dump" script is going to break when it encounters non-paired-end data. That is, if there is no read 2 file, it will fail. We need to develop a new version that can handle non-paired-end data.
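
One possible starting point for that new version is to check which FASTQ files exist for a run before assuming a read 2 file. The sketch below is an assumption-laden illustration, not the lab's script: "detect_layout" is a hypothetical name, and it assumes the default "fasterq-dump" output naming (ACC_1.fastq and ACC_2.fastq for paired reads, ACC.fastq for single-end reads) and uncompressed files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report the read layout of one run accession by
# checking which FASTQ files are present in a directory.
# Assumes default fasterq-dump naming (see the note above).
detect_layout() {
    local dir="$1" acc="$2"
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        echo "paired"
    elif [ -f "$dir/${acc}.fastq" ] || [ -f "$dir/${acc}_1.fastq" ]; then
        echo "single"
    else
        echo "missing"   # no FASTQ output found for this accession
    fi
}
```

A downstream step could then branch on the printed value instead of failing when the read 2 file is absent.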

Ann Loraine added a comment - 04/Aug/24 1:11 PM - edited

See linked ticket IGBF-3790 for the most recent trial run of the tardigrade RNA-Seq pipeline in the Loraine Lab git repository called "tardigrade."
