[IGBF-3258] Download and process SRP371294 RNA-Seq data - JIRA UNCC

Details

Type: Task
Status: Closed (View Workflow)
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
3
Epic Link:
Support NSF pollen grant
Sprint:
Spring 6 2023 Mar 20, Spring 7 2023 Apr 10, Spring 8 2023 Apr 24

Description

This data set from Arabidopsis thaliana contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task:

* Download the data as fastq files from SRA.
* Align the fastq files to the TAIR10 genome using nf-core/rnaseq (see the link below).
* For alignment parameters, use the ones the authors report in the original publication (referenced above). Every RNA-Seq to genome alignment tool has a maximum intron size parameter. Never use the default! Customize for your species!
* Align using the same maxIntron parameter reported in the methods section of the paper.
* For the above, make a new "config" file.
* Create coverage graphs.
* Create junction files.
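
The download and alignment steps listed above might be scripted roughly as follows. This is a minimal sketch, not the commands actually run for this ticket. It assumes sra-tools (`prefetch`, `fasterq-dump`) for the download, STAR as the aligner inside nf-core/rnaseq, a Singularity profile, and a pipeline release that accepts `--extra_star_align_args`. The accession list file, `samplesheet.csv`, `results`, and the function names are hypothetical, and the maximum intron value must be taken from the paper's methods section. Both functions only print commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers print commands instead of executing them.

# Print one prefetch + fasterq-dump pair per run accession.
# $1: text file with one run accession per line (hypothetical input)
# $2: output directory for the fastq files
print_download_cmds() {
  local acc_file="$1" outdir="$2" acc
  # "|| [ -n "$acc" ]" keeps a last line that lacks a trailing newline
  while IFS= read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue
    echo "prefetch $acc"
    echo "fasterq-dump --split-files --outdir $outdir $acc"
  done < "$acc_file"
}

# Print an nf-core/rnaseq launch command with an explicit maximum intron size.
# $1: maximum intron size (integer, from the paper's methods section)
# $2: genome FASTA   $3: annotation GTF
# Assumption: the pipeline release in use supports --extra_star_align_args.
print_rnaseq_cmd() {
  local max_intron="$1" fasta="$2" gtf="$3"
  case "$max_intron" in
    ''|*[!0-9]*) echo "max intron must be an integer" >&2; return 1 ;;
  esac
  echo "nextflow run nf-core/rnaseq -profile singularity" \
       "--input samplesheet.csv --outdir results" \
       "--fasta $fasta --gtf $gtf --aligner star_salmon" \
       "--extra_star_align_args '--alignIntronMax $max_intron'"
}
```

Printing the commands first makes it easy to confirm that the maximum intron setting is explicit, rather than silently falling back to the aligner's default.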

Use this reference genome for alignment:

http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using the blat suite tools on the cluster. The program you need is twoBitToFa. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

twoBitToFa command:

twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
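
One possible way to wrap the conversion and sanity-check the result is sketched below. It assumes `twoBitToFa` is already on the `PATH` (for example, after loading the appropriate module) and that `curl` is available. The helper names `fetch_and_convert` and `list_fasta_names` are made up for illustration.

```shell
#!/usr/bin/env bash

# Download the 2bit file (if not already present) and convert it to FASTA.
# $1: URL of the .2bit file. Prints the FASTA file name on success.
fetch_and_convert() {
  local url="$1" twobit fasta
  twobit="$(basename "$url")"
  fasta="${twobit%.2bit}.fa"
  command -v twoBitToFa >/dev/null 2>&1 || {
    echo "twoBitToFa not found; load its module first" >&2
    return 1
  }
  [ -f "$twobit" ] || curl -fsSL -o "$twobit" "$url" || return 1
  twoBitToFa "$twobit" "$fasta" && echo "$fasta"
}

# List the sequence names in a FASTA file, one per line
# (text after the first whitespace on a header line is dropped).
list_fasta_names() {
  grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//'
}

# Example (not run here):
#   fa=$(fetch_and_convert http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit)
#   list_fasta_names "$fa"
```

Listing the sequence names is a quick check worth doing before alignment, because the annotation file passed to the pipeline needs to use the same sequence names as the FASTA file.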

Attachments

SRP371294_multiqc_report.html (1.50 MB, 20/Apr/23 3:28 PM)
Successful Pipeline Run 6.png (331 kB, 20/Apr/23 3:29 PM)

Issue Links

relates to

IGBF-3329 Create and test SRP371294 sample sheet spreadsheet for adding data to "hotpollen" annots.xml

To-Do

Activity

Ann Loraine created issue - 14/Feb/23 7:04 AM

Ann Loraine made changes - 14/Feb/23 7:04 AM

Field	Original Value	New Value
Epic Link		IGBF-2993 [ 21429 ]

Ann Loraine made changes - 14/Feb/23 7:04 AM

Story Points

Ann Loraine made changes - 14/Feb/23 7:09 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes.

For this task, download the data as fastq files and align them against the Arabidopsis TAIR10 genome. Create coverage graphs and junction files, as per usual.

Use this reference genome:

http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file and version-control it using name of
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

Ann Loraine made changes - 14/Feb/23 7:09 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file and version-control it using name of
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

Ann Loraine made changes - 14/Feb/23 7:10 AM

Sprint

Spring 3 2023 Feb 1 [ 163 ]

Spring 4 2023 Feb 13 [ 164 ]

Ann Loraine made changes - 14/Feb/23 7:10 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen.

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

Ann Loraine made changes - 22/Feb/23 10:11 AM

Rank

Ranked higher

Ann Loraine made changes - 23/Feb/23 10:29 AM

Sprint

Spring 4 2023 Feb 21 [ 164 ]

Spring 5 2023 Mar 6 [ 165 ]

Ann Loraine made changes - 07/Mar/23 8:02 AM

Sprint

Spring 5 2023 Mar 6 [ 165 ]

Spring 6 2023 Mar 20 [ 166 ]

Ann Loraine made changes - 22/Mar/23 10:06 AM

Rank

Ranked higher

Ann Loraine made changes - 22/Mar/23 10:06 AM

Assignee

Molly Davis [ molly ]

Ann Loraine made changes - 31/Mar/23 8:52 PM

Sprint

Spring 6 2023 Mar 20 [ 166 ]

Spring 6 2023 Mar 20, Spring 7 2023 Apr 10 [ 166, 167 ]

Ann Loraine made changes - 31/Mar/23 8:52 PM

Rank

Ranked higher

Molly Davis made changes - 11/Apr/23 3:03 PM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Ann Loraine made changes - 13/Apr/23 10:27 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

Ann Loraine made changes - 13/Apr/23 10:29 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

Ann Loraine made changes - 13/Apr/23 10:32 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check the paper:
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

Molly Davis made changes - 13/Apr/23 2:02 PM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
2bitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

Ann Loraine made changes - 14/Apr/23 10:18 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* Use reference genome annotations from the Araport 11 data set
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

Molly Davis made changes - 14/Apr/23 10:27 AM

Description

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, check one of our RNA-Seq data analysis papers that used Arabidopsis TAIR10 genome. Specifically, you need to get the "maximum intron" parameter. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* Use reference genome annotations from the Araport 11 data set
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

This [data set from Arabidopsis thaliana|https://trace.ncbi.nlm.nih.gov/Traces/?view=study&acc=SRP371294] contains six samples of FACS-sorted sperm and vegetative cells from mature pollen. The publication is here: https://pubmed.ncbi.nlm.nih.gov/36515615/

This would be a useful reference data set for our studies, as the authors reported many differentially and alternatively spliced genes between sperm and vegetative cells harvested from mature Arabidopsis pollen.

For this task

* download the data as fastq files from SRA
* align fastq files using nf-core/rnaseq vs. TAIR10 genome (see link below)
* for alignment parameters, use original publication (referenced above) and their parameters that they used in their experiment. Every RNA-Seq to genome alignment tool requires the user to define a maximum intron size parameter. Never use the default! Customize for your species!
* align using same maxIntron parameter reported in the methods section for the paper
* for the above, make a new "config" file
* create coverage graphs
* create junction files

Use this reference genome for alignment:

* http://lorainelab-quickload.scidas.org/quickload/A_thaliana_Jun_2009/A_thaliana_Jun_2009.2bit

Create the "fasta" file from the above 2bit file using blat suite tools on cluster. The program you need is 2bitToFa (I think to load it, you have to use "module load blatsuite" or something like that. Use "module avail" to find the correct module name.)

2bitToFa command:

{code}
twoBitToFa A_thaliana_Jun_2009.2bit A_thaliana_Jun_2009.fa
{code}

Ann Loraine made changes - 17/Apr/23 10:04 AM

Status

In Progress [ 3 ]

To-Do [ 10305 ]

Ann Loraine made changes - 17/Apr/23 10:57 AM

Sprint

Spring 6 2023 Mar 20, Spring 7 2023 Apr 10 [ 166, 167 ]

Spring 6 2023 Mar 20, Spring 7 2023 Apr 10, Spring 8 2023 Apr 24 [ 166, 167, 168 ]

Ann Loraine made changes - 17/Apr/23 10:57 AM

Rank

Ranked higher

Molly Davis made changes - 18/Apr/23 9:49 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Molly Davis made changes - 18/Apr/23 10:34 AM

Status

In Progress [ 3 ]

To-Do [ 10305 ]

Molly Davis made changes - 18/Apr/23 10:34 AM

Assignee

Molly Davis [ molly ]

Ann Loraine [ aloraine ]

Ann Loraine made changes - 18/Apr/23 10:36 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Ann Loraine made changes - 18/Apr/23 11:21 AM

Status

In Progress [ 3 ]

To-Do [ 10305 ]

Ann Loraine made changes - 18/Apr/23 11:21 AM

Assignee

Ann Loraine [ aloraine ]

Molly Davis [ molly ]

Molly Davis made changes - 18/Apr/23 12:52 PM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Molly Davis made changes - 18/Apr/23 4:32 PM

Comment

[ Pipeline Error 1 solution:
Araport11_GTF_genes_transposons.current.gtf.gz is working with the fasta file. ]

Molly Davis made changes - 20/Apr/23 1:33 PM

Attachment

Screenshot 2023-04-20 at 1.32.08 PM.png [ 17864 ]

Molly Davis made changes - 20/Apr/23 3:21 PM

Attachment

Screenshot 2023-04-20 at 1.32.08 PM.png [ 17864 ]

Molly Davis made changes - 20/Apr/23 3:27 PM

Attachment

multiqc_report.html [ 17865 ]

Molly Davis made changes - 20/Apr/23 3:28 PM

Attachment

multiqc_report.html [ 17865 ]

Molly Davis made changes - 20/Apr/23 3:28 PM

Attachment

SRP371294_multiqc_report.html [ 17866 ]

Molly Davis made changes - 20/Apr/23 3:29 PM

Attachment

Successful Pipeline Run 6.png [ 17867 ]

Molly Davis made changes - 21/Apr/23 1:05 PM

Status

In Progress [ 3 ]

Needs 1st Level Review [ 10005 ]

Molly Davis made changes - 21/Apr/23 1:05 PM

Assignee

Molly Davis [ molly ]

Ann Loraine [ aloraine ]

Ann Loraine made changes - 25/Apr/23 1:35 PM

Status

Needs 1st Level Review [ 10005 ]

First Level Review in Progress [ 10301 ]