[IGBF-4103] Read mango paper on gene discovery - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Minor
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 1
Epic Link: Support NSF pollen grant
Sprint: Spring 2

Description

Read this paper with an eye toward how we could take our reference-free efforts and do something along the same lines.

De novo transcriptome assembly and annotation for gene discovery in avocado, macadamia and mango

https://www.nature.com/articles/s41597-019-0350-9

Activity

Robert Reid created issue - 18/Feb/25 10:33 AM

Robert Reid made changes - 18/Feb/25 10:33 AM

Field	Original Value	New Value
Epic Link		IGBF-2993 [ 21429 ]

Brandon Bendickson made changes - 19/Feb/25 10:32 AM

Field	Original Value	New Value
Status	To-Do [ 10305 ]	In Progress [ 3 ]

Brandon Bendickson added a comment - 19/Feb/25 12:03 PM

This comment will serve as my notes for the article:

Methods
-Raw RNA-seq reads were pre-processed with Trimmomatic using default parameters
-Read quality was assessed with FastQC, and the reports were aggregated with MultiQC
-Trinity was used for de novo transcriptome assembly, and assembly completeness was assessed with BUSCO
-HISAT2 was used to map reads to their respective references
-TransDecoder with default settings was used to predict coding regions; they kept the single best open reading frame (ORF) per transcript longer than 100 amino acids
-CD-HIT-EST with default parameters was used to reduce redundancy and produce a set of unique genes
-BLASTx was used to assign functional annotations to the unique genes by searching UniProtKB (its manually annotated, non-redundant section is UniProtKB/Swiss-Prot)
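As a rough illustration of the "best ORF per transcript" step in the notes above, here is a simplified Python sketch. It is an assumption-laden stand-in, not TransDecoder's algorithm: it simply keeps the longest ATG-to-stop ORF on either strand and applies a 100-amino-acid cutoff, whereas TransDecoder also scores candidate ORFs (and can use homology evidence), so its "best" ORF is not always the longest one. The function names `longest_orf` and `best_orfs` are hypothetical.

```python
from __future__ import annotations

# Simplified stand-in for TransDecoder's ORF selection (hypothetical helper names):
# "best" here just means "longest ATG...stop ORF on either strand".
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> tuple[int, str, str] | None:
    """Return (length in amino acids, strand, ORF nucleotides) or None if no ORF."""
    best = None
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF in this frame
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3  # codons before the stop codon
                    if best is None or n_aa > best[0]:
                        best = (n_aa, strand, s[start:i + 3])
                    start = None  # look for the next ATG downstream
    return best


def best_orfs(transcripts: dict[str, str], min_aa: int = 100) -> dict[str, tuple[int, str, str]]:
    """Keep one ORF per transcript, dropping transcripts whose best ORF is too short."""
    selected = {}
    for name, seq in transcripts.items():
        orf = longest_orf(seq)
        if orf is not None and orf[0] >= min_aa:
            selected[name] = orf
    return selected
```

Usage would look like `best_orfs({"contig_1": seq1, "contig_2": seq2})`, where the contig names and sequences are placeholders for our own assembly output.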

How we could apply this to our tomato data:
-Possibly go back and validate our assemblies with BUSCO
-Run TransDecoder on our contigs to predict coding regions, perhaps after first selecting the best long hits from BLAT
-Use CD-HIT to identify our unique genes
-Use BLASTx to annotate our genes
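A sketch of how the TransDecoder, CD-HIT-EST and BLASTx steps listed above might be chained for our contigs. It only builds the command lines (nothing is executed), and several details are assumptions rather than anything from the paper: the input and output file names, the `uniprot_sprot` database name, the 0.95 identity threshold, the 1e-5 e-value, and the expectation that TransDecoder.Predict writes `<basename>.transdecoder.cds` to the working directory. The `build_pipeline` function is hypothetical.

```python
from __future__ import annotations

import os
import shlex


def build_pipeline(contigs: str, uniprot_db: str, threads: int = 4) -> list[list[str]]:
    """Return argv lists for TransDecoder -> CD-HIT-EST -> BLASTx (not executed)."""
    # Assumption: TransDecoder.Predict writes <basename>.transdecoder.cds to the working dir.
    cds = f"{os.path.basename(contigs)}.transdecoder.cds"
    unique_genes = "unique_genes.fasta"  # hypothetical output name
    return [
        # Candidate ORFs with a minimum protein length of 100 amino acids.
        ["TransDecoder.LongOrfs", "-t", contigs, "-m", "100"],
        # Keep a single best ORF per transcript, as in the paper's methods.
        ["TransDecoder.Predict", "-t", contigs, "--single_best_only"],
        # Collapse redundant coding sequences; -n 10 is a valid word size for -c 0.95.
        ["cd-hit-est", "-i", cds, "-o", unique_genes, "-c", "0.95",
         "-n", "10", "-T", str(threads)],
        # Tabular (-outfmt 6) BLASTx hits of the unique genes against a UniProt database.
        ["blastx", "-query", unique_genes, "-db", uniprot_db, "-outfmt", "6",
         "-evalue", "1e-5", "-num_threads", str(threads),
         "-out", "unique_genes_vs_uniprot.tsv"],
    ]


if __name__ == "__main__":
    # Hypothetical inputs; each step could be run with subprocess.run(cmd, check=True).
    for cmd in build_pipeline("tomato_pistil_contigs.fasta", "uniprot_sprot"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to print them for review before wiring them into whatever job scheduler we end up using.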

Robert Reid added a comment - 20/Feb/25 10:40 AM

I think we can skip BUSCO in this instance, since we are only looking at a transcriptome from pistil tissue and maybe some pollen.
To use BUSCO to assess completeness, we'd want leaf, root, flower, and every other tissue sequenced.

TransDecoder, let's do it!!

CD-HIT, Heck yeah!!
These will be new tickets.
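For the CD-HIT ticket, a toy version of the greedy clustering idea may help. Sequences are processed longest first, and each one either joins the first existing representative it matches at or above the identity threshold (identity counted over the length of the shorter sequence) or starts a new cluster. This is a simplified sketch, not CD-HIT itself: the real tool uses short-word filtering and gapped alignments, while the ungapped comparison and the function names `ungapped_identity` and `greedy_cluster` here are hypothetical.

```python
from __future__ import annotations


def ungapped_identity(short: str, long: str) -> float:
    """Best fraction of matching bases when `short` slides along `long` without gaps."""
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long) - len(short) + 1):
        window = long[offset:offset + len(short)]
        best = max(best, sum(a == b for a, b in zip(short, window)))
    return best / len(short)  # denominator: length of the shorter sequence


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.95) -> dict[str, list[str]]:
    """Toy CD-HIT-style clustering: representative name -> member names."""
    clusters: dict[str, list[str]] = {}
    # Longest first, so each representative is the longest sequence in its cluster
    # and every later sequence is no longer than the representatives it is compared to.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if ungapped_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: this sequence starts a new cluster
    return clusters
```

The representatives (the dictionary keys) correspond to the "unique genes" that would then go on to BLASTx annotation.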
