IGB / IGBF-3647

De novo Assembly: Trinity Run on Kelsey Data

    Details

    • Type: Task
    • Status: Closed
    • Priority: Trivial
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      GOAL: Set up a script to run Trinity on the Kelsey data.

      Eventually, integrate this script into Bitbucket for future use.
      The assembled contigs could then be aligned to SL4 and SL5.
      The contigs could also be loaded into IGB as a "reference", so we can look at how the raw sequences align to the newly created contigs.


          Activity

          robofjoy Robert Reid added a comment -

          Script to parse the BLAT results:
          ~/scripts/python/identifyBlatMatches.py ./infile.fa ./blat-tamaulipas.pslx
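          A minimal Python sketch of what parsing the .pslx output might look like. This is not the contents of identifyBlatMatches.py (that script is not shown here); it assumes "best hit" means the alignment with the most matching bases for each query. It relies only on the standard PSL column order, ignores the two sequence columns that pslx appends, and skips PSL header lines. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class PslHit(NamedTuple):
    matches: int
    mismatches: int
    q_name: str
    q_size: int
    t_name: str
    t_start: int
    t_end: int


def parse_psl(lines: Iterable[str]) -> List[PslHit]:
    """Parse PSL/PSLX data rows; header and separator lines are skipped."""
    hits: List[PslHit] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # PSL rows have 21 columns; pslx adds qSeq and tSeq (23 total).
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        hits.append(PslHit(
            matches=int(fields[0]),
            mismatches=int(fields[1]),
            q_name=fields[9],
            q_size=int(fields[10]),
            t_name=fields[13],
            t_start=int(fields[15]),
            t_end=int(fields[16]),
        ))
    return hits


def best_hit_per_query(hits: Iterable[PslHit]) -> Dict[str, PslHit]:
    """Assumption: keep the hit with the most matching bases per query."""
    best: Dict[str, PslHit] = {}
    for hit in hits:
        current = best.get(hit.q_name)
        if current is None or hit.matches > current.matches:
            best[hit.q_name] = hit
    return best
```

          Usage: open ./blat-tamaulipas.pslx, pass the file handle to parse_psl, then pass the result to best_hit_per_query.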

          Command for later, to summarize the Trinity contigs (run in /projects/tomato_genome/fnb/dataprocessing/trinity/tam/blat):
          awk -F "_i" 'BEGIN { OFS = "\t" } { print $1 }' tmp2
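          For reference, a Python sketch equivalent to the awk one-liner, assuming tmp2 holds one Trinity contig ID per line (for example TRINITY_DN1000_c115_g5_i1). Keeping everything before the first "_i" gives the Trinity gene-level ID, and counting those IDs gives the number of isoforms per gene. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def trinity_gene_id(contig_id: str) -> str:
    # Same as awk's $1 with -F "_i": everything before the first "_i".
    return contig_id.split("_i", 1)[0]


def isoforms_per_gene(contig_ids: Iterable[str]) -> Counter:
    # Count contigs (isoforms) per gene ID, skipping blank lines.
    return Counter(
        trinity_gene_id(cid.strip()) for cid in contig_ids if cid.strip()
    )
```

          Usage: pass an open handle on tmp2 to isoforms_per_gene; len() of the result is the number of Trinity "genes", and sum(result.values()) is the number of contigs.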

          robofjoy Robert Reid added a comment -

          The BLAT runs were a success.
          Going to close this task and start a number of related new ones:

          1. Need to have these runs tested. New student Brandon starts this week, so this is a good task for him to learn the HPC cluster and Slurm.
          2. Need a new task to start a Google Slides deck outlining this project. It will be suitable for a biweekly tomato meeting and will also let me think through the best next steps.
          3. Need a new task to parse the results generated.
          robofjoy Robert Reid added a comment -

          First blat run was a success.

          Per Ann's suggestion to use pslx output so that the sequences are included as well, I will run it again like so:

          blat /projects/tomato_genome/db/SL5/SL5.cds.fa ./infile.fa blat-${file}.pslx \
            -ooc=${file}.11.ooc \
            -t=dna -q=dna \
            -maxIntron=10000 \
            -out=pslx

          This Slurm script can be found at:
          /projects/tomato_genome/fnb/dataprocessing/trinity

          Once this works, we will repeat it for the other 3 varieties, and then create a new ticket for the next sprint to parse the results, create a refined set of contigs with good annotations to become the NEW reference genome, and prepare the Nextflow pipeline.
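          A hypothetical Python helper that builds the blat command above once per variety, rather than editing ${file} by hand. Assumptions: ${file} is the variety name (as in blat-tamaulipas.pslx), each variety's query is its Trinity assembly from the contig paths listed in this ticket, and the .11.ooc files already exist. It only prints the commands; submission through the Slurm script is unchanged.

```python
import shlex
from typing import List

SL5_CDS = "/projects/tomato_genome/db/SL5/SL5.cds.fa"
TRINITY_DIR = "/projects/tomato_genome/fnb/dataprocessing/trinity"
# Sub-directory -> variety label, taken from the contig paths in this ticket.
VARIETIES = {"mal": "malintka", "hei": "heinz", "nag": "nagcarlang", "tam": "tamaulipas"}


def blat_command(label: str, query_fasta: str) -> List[str]:
    """Return the blat argv from this ticket with ${file} replaced by label."""
    return [
        "blat", SL5_CDS, query_fasta, f"blat-{label}.pslx",
        f"-ooc={label}.11.ooc",
        "-t=dna", "-q=dna",
        "-maxIntron=10000",
        "-out=pslx",
    ]


# Assumption: each variety's query is its Trinity assembly.
for subdir, label in VARIETIES.items():
    query = f"{TRINITY_DIR}/{subdir}/{label}-trinity.Trinity.fasta"
    print(shlex.join(blat_command(label, query)))
```

          Returning an argument list (instead of a single string) means it can also be passed directly to subprocess.run if the commands are ever run from Python.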

          robofjoy Robert Reid added a comment -

          All 4 Trinity runs succeeded.

          Locations of the assembled contigs:

          • /projects/tomato_genome/fnb/dataprocessing/trinity/mal/malintka-trinity.Trinity.fasta
          • /projects/tomato_genome/fnb/dataprocessing/trinity/hei/heinz-trinity.Trinity.fasta
          • /projects/tomato_genome/fnb/dataprocessing/trinity/nag/nagcarlang-trinity.Trinity.fasta
          • /projects/tomato_genome/fnb/dataprocessing/trinity/tam/tamaulipas-trinity.Trinity.fasta

          Now need to decide on next steps:
          • BLAT to get annotations.
          • Align the experimental reads to these contigs (or a subset) with STAR to get read counts (Nextflow / Salmon).

          We expect 35,000 genes, but we have MANY more contigs than that.

          File: tamaulipas-trinity.Trinity.fasta
          Number of sequences: 797,447

          Hopefully many of these are isoforms. In reality, many will be chimeras that are not biologically real.
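          A minimal Python sketch for getting the sequence count reported above, plus N50 (an added assembly metric not mentioned in this ticket), from one of the Trinity FASTA files. It handles wrapped multi-line records; the function names are hypothetical.

```python
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each FASTA record (wrapped records OK)."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that contigs of length >= L hold half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

          Usage: pass an open handle on tamaulipas-trinity.Trinity.fasta to fasta_lengths; len() of the result is the number of sequences, and n50() of it gives the N50.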

          robofjoy Robert Reid added a comment -

          Will start step 2, the blat script to align reads back to the Heinz de novo transcripts.


            People

            • Assignee:
              robofjoy Robert Reid
              Reporter:
              robofjoy Robert Reid
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: