Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      GOAL: Find clusters of highly similar proteins using CD-HIT.

      Check whether CD-HIT is already available as a module on the cluster.
      If not, figure out how to install it.
      Create a script to run it.

      We run CD-HIT on the protein output of each TransDecoder run we produce.
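
      A minimal sketch of such a script, assuming cd-hit is on PATH (for example after loading a module) and that TransDecoder's protein output follows the `*.transdecoder.pep` naming. The function names, output naming, thread count and memory limit are hypothetical placeholders; the flags themselves (-i, -o, -c, -n, -d, -T, -M) are standard cd-hit options. By default it only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def cdhit_command(pep_file: Path, out_dir: Path, identity: float = 0.9,
                  word_size: int = 5, threads: int = 8,
                  memory_mb: int = 16000) -> List[str]:
    """Build (but do not run) a cd-hit command for one TransDecoder .pep file."""
    # Hypothetical naming: sample.transdecoder.pep -> out_dir/sample.cdhit
    stem = pep_file.name.replace(".transdecoder.pep", "")
    return [
        "cd-hit",
        "-i", str(pep_file),
        "-o", str(out_dir / f"{stem}.cdhit"),  # cd-hit also writes <output>.clstr
        "-c", str(identity),                   # sequence identity threshold
        "-n", str(word_size),                  # word length; 5 suits -c 0.7 to 1.0
        "-d", "0",                             # keep full IDs (up to first space) in .clstr
        "-T", str(threads),                    # number of threads
        "-M", str(memory_mb),                  # memory limit in MB
    ]


def run_all(pep_dir: Path, out_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Print (and, if dry_run is False, run) one cd-hit job per *.transdecoder.pep file."""
    commands = [cdhit_command(p, out_dir)
                for p in sorted(pep_dir.glob("*.transdecoder.pep"))]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

      Calling `run_all(Path("pep_dir"), Path("results"), dry_run=False)` would run one job per file; cd-hit writes the representative sequences to the -o path and the cluster membership to a matching `.clstr` file.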

          Activity

          robofjoy Robert Reid added a comment -

          The results look good.
          We need to look at these a bit more and break down the 26,000+ clusters. We can summarize these results as a figure.
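
          A sketch for breaking the clusters down, assuming the standard .clstr layout (a `>Cluster N` header followed by one member line per sequence). The function names are hypothetical; the size distribution is what a cluster-size histogram would plot.

```python
from collections import Counter
from typing import Dict, Iterable, List


def read_clstr(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse CD-HIT .clstr lines into {cluster name: [member IDs]}."""
    clusters: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            current = line[1:].strip()        # e.g. "Cluster 0"
            clusters[current] = []
        elif line.strip() and current is not None:
            # Member line, e.g. "0\t120aa, >SEQ_ID... *" or "... at 95.00%"
            seq_id = line.split(">", 1)[1].split("...", 1)[0]
            clusters[current].append(seq_id)
    return clusters


def size_distribution(clusters: Dict[str, List[str]]) -> Counter:
    """Map cluster size -> number of clusters of that size."""
    return Counter(len(members) for members in clusters.values())


def summarize(clusters: Dict[str, List[str]]) -> Dict[str, int]:
    """Headline numbers: total clusters, singletons, largest cluster size."""
    sizes = [len(members) for members in clusters.values()]
    return {
        "clusters": len(sizes),
        "singletons": sum(1 for s in sizes if s == 1),
        "largest": max(sizes, default=0),
    }
```

          To use it on one of the result files, open the file and pass the handle: `with open(path) as fh: clusters = read_clstr(fh)`.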

          We might want to rerun these and explore the -c option (sequence identity threshold) at 0.9, i.e. 90% sequence identity. We can try both more and less stringent thresholds, but that will be a new ticket.
          This one is ready for closing.
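
          If the -c sweep goes ahead, cd-hit's word size (-n) has to suit each threshold; the CD-HIT user guide recommends -n 5 for 0.7 to 1.0, 4 for 0.6 to 0.7, 3 for 0.5 to 0.6 and 2 for 0.4 to 0.5. A sketch that builds one command per threshold, with hypothetical function and file names:

```python
from typing import List, Sequence


def word_size_for(identity: float) -> int:
    """Recommended cd-hit -n value for a protein -c threshold."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("protein cd-hit thresholds must be between 0.4 and 1.0")
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2


def sweep_commands(fasta: str, out_prefix: str,
                   thresholds: Sequence[float]) -> List[List[str]]:
    """One cd-hit command per threshold; outputs named <prefix>_c<percent>."""
    commands = []
    for c in thresholds:
        commands.append([
            "cd-hit", "-i", fasta,
            "-o", f"{out_prefix}_c{round(c * 100)}",
            "-c", f"{c:.2f}",
            "-n", str(word_size_for(c)),
            "-d", "0",  # keep full sequence IDs in the .clstr output
        ])
    return commands
```

          For example, `sweep_commands("merged.pep", "merged", [0.8, 0.9, 0.95])` yields three jobs whose cluster counts can then be compared.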

          bbendick Brandon Bendickson added a comment -

          Reran CD-HIT, this time with the RALF proteins appended to the end of each of our de novo protein FASTA files.
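
          A minimal sketch of the appending step, assuming the RALF (and Dehydration-induced 19-like) sequences sit in a separate FASTA file. It refuses to merge if an ID appears in both inputs, since duplicate IDs would make the .clstr output ambiguous, and it guards against a missing trailing newline. File and function names are hypothetical.

```python
from pathlib import Path
from typing import List


def fasta_ids(text: str) -> List[str]:
    """Sequence IDs: header text after '>' up to the first whitespace."""
    return [(line[1:].split() or [""])[0]
            for line in text.splitlines() if line.startswith(">")]


def merge_fasta_text(denovo: str, queries: str) -> str:
    """Return the de novo records followed by the query records."""
    clash = set(fasta_ids(denovo)) & set(fasta_ids(queries))
    if clash:
        raise ValueError(f"IDs present in both inputs: {sorted(clash)}")
    if denovo and not denovo.endswith("\n"):
        denovo += "\n"  # avoid gluing the first query header onto the last sequence line
    return denovo + queries


def append_queries(denovo_fasta: Path, query_fasta: Path, out_fasta: Path) -> None:
    """Write <de novo proteins> + <query proteins> to a new FASTA file."""
    merged = merge_fasta_text(denovo_fasta.read_text(), query_fasta.read_text())
    out_fasta.write_text(merged)
```

          Writing to a new file rather than appending in place keeps the original TransDecoder output untouched.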

          Because I ran TransDecoder on both our filtered and unfiltered contigs, I also ran CD-HIT on each set. I am summarizing the cluster counts in ticket 4113.

          Filtered results: /projects/tomato_genome/fnb/dataprocessing/brandon_work/CD-HIT/RALF_with_denovo/filtered_contigs/varieties_with_RALF/results

          Unfiltered results: /projects/tomato_genome/fnb/dataprocessing/brandon_work/CD-HIT/RALF_with_denovo/unfiltered_contigs/varieties_with_RALF/results

          bbendick Brandon Bendickson added a comment -

          Successfully ran CD-HIT on our de novo contigs (filtered and unfiltered protein files from TransDecoder) together with the 10 RALF proteins and the Dehydration-induced 19-like protein.

          Filtered results: /projects/tomato_genome/fnb/dataprocessing/brandon_work/CD-HIT/RALF_with_denovo/filtered_contigs/results/filtered_varieties_merged.clstr

          Unfiltered results: /projects/tomato_genome/fnb/dataprocessing/brandon_work/CD-HIT/RALF_with_denovo/unfiltered_contigs/results/unfiltered_varities_merged.clstr

          We get at least one hit for all RALF proteins except SLRALF2.
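
          One way to check which query proteins picked up a hit, i.e. landed in a cluster with at least one de novo sequence. This sketch assumes cd-hit was run with `-d 0` so full IDs appear in the .clstr file (the default `-d 20` truncates them); the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def read_clstr(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse CD-HIT .clstr lines into {cluster name: [member IDs]}."""
    clusters: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            current = line[1:].strip()
            clusters[current] = []
        elif line.strip() and current is not None:
            # "0\t120aa, >SEQ_ID... *"  ->  "SEQ_ID"
            clusters[current].append(line.split(">", 1)[1].split("...", 1)[0])
    return clusters


def query_hits(clusters: Dict[str, List[str]],
               query_ids: Sequence[str]) -> Dict[str, List[str]]:
    """For each query ID, list the non-query members of its cluster.

    An empty list means the query clustered only with other queries,
    clustered alone, or was not found in the file.
    """
    queries = set(query_ids)
    hits: Dict[str, List[str]] = {q: [] for q in query_ids}
    for members in clusters.values():
        for member in members:
            if member in queries:
                hits[member] = [m for m in members if m not in queries]
    return hits
```

          `[q for q, m in query_hits(clusters, ids).items() if not m]` then lists the queries with no de novo cluster mates.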

          bbendick Brandon Bendickson added a comment -

          Also include the RALF proteins and the protein sequence of the gene "Solyc01g100140.4.1:Dehydration-induced 19-like protein".


            People

            • Assignee:
              robofjoy Robert Reid
              Reporter:
              robofjoy Robert Reid
            • Votes:
              0
              Watchers:
              2
