  1. IGB
  2. IGBF-3809

rnaQUAST deeper dive and exploration

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Goal:

      Dive deeper into rnaQUAST and work out what our results mean.

      The tool works on the cluster, and Brandon has gotten it to run on the de novo assemblies.

      So while we have results, we still need to be able to interpret them and figure out how to harness this tool.

      That means reading the original paper and exploring how other papers (e.g., the rnaSPAdes paper) use it as a metric for assessing their assemblies.

          Activity

          robofjoy Robert Reid added a comment -

          The paper about the tool:
          https://academic.oup.com/bioinformatics/article/32/14/2210/1743439

          rnaQUAST: a quality assessment tool for de novo transcriptome assemblies
          Elena Bushmanova, Dmitry Antipov, Alla Lapidus, Vladimir Suvorov, Andrey D. Prjibelski
          Bioinformatics, Volume 32, Issue 14, July 2016, Pages 2210–2212, https://doi.org/10.1093/bioinformatics/btw218
          Published: 23 April 2016

          robofjoy Robert Reid added a comment -

          "By comparing the resulting alignments with the gene database, rnaQUAST calculates various statistics and generates a summary report."

          robofjoy Robert Reid added a comment -

          Scripts from the initial rnaQUAST run are located here:

          /projects/tomato_genome/fnb/dataprocessing/brandon_work/tam/tam-rnaquast
          /projects/tomato_genome/fnb/dataprocessing/brandon_work/nag/nag-rnaquast
          /projects/tomato_genome/fnb/dataprocessing/brandon_work/mal/mal-rnaquast
          /projects/tomato_genome/fnb/dataprocessing/brandon_work/hei/hei-rnaquast

          Within each of these folders are

          robofjoy Robert Reid added a comment -

          rnaQUAST is now also available as a module on the cluster:

          module load rnaquast
          Loading rnaquast/2.3.0

          robofjoy Robert Reid added a comment - edited

          The actual Python rnaQUAST script is buried deep in the conda env.

          module load rnaquast
          /apps/pkg/anaconda3/envs/rnaquast-2.3.0/bin/rnaQUAST.py

          Sample command:
          (https://github.com/ablab/rnaquast?tab=readme-ov-file#sec2)

          python /apps/pkg/anaconda3/envs/rnaquast-2.3.0/bin/rnaQUAST.py \
          --transcripts /PATH/TO/transcripts1.fasta /PATH/TO/ANOTHER/transcripts2.fasta /PATH/TO/MULTIPLE/*.fasta [...] \
          --reference /PATH/TO/reference_genome.fasta --gtf /PATH/TO/gene_coordinates.gtf

          All of the options:
          python /apps/pkg/anaconda3/envs/rnaquast-2.3.0/bin/rnaQUAST.py
          usage: /apps/pkg/anaconda3/envs/rnaquast-2.3.0/bin/rnaQUAST.py [-h] [-r REFERENCE [REFERENCE ...]] [--gtf GTF [GTF ...]] [--gene_db GENE_DB] [-c TRANSCRIPTS [TRANSCRIPTS ...]] [-psl ALIGNMENT [ALIGNMENT ...]] [-sam READS_ALIGNMENT] [-1 LEFT_READS]
          [-2 RIGHT_READS] [-s SINGLE_READS] [--gmap_index GMAP_INDEX] [-o OUTPUT_DIR] [--test] [-d] [-t THREADS] [-l LABELS [LABELS ...]] [-ss] [--min_alignment MIN_ALIGNMENT] [--no_plots]
          [--blat] [--gene_mark] [--meta] [--lower_threshold LOWER_THRESHOLD] [--upper_threshold UPPER_THRESHOLD] [--disable_infer_genes] [--disable_infer_transcripts] [--busco BUSCO]
          [--prokaryote]
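
          A minimal Python sketch of assembling that command for several assemblies at once. It uses only flags that appear in the usage text above (--transcripts, --reference, --gtf, -o, -t, -l). The helper name build_rnaquast_cmd and every path in the usage note are placeholders for illustration, not anything that exists on the cluster.

```python
# Path to the script inside the module's conda env, as quoted in this ticket.
RNAQUAST = "/apps/pkg/anaconda3/envs/rnaquast-2.3.0/bin/rnaQUAST.py"


def build_rnaquast_cmd(transcripts, reference, gtf, output_dir,
                       threads=1, labels=None):
    """Return the rnaQUAST command as an argument list (nothing is executed).

    build_rnaquast_cmd is a hypothetical helper; only the flags themselves
    come from the usage text.
    """
    if labels is not None and len(labels) != len(transcripts):
        raise ValueError("need exactly one label per transcripts file")

    cmd = [
        "python", RNAQUAST,
        "--transcripts", *[str(t) for t in transcripts],
        "--reference", str(reference),
        "--gtf", str(gtf),
        "-o", str(output_dir),
        "-t", str(threads),
    ]
    if labels:
        # -l gives each assembly a short name in the report columns.
        cmd += ["-l", *labels]
    return cmd
```

          The returned list can be handed to subprocess.run(cmd, check=True) inside a Slurm job, or printed with shlex.join(cmd) and pasted into a script, for example build_rnaquast_cmd(["tam.fasta", "hei.fasta"], "genome.fasta", "genes.gtf", "rnaQUAST_results", threads=8, labels=["tam", "hei"]).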

          robofjoy Robert Reid added a comment -

          Tried out a test run.

          Slurm script for this run can be found here:
          /projects/tomato_genome/scripts/rob/rnaquastRun.slurm

          The run log reports that it was a success!
          Results are here:
          /projects/tomato_genome/fnb/dataprocessing/trinity/rnaquast/rnaQUAST_results/

          We get a lot for a short run!

          -rw-r--r-- 1 rreid2 tomato_genome 0 Aug 8 19:30 blat_tamaulipas_bestLongHit.unaligned.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 13M Aug 8 19:30 blat_tamaulipas_bestLongHit.paralogs.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 2.8M Aug 8 19:30 blat_tamaulipas_bestLongHit.misassembled.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 3.5M Aug 8 19:30 blat_tamaulipas_bestLongHit.misassembled.blat.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 46M Aug 8 19:30 blat_tamaulipas_bestLongHit.misassembled.blast.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 51M Aug 8 19:30 blat_tamaulipas_bestLongHit.correct.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 839K Aug 8 19:30 blat_tamaulipas_bestLongHit.unannotated.fasta
          -rw-r--r-- 1 rreid2 tomato_genome 217K Aug 8 19:30 blat_tamaulipas_bestLongHit.95%-assembled.list
          -rw-r--r-- 1 rreid2 tomato_genome 318K Aug 8 19:30 blat_tamaulipas_bestLongHit.50%-assembled.list
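
          File sizes only hint at how the transcripts were split up. A rough Python sketch for counting sequences per category follows. It assumes the files are named <label>.<category>.fasta as in the listing above and that every record starts with a ">" header line. The function names are made up for illustration.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_categories(results_dir, label):
    """Map each category (e.g. 'correct', 'misassembled.blat') to its
    record count for one assembly label.

    Assumes results_dir is the folder that directly contains the
    <label>.<category>.fasta files.
    """
    counts = {}
    for fasta in sorted(Path(results_dir).glob(f"{label}.*.fasta")):
        # Strip the leading "<label>." and the trailing ".fasta".
        category = fasta.name[len(label) + 1:-len(".fasta")]
        counts[category] = count_fasta_records(fasta)
    return counts
```

          Calling summarize_categories(folder, "blat_tamaulipas_bestLongHit") on the folder holding the files above would return one count per .fasta file. The .list files are skipped because they are not FASTA.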

          robofjoy Robert Reid added a comment -

          SHORT SUMMARY REPORT

          METRICS/TRANSCRIPTS                 blat_tamaulipas_bestLongHit  blat_malintka_bestLongHit  blat_nagcarlang_bestLongHit  blat_heinz_bestLongHit  SL5.cds

          == DATABASE METRICS ==
          Genes                               36648                        36648                      36648                        36648                   36648
          Avg. number of exons per isoform    5.888                        5.888                      5.888                        5.888                   5.888

          == BASIC TRANSCRIPTS METRICS ==
          Transcripts                         27867                        27208                      27100                        27380                   43752
          Transcripts > 500 bp                25328                        24941                      25013                        25142                   29854
          Transcripts > 1000 bp               21769                        21590                      21789                        21935                   18306

          == ALIGNMENT METRICS ==
          Aligned                             27867                        27201                      27087                        27374                   43752
          Uniquely aligned                    25077                        24169                      23922                        24258                   43646
          Multiply aligned                    4819                         5131                       5177                         5139                    105
          Unaligned                           0                            7                          13                           6                       0

          == ALIGNMENT METRICS FOR NON-MISASSEMBLED TRANSCRIPTS ==
          Avg. aligned fraction               0.968                        0.982                      0.982                        0.982                   0.999
          Avg. alignment length               2379.194                     2398.298                   2643.803                     2550.996                1119.497
          Avg. mismatches per transcript      4.863                        4.179                      4.506                        3.893                   0.062

          == ALIGNMENT METRICS FOR MISASSEMBLED (CHIMERIC) TRANSCRIPTS ==
          Misassemblies                       816                          603                        693                          668                     0

          == ASSEMBLY COMPLETENESS (SENSITIVITY) ==
          Database coverage                   0.496                        0.514                      0.509                        0.512                   0.75
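
          To compare assemblies on more than raw counts, the report can be read into a dictionary and turned into ratios, such as misassemblies per transcript. A minimal Python sketch follows. It assumes the report text is whitespace-separated as pasted in this comment, that the header row starts with METRICS/TRANSCRIPTS, and that labels contain no spaces. parse_short_report and metric_ratio are hypothetical names, not part of rnaQUAST.

```python
def parse_short_report(text):
    """Parse a short summary report into {metric: {label: value}}."""
    labels = []
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("=="):
            continue  # blank lines and section banners
        if line.startswith("METRICS/TRANSCRIPTS"):
            labels = line.split()[1:]
            continue
        if not labels:
            continue  # anything before the header row, e.g. the title
        # Metric names may contain spaces, so split values off the right.
        parts = line.rsplit(None, len(labels))
        if len(parts) != len(labels) + 1:
            continue
        try:
            metrics[parts[0]] = dict(zip(labels, map(float, parts[1:])))
        except ValueError:
            continue  # a row whose values are not all numeric
    return metrics


def metric_ratio(metrics, numerator, denominator):
    """Per-assembly ratio of two parsed metrics."""
    return {
        label: metrics[numerator][label] / metrics[denominator][label]
        for label in metrics[numerator]
    }
```

          With the numbers in this comment, metric_ratio(metrics, "Misassemblies", "Transcripts") would put blat_tamaulipas_bestLongHit at 816 / 27867, roughly 2.9% of its transcripts flagged as misassembled.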

          robofjoy Robert Reid added a comment -

          Well this is great and wonderful and all that.
          A new ticket will be created to export these results off of the cluster and come up with a cleaner way to present the findings!

          R

          robofjoy Robert Reid added a comment -

          Let's have Brandon do a review of the script and folders and then close this ticket.

          robofjoy Robert Reid added a comment -

          Brandon and I sat and walked through the script and quickly reviewed the results.

          The next step is to run this on the rnaSPAdes results so that we can compare the two sets of assemblies.
          Closing this ticket and making a new one!


            People

            • Assignee:
              bbendick Brandon Bendickson
              Reporter:
              robofjoy Robert Reid