  1. Testing
  2. TEST-24

IGB visualization of files (.bam and .bed) output by TopHat 2.0 (using Galaxy) - smaller Arabidopsis test file (single-end)

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 8.1.1
    • Labels:
      None
    • Environment:

      test.galaxyproject.org & IGB 8.2 (using a Mac OS X 10.9.3)

      Description

      Date: June 4, 2014

      Purpose: To test aspects of Galaxy/IGB integration, specifically IGB visualization of the files (.bam and .bed) output by TopHat 2.0 (using Galaxy).

      Files Tested: RNA-Seq data from Arabidopsis thaliana, smaller test file (single-end)
      1) Ler3-smaller.fastq

      Workflow:

      1) I uploaded the .fastq file to Galaxy [Get Data==>Upload file] by browsing to its location on my computer.

      2) I ran a quality control check on the data using the FASTQC tool in Galaxy [NGS: QC & Manipulation==>FASTQC:Read QC]. This allowed me to determine whether my data set had any issues that needed to be noted before analysis. I clicked on "View data" (the eyeball icon) to view the FASTQC report.
      *Issue I encountered: The FASTQC tool failed to complete the task (error: Picked up _JAVA_O). It was not working correctly on the Galaxy test site on 6/3/14, but it was working correctly on 6/4/14. It also works correctly on the Galaxy main site with .fastq files as well as .fastqsanger files.

      4) I determined that my data set was ready for further analysis based on the report.

      5) Before running Tophat2 to map the processed reads to the genome, it was necessary to ensure that my .fastq file had Sanger-scaled quality values with ASCII offset 33. I made sure this was the case by running FASTQ Groomer [NGS: QC & Manipulation==>FASTQ Groomer] on my .fastq file. I selected the FASTQ Groomer tool, selected my .fastq file to groom, selected "Sanger & Illumina 1.8+" as the Input FASTQ quality scores type, and hit "Execute".

      6) I now had a new, groomed file in my history and was ready to run TopHat 2.0 [NGS: RNA Analysis==>Tophat2]. I selected the tool, then selected "Single-end" for my data set. I selected my groomed RNA-Seq FASTQ file, then selected my reference genome (Arabidopsis thaliana TAIR10). I chose TopHat's default settings and did not specify a read group. When finished, I hit "Execute" to begin the task.
      *Issue I encountered earlier but already corrected in previous step: .fastq file is not selectable within the Tophat 2 widget. This most likely means that the .fastq file does not contain the compatible FASTQ quality scores type that Tophat uses (must have Sanger-scaled quality values with ASCII offset 33).
      -To Fix: I ran the FASTQ Groomer tool [NGS: QC & Manipulation==>FASTQ Groomer] as outlined in step #5. I selected "Sanger & Illumina 1.8+" as the Input FASTQ quality scores type and hit "Execute". After that, my .fastq file was compatible with TopHat and I was able to select my groomed file within the Tophat widget.
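
      The "Sanger-scaled quality values with ASCII offset 33" requirement from step #5 can be illustrated with a short Python sketch. This is not what FASTQ Groomer does internally; it only shows how an offset-33 quality string maps to Phred scores, plus a rough heuristic for guessing the encoding. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the function names and the example record are made up for illustration.

```python
def decode_phred33(quality_string):
    """Convert a Sanger (offset-33) quality string to a list of Phred scores."""
    return [ord(ch) - 33 for ch in quality_string]


def quality_code_range(fastq_lines):
    """Return (lowest, highest) ASCII code found on the quality lines.

    Assumes four lines per record, so every 4th line is a quality string.
    Returns None if no quality characters are found.
    """
    codes = []
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            codes.extend(ord(ch) for ch in line.rstrip("\n"))
    if not codes:
        return None
    return min(codes), max(codes)


def looks_like_offset_33(fastq_lines):
    """Heuristic only: offset-64 encodings never use characters below ';' (ASCII 59)."""
    code_range = quality_code_range(fastq_lines)
    if code_range is None:
        return False
    lowest, _ = code_range
    return 33 <= lowest < 59


# Hypothetical record, not taken from Ler3-smaller.fastq
example = ["@read1\n", "ACGT\n", "+\n", "II5#\n"]
print(decode_phred33("II5#"))
print(looks_like_offset_33(example))
```

      A file whose quality characters all sit at or above ASCII 59 is ambiguous under this heuristic, which is one reason to run FASTQ Groomer rather than guess.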

      7) TopHat 2.0 outputs 5 files into the Galaxy history; the four tested here are "Accepted Hits" (.bam), "Splice Junctions" (.bed), "Deletions" (.bed), and "Insertions" (.bed). I clicked on the title of each file to open a drop-down menu containing additional information and options. To visualize a file within the Integrated Genome Browser (IGB), I found "display in IGB View" in the drop-down menu and clicked on the "View" link.

      8) An IGB webpage opened in a new browser window stating that my data was ready to view. IGB was properly installed on my computer and the program was running, so I clicked on the button that says "Click to go to IGB".

      9) This brought me to the Integrated Genome Browser interface. There was an empty track of my file among the tracks displayed. I navigated to my chromosome/scaffold of interest on the right-hand side of the IGB window. I zoomed to my region of interest by adjusting the zoom slider located across the top of the IGB window. Then I clicked on my track and hit "Load Data" located in the upper, right-hand corner of the IGB window.
      *Issue I encountered: Data loads, but the IGB track appears blank (white and empty).
      -To Fix: I used the window on the right-hand side of IGB to locate my chromosome of interest. I was unsure of my chromosome of interest, so I went back to my Galaxy history and clicked on "View data" (the eyeball icon) beside the "Splice Junctions" (.bed) file output by TopHat to show a data table of TopHat's results for "Splice Junctions". The first column indicates the chromosome that contains each splice junction, so this is where I navigated within IGB to visualize data.
      *Issue I encountered: Can't visualize the "Insertions" (.bed) file output by Tophat in IGB. I learned that this is because the location of an insertion is actually in the space between bases, so it would technically be incorrect for IGB to overlay the insertions in the same position that bases occupy.
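
      The fix for the empty track (reading the first column of the "Splice Junctions" table to find a chromosome that has data) can also be done on a downloaded copy of the .bed file. The following is a minimal sketch, assuming a standard BED file whose first whitespace-separated field is the chromosome name; the function name and the example records are hypothetical, not actual TopHat output from this test.

```python
from collections import Counter


def features_per_chromosome(bed_lines):
    """Count BED records per chromosome (first column of each data line)."""
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header lines
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chromosome = line.split()[0]
        counts[chromosome] += 1
    return counts


# Hypothetical records for illustration only
example_bed = [
    'track name=junctions description="example"',
    "Chr1\t100\t500\tJUNC1\t12\t+",
    "Chr1\t900\t1500\tJUNC2\t3\t-",
    "Chr3\t200\t800\tJUNC3\t7\t+",
]
for chromosome, count in features_per_chromosome(example_bed).most_common():
    print(chromosome, count)
```

      Chromosomes with the most records are a reasonable first place to navigate in IGB before clicking "Load Data".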

      10) I went back to my Galaxy history and repeated steps #7-9 to add the rest of the files output by TopHat to IGB. This made it possible to visualize the TopHat output files simultaneously. I also exported an image so that I would have a copy of the IGB visualization.

            People

            • Assignee:
              Unassigned
              Reporter:
              mason Mason Meyer (Inactive)