IGB / IGBF-4090

Investigate IGB Quickload to UCSC track hub converter

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: The UCSC Genome Browser has track data hubs that are similar to IGB Quickloads. We previously created a website that converts UCSC track hubs to IGB Quickloads (see IGBF-2831). We would now like to add the ability to convert an IGB Quickload to a UCSC track hub.

      Task: Investigate how our current website converts UCSC track hubs to IGB Quickloads.

      Track hub converter website: https://translate.bioviz.org/
      Bitbucket repo: https://bitbucket.org/lorainelab/hub-facade/src/main/

        Attachments

        1. genomes.txt (0.0 kB)
        2. hub.txt (0.2 kB)
        3. screenshot-1.png (175 kB)
        4. trackDb.txt (2 kB)

            Activity

            nfreese Nowlan Freese added a comment - - edited

            Important file: https://bitbucket.org/lorainelab/hub-facade/src/main/igb_trackhub/api/create_resources.py
            UCSC Track hub URL: https://genome.ucsc.edu/cgi-bin/hgHubConnect
            This is the hub.txt API call: https://api.genome.ucsc.edu/list/hubGenomes?hubUrl=https://hgdownload.soe.ucsc.edu/hubs/GCA/000/001/905/GCA_000001905.1/hub.txt
            Documentation for setting up your own track hub: https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Setup
            File that makes the API calls: https://bitbucket.org/lorainelab/hub-facade/src/main/igb_trackhub/api/views.py

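
The hub.txt API call above can also be built in code, for example when checking a generated hub. This is a minimal Python sketch (Python because the hub-facade files above are .py); it assumes only the list/hubGenomes endpoint and hubUrl parameter visible in the example URL, and the function name is hypothetical. Unlike the hand-written URL, it percent-encodes the hub URL inside the query string.

```python
from urllib.parse import urlencode

# UCSC REST API endpoint used in the hub.txt API call above.
HUB_GENOMES_ENDPOINT = "https://api.genome.ucsc.edu/list/hubGenomes"


def hub_genomes_url(hub_url: str) -> str:
    """Return the UCSC API URL that lists the genomes in the hub at hub_url.

    urlencode percent-encodes the value, so ':' and '/' inside hub_url
    become %3A and %2F in the query string.
    """
    return f"{HUB_GENOMES_ENDPOINT}?{urlencode({'hubUrl': hub_url})}"


if __name__ == "__main__":
    print(hub_genomes_url(
        "https://hgdownload.soe.ucsc.edu/hubs/GCA/000/001/905/"
        "GCA_000001905.1/hub.txt"
    ))
```
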
            uchinta Udaya Chinta (Inactive) added a comment -

            I followed the instructions in the UCSC quick start guide, https://genome.ucsc.edu/goldenPath/help/hubQuickStart.html, and created the following files locally: hub.txt, genomes.txt and trackDb.txt (attached).

            To create the hub, three files are needed, laid out in the following directory structure:
            myHub/ - directory containing track hub files

            • hub.txt - a short description of hub properties
            • genomes.txt - list of genome assemblies included in the hub data
            • hg19/ - directory of data for the hg19 (GRCh37) human assembly
              • trackDb.txt - display properties for tracks in this directory
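
The layout above can be generated with a short script. This is a minimal Python sketch, not part of the quick start guide: the function names are hypothetical, and the hub labels and email are placeholders to replace with real values.

```python
from pathlib import Path


def hub_skeleton(genome: str = "hg19") -> dict[str, str]:
    """Return relative path -> file text for a minimal myHub/ layout.

    Labels and email below are hypothetical placeholders.
    """
    return {
        # hub.txt: short description of hub properties.
        "hub.txt": (
            "hub myHub\n"
            "shortLabel My Hub\n"
            "longLabel Example track hub\n"
            "genomesFile genomes.txt\n"
            "email myEmail@address\n"
        ),
        # genomes.txt: one stanza per assembly, pointing at its trackDb.txt.
        "genomes.txt": f"genome {genome}\ntrackDb {genome}/trackDb.txt\n",
        # <genome>/trackDb.txt: track display properties (empty for now).
        f"{genome}/trackDb.txt": "",
    }


def write_hub(root: Path, files: dict[str, str]) -> None:
    """Write the files under root, creating subdirectories as needed."""
    for rel_path, text in files.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
```

For example, write_hub(Path("myHub"), hub_skeleton("hg19")) recreates the directory tree described above.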

            There is also a useOneFile on setting. With useOneFile on in hub.txt, the contents of all the files can be combined into a single hub.txt, but the limitation is that the hub can then describe only a single genome assembly; there is no genomes.txt listing several assemblies.
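
A single-file hub can be sketched the same way. This Python sketch reflects our understanding of UCSC's useOneFile layout (hub stanza with useOneFile on, then one genome line, then track stanzas, separated by blank lines) and should be checked against UCSC's track hub documentation; the function name and placeholder labels are hypothetical.

```python
def single_file_hub(genome: str, track_stanzas: list[str]) -> str:
    """Build a useOneFile-style hub.txt: hub stanza, one genome, tracks.

    Hub labels and email are hypothetical placeholders. Only one genome
    stanza is emitted because a useOneFile hub describes a single assembly.
    """
    hub_stanza = "\n".join([
        "hub myHub",
        "shortLabel My Hub",
        "longLabel Example single-file track hub",
        "useOneFile on",
        "email myEmail@address",
    ])
    # Stanzas are separated by blank lines.
    return "\n\n".join([hub_stanza, f"genome {genome}", *track_stanzas]) + "\n"
```
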

            To host the files, I started a local web server with the Python command below. This did not work, presumably because the UCSC Genome Browser fetches hub files from its own servers and cannot reach a server running on my local machine.
            python -m http.server 8000

            So I hosted the files on GitHub instead and pasted the resulting URL into the UCSC Connected Hubs page. The hub.txt file was added successfully:
            https://udaya67.github.io/ucsc_trackhubs/hub.txt

            Now I will start exploring the data in each file.

            uchinta Udaya Chinta (Inactive) added a comment -

            GitHub link: https://github.com/udaya67/ucsc_trackhubs

            nfreese Nowlan Freese added a comment - - edited

            Test QuickLoad repository: https://bitbucket.org/nfreese/testquickload/src/main/
            Test QuickLoad URL for adding to IGB as a QuickLoad: https://bitbucket.org/nfreese/testquickload/raw/main/quickload/
            Info about the annots.xml file: https://wiki.bioviz.org/confluence/display/igbman/About+annots.xml

            Genome will appear as E_unicornis in the Species dropdown in IGB.

            Note: For some reason IGB is throwing an error when trying to load the sequence (2bit) stored in the Downloads folder in Bitbucket. This is not an issue for the gzipped bed file. I have copied the 2bit file to CyVerse and pointed at it in the annots.xml.

            nfreese Nowlan Freese added a comment - - edited

            Example trackhub repo that uses the same data as the QuickLoad: https://bitbucket.org/nfreese/testquickload/src/main/trackHub/
            The raw URL to the trackhub was not working for UCSC, but the API url does seem to work: https://api.bitbucket.org/2.0/repositories/nfreese/testquickload/src/main/trackHub/hub.txt
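
Since the raw Bitbucket URL did not work with UCSC but the API URL did, a small helper can rewrite one form into the other. This Python sketch is inferred only from the URL shapes in this ticket (bitbucket.org/<workspace>/<repo>/raw|src/<ref>/<path> becoming api.bitbucket.org/2.0/repositories/<workspace>/<repo>/src/<ref>/<path>); the function name is hypothetical, and branch names containing "/" are not handled.

```python
from urllib.parse import urlsplit


def bitbucket_api_url(url: str) -> str:
    """Rewrite a bitbucket.org src/ or raw/ file URL to the 2.0 API form.

    Assumes https://bitbucket.org/<workspace>/<repo>/<raw|src>/<ref>/<path>.
    """
    parts = urlsplit(url)
    if parts.netloc != "bitbucket.org":
        raise ValueError(f"not a bitbucket.org URL: {url}")
    # Split into workspace, repo, raw|src, ref and the remaining file path.
    segments = parts.path.strip("/").split("/", 4)
    if len(segments) < 5 or segments[2] not in ("raw", "src"):
        raise ValueError(f"unexpected Bitbucket URL layout: {url}")
    workspace, repo, _kind, ref, path = segments
    return (f"https://api.bitbucket.org/2.0/repositories/"
            f"{workspace}/{repo}/src/{ref}/{path}")
```
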

            Note that UCSC does not accept gzipped bed files (see this link for additional details), so I had to change the bed annotation to a bigBed. See IGBF-2978 for details on how to do that.

            bedToBigBed -as=bedDetail.as -tab -type=bed12+2 E_unicornis_Jul_2043.bed chrom.sizes E_unicornis_Jul_2043.bb
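
The conversion above can be wrapped in Python when starting from the gzipped BED in the QuickLoad. A minimal sketch, assuming UCSC's bedToBigBed binary is on PATH and that bedDetail.as and chrom.sizes already exist; all function names and the intermediate .sorted.bed file name are hypothetical. The input is sorted first because bedToBigBed expects BED lines ordered by chromosome and then start position.

```python
import gzip
import subprocess
from pathlib import Path


def sort_bed_lines(lines: list[str]) -> list[str]:
    """Drop header/blank lines and sort by chromosome, then numeric start.

    Gives the same chromosome/start order as `sort -k1,1 -k2,2n` with
    LC_ALL=C, the order bedToBigBed requires.
    """
    rows = [line.rstrip("\n") + "\n" for line in lines
            if line.strip() and not line.startswith(("#", "track", "browser"))]
    return sorted(rows, key=lambda row: (row.split("\t")[0], int(row.split("\t")[1])))


def bed_to_bigbed_cmd(bed: Path, chrom_sizes: Path, out: Path,
                      as_file: Path = Path("bedDetail.as"),
                      bed_type: str = "bed12+2") -> list[str]:
    """Return the bedToBigBed command shown above as an argument list."""
    return ["bedToBigBed", f"-as={as_file}", "-tab", f"-type={bed_type}",
            str(bed), str(chrom_sizes), str(out)]


def gzipped_bed_to_bigbed(bed_gz: Path, chrom_sizes: Path, out: Path) -> None:
    """Decompress and sort a .bed.gz, then run bedToBigBed (must be on PATH)."""
    with gzip.open(bed_gz, "rt") as handle:
        rows = sort_bed_lines(list(handle))
    sorted_bed = out.with_suffix(".sorted.bed")  # hypothetical temp file name
    sorted_bed.write_text("".join(rows))
    subprocess.run(bed_to_bigbed_cmd(sorted_bed, chrom_sizes, out), check=True)
```
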

            Note that because this genome is new (not a genome already found in UCSC), some additional fields were required. For example, in the genomes.txt I had to include the organism, defaultPos, and twoBitPath properties in that order.
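
The genomes.txt requirement described above can be captured in a small helper. A minimal Python sketch: the function name is hypothetical, and the field order follows this comment rather than a checked UCSC specification.

```python
from typing import Optional


def genome_stanza(genome: str, organism: Optional[str] = None,
                  default_pos: Optional[str] = None,
                  two_bit_path: Optional[str] = None) -> str:
    """Return one genomes.txt stanza.

    organism, defaultPos and twoBitPath are only passed for genomes that
    UCSC does not already host, and are written in that order.
    """
    lines = [f"genome {genome}", f"trackDb {genome}/trackDb.txt"]
    optional_fields = [("organism", organism), ("defaultPos", default_pos),
                       ("twoBitPath", two_bit_path)]
    lines += [f"{key} {value}" for key, value in optional_fields if value is not None]
    return "\n".join(lines) + "\n"
```
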

            uchinta Udaya Chinta (Inactive) added a comment - - edited

            Created a proof-of-concept (POC) example from an existing QuickLoad:

            • hub.txt:
            1. hub IGBQuickLoad
            2. shortLabel IGB QUICK LOAD
            3. longLabel Generated from IGB QuickLoad
            4. genomesFile genomes.txt
            5. email myEmail@address
            • genomes.txt
            1. genome C_papaya_Aug_2010
              C_papaya_Aug_2010 is the IGB QuickLoad genome name, but genomes.txt needs the UCSC genome name. To find the UCSC name, look the QuickLoad name up in synonyms.txt; if it is not listed there, reuse the IGB QuickLoad name.
            2. trackDb C_papaya_Aug_2010/trackDb.txt
            3. organism Carica papaya
              (The organism comes from species.txt: split the genome name on "_", join the first two parts with "_" to get C_papaya, and check whether that prefix is present in species.txt. If it is, use the first column of that line as the organism name.)
            4. defaultPos chrI:0-6239266
              From <http://igbquickload.org/quickload/C_papaya_Aug_2010/genome.txt>
            5. twoBitPath http://igbquickload.org/quickload/C_papaya_Aug_2010/C_papaya_Aug_2010.2bit

            The twoBitPath can be read directly from annots.xml (the file entry with reference="true") or fetched from the genome version folder:
            http://igbquickload.org/quickload/C_papaya_Aug_2010/

            Note: organism, defaultPos and twoBitPath are not mandatory.
              They are required only when the UCSC genome name does not match the IGB QuickLoad name.
            • trackDb.txt
            1. track Caricapapaya
            2. bigDataUrl C_papaya_Aug_2010.bed.gz
              From <http://igbquickload.org/quickload/C_papaya_Aug_2010/annots.xml> (name)
            3. shortLabel Carica papaya gene models from Phytozome v7
              From <http://igbquickload.org/quickload/C_papaya_Aug_2010/annots.xml>
            4. longLabel protein-coding gene models
              From <http://igbquickload.org/quickload/C_papaya_Aug_2010/annots.xml>
            5. type bed
            6. visibility dense (hard coded)
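
The genomes.txt lookups described in the POC above (synonyms.txt for the UCSC name, species.txt for the organism, genome.txt for defaultPos) can be sketched in Python. Assumptions: both synonyms.txt and species.txt are tab-separated, the caller already has a set of known UCSC genome names (for example from the UCSC API), and defaultPos spans the first sequence listed in genome.txt; all function names are hypothetical.

```python
from typing import Optional


def ucsc_genome_name(quickload_name: str, synonyms_txt: str,
                     ucsc_names: set[str]) -> str:
    """Map an IGB QuickLoad genome name to a UCSC genome name.

    Each synonyms.txt line is assumed to hold tab-separated names for one
    assembly. Falls back to the QuickLoad name when no synonym is a
    known UCSC name.
    """
    for line in synonyms_txt.splitlines():
        names = [name.strip() for name in line.split("\t") if name.strip()]
        if quickload_name in names:
            for name in names:
                if name in ucsc_names:
                    return name
    return quickload_name


def organism_name(quickload_name: str, species_txt: str) -> Optional[str]:
    """Look up the organism (first column) in species.txt by name prefix."""
    prefix = "_".join(quickload_name.split("_")[:2])  # C_papaya_Aug_2010 -> C_papaya
    for line in species_txt.splitlines():
        columns = [column.strip() for column in line.split("\t")]
        if prefix in columns[1:]:
            return columns[0]
    return None


def default_position(genome_txt: str) -> Optional[str]:
    """Build defaultPos from the first 'sequence<TAB>length' line of genome.txt."""
    for line in genome_txt.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            return f"{fields[0]}:0-{fields[1]}"
    return None
```
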

            Track hub: https://api.bitbucket.org/2.0/repositories/lorainelab_udaya/testquickload/src/main/trackHub1/hub.txt

            The above track hub did not work because the bigDataUrl file is still a gzipped BED file; it needs to be converted to bigBed first.
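
Following the earlier note that UCSC does not accept gzipped BED files, the trackDb.txt stanzas can be generated from annots.xml pointing at bigBed files instead. A minimal Python sketch under several assumptions: file elements carry name/title/description attributes, title maps to shortLabel and description to longLabel (inferred from the POC values), reference="true" marks the 2bit sequence, and each BED file has already been converted to a .bb file with the same base name. The track-naming rule is hypothetical (the POC used "Caricapapaya").

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def annots_to_trackdb(annots_xml: str) -> tuple[Optional[str], list[str]]:
    """Return (reference file name, trackDb.txt stanzas) for an annots.xml string."""
    root = ET.fromstring(annots_xml)
    reference = None
    stanzas = []
    for elem in root.iter("file"):
        name = elem.get("name", "")
        if elem.get("reference", "").lower() == "true":
            reference = name  # join with the genome folder URL for twoBitPath
            continue
        if not re.search(r"\.bed(\.gz)?$", name):
            continue  # this sketch only handles BED annotations
        stem = re.sub(r"\.bed(\.gz)?$", "", name)
        track = re.sub(r"\W", "_", stem)  # hypothetical track-naming rule
        title = elem.get("title", track)
        stanzas.append("\n".join([
            f"track {track}",
            f"bigDataUrl {stem}.bb",
            f"shortLabel {title}",
            f"longLabel {elem.get('description', title)}",
            "type bigBed",
            "visibility dense",  # hard coded, as in the POC
        ]))
    return reference, stanzas
```

For twoBitPath, the returned reference name can be joined to the genome version folder, e.g. urllib.parse.urljoin("http://igbquickload.org/quickload/C_papaya_Aug_2010/", reference).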

            uchinta Udaya Chinta (Inactive) added a comment -

            Closing this ticket since the investigation is complete. Creating another ticket for the POC.
            Cc: Dr. Nowlan Freese


              People

              • Assignee: uchinta Udaya Chinta (Inactive)
              • Reporter: nfreese Nowlan Freese
              • Votes: 0
              • Watchers: 2
