  1. IGB
  2. IGBF-3894

Add the latest version of the Oryzias latipes genome to IGB

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: IGB currently hosts the 2005 version of the Oryzias latipes genome via UCSC's REST API, but a new version was released in 2017 that can be added to IGB.

      Task: Add the 2017 version of the Oryzias latipes genome and annotation to IGB.

      Oryzias latipes (ASM223467v1) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002234675.1/

        Attachments

          Activity

          pkulzer Paige Kulzer (Inactive) created issue -
          pkulzer Paige Kulzer (Inactive) made changes -
          Field Original Value New Value
          Epic Link IGBF-3823 [ 23122 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Description
          Original value:
          Situation: IGB currently hosts the 2005 version of the Oryzias latipes genome, but a new version was released in 2017 that can be added to IGB.

          Task: Add the 2017 version of the Oryzias latipes genome and annotation to IGB.

          Oryzias latipes (ASM223467v1) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002234675.1/

          New value:
          Situation: IGB currently hosts the 2005 version of the Oryzias latipes genome via UCSC's REST API, but a new version was released in 2017 that can be added to IGB.

          Task: Add the 2017 version of the Oryzias latipes genome and annotation to IGB.

          Oryzias latipes (ASM223467v1) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002234675.1/
          pkulzer Paige Kulzer (Inactive) made changes -
          Sprint Fall 2 [ 203 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Rank Ranked higher
          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] To-Do [ 10305 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          Below is an outline of the steps I followed to create the Oryzias latipes Quickload:

          1. Use wget to obtain the .2bit file from the UCSC Genome Bioinformatics website, then rename it

          brew install wget
          wget https://hgdownload.soe.ucsc.edu/hubs/GCF/002/234/675/GCF_002234675.1/GCF_002234675.1.2bit
          mv GCF_002234675.1.2bit O_latipes_Jul_2017.2bit
          

          2. Create genome.txt

          ./twoBitInfo O_latipes_Jul_2017.2bit genome.txt
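
Before moving on, it can help to confirm that genome.txt has the shape expected from twoBitInfo: one line per sequence, with a sequence name and an integer size separated by a tab. The following is a minimal sketch of such a check; check_genome_txt is a hypothetical helper name, not part of IGB or the UCSC tools.

```shell
# Hypothetical helper (not part of IGB or the UCSC tools): verify that a
# genome.txt-style file has two tab-separated columns per line, a sequence
# name and an integer size. Returns non-zero if any line fails or the file is empty.
check_genome_txt() {
    awk -F'\t' '
        NF != 2          { printf "line %d: expected 2 columns, found %d\n", NR, NF; bad = 1; next }
        $2 !~ /^[0-9]+$/ { printf "line %d: size \"%s\" is not an integer\n", NR, $2; bad = 1 }
        END              { if (NR == 0) { print "file is empty"; bad = 1 }; exit bad + 0 }
    ' "$1"
}

# Example use after step 2:
# check_genome_txt genome.txt && echo "genome.txt looks OK"
```

The check only looks at the file's shape; it does not compare the sizes against the .2bit file.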
          

          3. Use the Oryzias latipes taxID (8090) to extract the rows we need from NCBI's gene2accession.gz and gene_info.gz; these are used to create the BED14 file in step 5

          gunzip -c gene2accession.gz | grep '^8090\t' > 8090.gene2accession.txt
          gunzip -c gene_info.gz | grep '^8090\t' > 8090.gene_info.txt
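
One caveat about the commands above: not every grep interprets `\t` inside a pattern as a tab. GNU grep in its default mode, for instance, treats it as a literal `t`, so the same command can silently match nothing on Linux. The following is a sketch of a portable alternative that compares the first tab-separated column with awk; filter_by_taxid is a hypothetical helper name.

```shell
# Hypothetical helper: print only the rows whose first tab-separated column
# is exactly the given tax ID (compared as a string). Reads from standard input.
filter_by_taxid() {
    awk -F'\t' -v taxid="$1" '($1 "") == (taxid "")'
}

# Usage mirroring the commands above:
# gunzip -c gene2accession.gz | filter_by_taxid 8090 > 8090.gene2accession.txt
# gunzip -c gene_info.gz      | filter_by_taxid 8090 > 8090.gene_info.txt
```

Comparing the whole first column also avoids matching longer IDs that merely begin with 8090.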
          

          4. Set up Python 2 (if you have not already done so) in order to run the script in the next step

          brew install pyenv
          pyenv install 2.7.18
          

          In your .bash_profile (or .zshrc, if you use zsh), add the following:

          ## Set path for pyenv
          export PYENV_ROOT="$HOME/.pyenv"
          [[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
          eval "$(pyenv init -)"

          Then run this:

          pyenv shell 2.7.18
          

          5. Download the BED file from UCSC's table browser (Link: https://genome.ucsc.edu/cgi-bin/hgTables), then create the BED14 file using the following code:

          cd ~/Documents/Repos/genomesource 
          ./ucscToBedDetail.py -a ~/Downloads/8090.gene2accession.txt -g ~/Downloads/8090.gene_info.txt ~/Downloads/O_latipes_refGene.bed.gz ~/Downloads/O_latipes_Jul_2017_refGene.bed
          

          6. Sort, bgzip-compress, and tabix-index the BED14 file

          cd ~/Downloads/
          sort -k1,1 -k2,2n O_latipes_Jul_2017_refGene.bed | bgzip > O_latipes_Jul_2017_refGene.bed.gz
          tabix -0 -s 1 -b 2 -e 3 O_latipes_Jul_2017_refGene.bed.gz
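
As an optional check before indexing, the sorted file can be tested for the properties tabix relies on: each sequence's records form one contiguous block, and start positions within a block never decrease. The sketch below also assumes that "BED14" means 14 tab-separated columns per line; check_bed14 is a hypothetical helper name.

```shell
# Hypothetical helper: check a BED file given as a path or, with no argument,
# on standard input. Assumes BED14 means 14 tab-separated columns.
# Fails if a line has a different column count, if a sequence name reappears
# after another sequence has started, or if starts decrease within a sequence.
check_bed14() {
    awk -F'\t' '
        NF != 14 { printf "line %d: expected 14 columns, found %d\n", NR, NF; bad = 1; next }
        $1 != chrom {
            if ($1 in seen) { printf "line %d: %s appears in more than one block\n", NR, $1; bad = 1 }
            seen[$1] = 1; chrom = $1; prev = -1
        }
        ($2 + 0) < prev { printf "line %d: start %s is before previous start %d\n", NR, $2, prev; bad = 1 }
        { prev = $2 + 0 }
        END { exit bad + 0 }
    ' "${1:--}"
}

# Example use on the compressed output of step 6:
# gunzip -c O_latipes_Jul_2017_refGene.bed.gz | check_bed14 && echo "BED14 looks OK"
```

The check is done in awk rather than with `sort -c` so that the result does not depend on the locale used for the original sort.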
          

          7. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.

          8. Create annots.xml, add O_latipes_Jul_2017 to contents.txt and .htaccess, and create HEADER.md

          cd ~/Documents/Repos/quickload
          svn mkdir O_latipes_Jul_2017
          svn cp H_vulgaris_Apr_2024/annots.xml O_latipes_Jul_2017
          nano O_latipes_Jul_2017/annots.xml
          nano contents.txt
          nano .htaccess
          ../genomesource/writeQuickLoadHeaderUCSC.py O_latipes_Jul_2017 > O_latipes_Jul_2017/HEADER.md
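
A quick way to confirm that step 8 left nothing out is to test for the files it creates. The sketch below checks that annots.xml and HEADER.md exist and are non-empty in the genome directory, and that contents.txt lists the genome. It assumes contents.txt is tab-delimited with the genome directory name in the first column; check_quickload_genome is a hypothetical helper name.

```shell
# Hypothetical helper: check_quickload_genome <quickload root> <genome directory name>
# Assumes contents.txt is tab-delimited with the genome directory name in column 1.
check_quickload_genome() {
    ql_root=$1
    ql_genome=$2
    ql_rc=0
    for ql_file in annots.xml HEADER.md; do
        if [ ! -s "$ql_root/$ql_genome/$ql_file" ]; then
            echo "missing or empty: $ql_genome/$ql_file"
            ql_rc=1
        fi
    done
    if ! awk -F'\t' -v g="$ql_genome" '($1 "") == (g "") { found = 1 } END { exit !found }' \
            "$ql_root/contents.txt" 2>/dev/null; then
        echo "$ql_genome is not listed in contents.txt"
        ql_rc=1
    fi
    return "$ql_rc"
}

# Example use from the quickload working copy:
# check_quickload_genome . O_latipes_Jul_2017 && echo "quickload entry looks complete"
```

The sketch does not inspect .htaccess or the contents of annots.xml; those still need a manual look.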
          
          pkulzer Paige Kulzer (Inactive) added a comment - - edited

          I've placed a zipped version of the files created for this quickload in Google Drive for the reviewer to take a look at:

          Path: research-big-lorainelab > IGB Project Documentation and Plans > IGB Genomes > O_latipes.zip
          Link: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Rank Ranked higher
          nfreese Nowlan Freese made changes -
          Sprint Fall 2 [ 203 ] Fall 3 [ 204 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 3 [ 204 ] Fall 4 [ 205 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 4 [ 205 ] Fall 5 [ 206 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 5 [ 206 ] Fall 6 [ 207 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 6 [ 207 ] Fall 7 [ 208 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Rank Ranked higher
          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          I've edited the header to make it specific to NCBI rather than UCSC, added species.txt and synonyms.txt to the zip file, cut down contents.txt to include only this genome, and edited annots.xml to replace UCSC's "refGene" with NCBI's "refSeq".

          Let me know if I've updated all of these files correctly!

          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese made changes -
          Attachment species.txt [ 18583 ]
          nfreese Nowlan Freese added a comment -

          Testing:

          • I initially thought we didn't need a species.txt, because the released version of IGB already has O_latipes in its species.txt. However, the O_latipes line in the released version's species.txt contains a mistake that will need to be fixed in IGBF-4006. In the meantime, please add the attached species.txt when submitting to the SVN repo.
          • I would like to include the name "medaka" in HEADER.md, since the NCBI page uses that name.

          Everything else looks good: the data loads and the sequence loads.

          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          I've added the name "Medaka" to HEADER.md and contents.txt. I've also replaced species.txt in the zip file with the file you attached.

          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese added a comment -

          Testing: Everything looks good.

          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          The Oryzias latipes genome has been pushed to the SVN repo.

          Ready for final review!

          pkulzer Paige Kulzer (Inactive) made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          ann.loraine Ann Loraine added a comment -

          I have deployed the latest copy of the quickload repository to:

          RENCI hosting - http://igbquickload-main.bioviz.org/quickload/ (primary)
          UNC Charlotte hosting - http://igbquickload.org/quickload/ (backup)

          To test:

          • Launch IGB and visit each new genome version (see above).
          • Visit the subdirectories for each genome (by following the links above) and check that there is text describing the genome and the datasets visible in IGB itself.
          • Within IGB's Available Data section, click any "linkout" icons and make sure that a web page opens and that it describes the dataset.
          • Check that the datasets look OK when they load: gene models should be boxes with lines connecting them, for instance, and the track labels should be readable and should make sense. ("Making sense" is subjective, of course! Mainly we're looking for problems that could trip up a user and cause confusion.)
          nfreese Nowlan Freese made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          nfreese Nowlan Freese added a comment -

          Tested following instructions above. Everything looks good.

          Closing ticket.

          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

            People

            • Assignee:
              pkulzer Paige Kulzer (Inactive)
              Reporter:
              pkulzer Paige Kulzer (Inactive)
            • Votes:
              0
              Watchers:
              3
