Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Task: Add the Hydra vulgaris genome and annotation to IGB. The current Hydra vulgaris genome version provided by Ensembl is Hydra_105_v3 (Feb 2022).

      Hydra vulgaris (HydraT2T_AEP)(Apr 2024) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_038396675.1/

          Activity

          pkulzer Paige Kulzer (Inactive) added a comment -

          Below is an outline of the steps I followed to create the Hydra vulgaris Quickload:
          1. Convert genome .fna to .2bit

          gunzip GCF_038396675.1_HydraT2T_AEP_genomic.fna.gz
          ./faToTwoBit GCF_038396675.1_HydraT2T_AEP_genomic.fna H_vulgaris_Apr_2024.2bit
          

          2. Create genome.txt

          ./twoBitInfo H_vulgaris_Apr_2024.2bit genome.txt
          

          3. Get gene models from NCBI (.gff), then convert .gff to .bed

          cd ~/Documents/Repos/genomesource
          ./gff3ToBedDetail.py -g ~/Downloads/genomic.gff -b ~/Downloads/H_vulgaris_Apr_2024_refGene.bed
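          Before moving on, it can help to know which feature types the NCBI GFF contains, so the line count of the new .bed file has something to be compared against. A minimal sketch, assuming a standard tab-delimited GFF3 file; the function name count_gff_feature_types is hypothetical, and which feature types gff3ToBedDetail.py actually converts is not stated in this ticket, so a mismatch is a prompt to look closer rather than proof of an error.

```shell
# count_gff_feature_types GFF  (hypothetical helper)
# Tallies column 3 (feature type) of a GFF3 file, skipping "#" comment and
# "##" directive lines and any line with fewer than 9 tab-separated columns
# (e.g. sequence lines after a ##FASTA directive).
# Prints "type<TAB>count", sorted by type.
count_gff_feature_types() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
    END { for (t in n) print t "\t" n[t] }' "$1" | LC_ALL=C sort
}
```

          For example, run count_gff_feature_types ~/Downloads/genomic.gff and compare the mRNA (and other transcript-type) counts with wc -l < ~/Downloads/H_vulgaris_Apr_2024_refGene.bed.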
          

          4. Check if UCSC has any information for this genome using its txid (NCBI:txid6087) and, if it does, compare gene names/IDs to those present in the .bed file created in the previous step

          cd ~/Downloads
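          # grep does not treat \t as a tab in a basic pattern, so match the tax_id column with awk instead
          gunzip -c gene2accession.gz | awk -F'\t' '$1 == 6087' > 6087.gene2accession.txt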
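          The gene name comparison itself can be sketched as a set difference. This is a sketch under assumptions: the function compare_gene_names is hypothetical, it reads the gene Symbol from column 16 of gene2accession (as listed in that file's #tax_id header line), and the BED column holding gene names has to be passed in, because the ticket does not say which column gff3ToBedDetail.py writes them to.

```shell
# compare_gene_names G2A_SUBSET BED BED_COL  (hypothetical helper)
# G2A_SUBSET: gene2accession rows for one taxon (tax_id 6087 here);
#             column 16 is assumed to be the gene Symbol.
# BED_COL:    BED column holding gene names (an assumption to verify).
# Prints how many distinct names occur only in gene2accession, only in
# the BED file, and in both.
compare_gene_names() {
  local tmp
  tmp=$(mktemp -d) || return 1
  # Skip header/comment lines and "-" placeholders for missing symbols.
  awk -F'\t' '!/^#/ && $16 != "" && $16 != "-" { print $16 }' "$1" \
    | LC_ALL=C sort -u > "$tmp/ncbi"
  awk -F'\t' -v c="$3" '$c != "" { print $c }' "$2" \
    | LC_ALL=C sort -u > "$tmp/bed"
  # comm needs both inputs sorted the same way, hence LC_ALL=C throughout.
  printf 'only_gene2accession\t%s\n' "$(LC_ALL=C comm -23 "$tmp/ncbi" "$tmp/bed" | wc -l | tr -d ' ')"
  printf 'only_bed\t%s\n' "$(LC_ALL=C comm -13 "$tmp/ncbi" "$tmp/bed" | wc -l | tr -d ' ')"
  printf 'shared\t%s\n' "$(LC_ALL=C comm -12 "$tmp/ncbi" "$tmp/bed" | wc -l | tr -d ' ')"
  rm -rf "$tmp"
}
```

          For example, compare_gene_names 6087.gene2accession.txt H_vulgaris_Apr_2024_refGene.bed 13, replacing 13 with whichever column actually holds gene names. A large "only" count on either side suggests the names or IDs differ in form and need a closer look.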
          

          5. Sort, gzip, and tabix the .bed file

          sort -k1,1 -k2,2n H_vulgaris_Apr_2024_refGene.bed | bgzip > H_vulgaris_Apr_2024_refGene.bed.gz
          tabix -0 -s 1 -b 2 -e 3 H_vulgaris_Apr_2024_refGene.bed.gz
          

          6. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.
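          Part of this check can also be done on the command line before opening IGB. A minimal sketch (the function name check_bed_against_genome is hypothetical) that compares the uncompressed BED file against genome.txt, which twoBitInfo writes as one sequence name and length per line, and flags unknown chromosomes, out-of-range coordinates, and rows out of the chromosome/start order that tabix relies on:

```shell
# check_bed_against_genome GENOME_TXT BED  (hypothetical helper)
# GENOME_TXT: twoBitInfo output (sequence name <TAB> length per line).
# BED:        uncompressed, sorted BED file.
# Prints one message per problem and returns 1 if any were found:
#   - chromosome missing from genome.txt
#   - coordinates outside 0 <= start < end <= chromosome length
#   - chromosome blocks split up, or starts decreasing within a block
check_bed_against_genome() {
  awk -F'\t' '
    FNR == NR { len[$1] = $2 + 0; next }   # first file: genome.txt
    /^(#|track|browser)/ { next }          # skip BED header lines
    {
      s = $2 + 0; e = $3 + 0
      if (!($1 in len)) {
        print "line " FNR ": unknown chromosome " $1; bad = 1
      } else if (s < 0 || s >= e || e > len[$1]) {
        print "line " FNR ": bad coordinates " $1 ":" $2 "-" $3; bad = 1
      }
      # Each chromosome must be one contiguous block with ascending starts.
      if ($1 != prev) {
        if ($1 in seen) { print "line " FNR ": " $1 " is not contiguous"; bad = 1 }
        seen[$1] = 1; prev = $1
      } else if (s < last) {
        print "line " FNR ": start out of order"; bad = 1
      }
      last = s
    }
    END { exit bad }
  ' "$1" "$2"
}
```

          For a clean file, check_bed_against_genome genome.txt H_vulgaris_Apr_2024_refGene.bed prints nothing and returns 0. It does not cover labels or chromosome ordering in IGB, which still need the visual check.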

          7. Create annots.xml and add _H_vulgaris_ to contents.txt and .htaccess

          cd ~/Documents/Repos/quickload
          svn mkdir H_vulgaris_Apr_2024
          svn cp A_gambiae_Oct_2006/annots.xml H_vulgaris_Apr_2024
          nano H_vulgaris_Apr_2024/annots.xml
          nano contents.txt
          nano .htaccess
          
          pkulzer Paige Kulzer (Inactive) added a comment (edited)

          I've placed a zipped version of the new quickload folder in Google Drive for the reviewer to take a look at:
          Path: research-big-lorainelab > IGB Project Documentation and Plans > IGB Genomes > H_vulgaris.zip
          Link: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

          Question for the reviewer: Do I need to manually edit the HEADER.md file for genomes like this one that were pulled from NCBI rather than UCSC?

          pkulzer Paige Kulzer (Inactive) added a comment -

          I've edited the header to make it specific to NCBI rather than UCSC, added species.txt and synonyms.txt to the zip file, cut down contents.txt to only include this genome, and edited annots.xml to replace UCSC's "refGene" with NCBI's "refSeq".

          Let me know if I've updated all of these files correctly!

          nfreese Nowlan Freese added a comment -

          Testing:

          • synonyms.txt has the wrong date (Feb_2022 instead of Apr_2024), so it's not working correctly.
          • The 14th column of the BED file contains many "%2C" sequences that should be commas. Not sure why, but it would probably be good to replace them.
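          A stale version string like the one in synonyms.txt can be caught with a quick text search. A minimal sketch, assuming each listed file is expected to mention the full genome version name (H_vulgaris_Apr_2024 here); the function name check_version_mentions is hypothetical and the file names in the usage line are illustrative:

```shell
# check_version_mentions VERSION FILE...  (hypothetical helper)
# Reports each FILE that does not contain VERSION as a fixed string,
# e.g. a synonyms.txt still naming an older genome version.
# Returns 1 if any file is missing it.
check_version_mentions() {
  local version=$1 f missing=0
  shift
  for f in "$@"; do
    if ! grep -qF -- "$version" "$f"; then
      echo "missing $version: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

          For example: check_version_mentions H_vulgaris_Apr_2024 synonyms.txt contents.txt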
          pkulzer Paige Kulzer (Inactive) added a comment -

          Good catch! I updated the date in synonyms.txt and replaced all instances of "%2C" with a comma. It appears this weird syntax was already present in the GFF file I downloaded from NCBI.
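          The "%2C" is most likely GFF3 escaping: the GFF3 specification reserves the comma in column 9 (it separates multiple values of an attribute) and requires a literal comma inside a value to be written as %2C, so the text would carry over into the BED description unchanged. A minimal sketch of the replacement, assuming the description is in column 14 as noted in testing; the function name fix_bed_commas is hypothetical, and unlike a global sed 's/%2C/,/g' it leaves the other columns untouched:

```shell
# fix_bed_commas  (hypothetical helper)
# Reads a BED-detail file on stdin and writes it to stdout with every
# "%2C" in column 14 turned back into ",". Rows with fewer than 14
# columns, and all other columns, pass through unchanged.
fix_bed_commas() {
  awk 'BEGIN { FS = OFS = "\t" }
       NF >= 14 { gsub(/%2C/, ",", $14) }
       { print }'
}
```

          Because only column 14 changes, the sort order is preserved and the file only needs to be recompressed and reindexed, e.g. gunzip -c H_vulgaris_Apr_2024_refGene.bed.gz | fix_bed_commas | bgzip > H_vulgaris_Apr_2024_refGene.fixed.bed.gz, followed by tabix -0 -s 1 -b 2 -e 3 on the new file (the .fixed file name is only an example).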

          We also decided to use the common name given by NCBI rather than one found through Google, so I've updated the common name to "Swiftwater hydra" in HEADER.md, contents.txt, and species.txt.

          Ready for review!

          nfreese Nowlan Freese added a comment -

          Testing: everything looks good.

          pkulzer Paige Kulzer (Inactive) added a comment -

          The Subversion repository appears to be down, which is preventing me from pushing this quickload to the SVN site. When I try to check in my changes, svn is responding with

          svn: E200029: Commit failed (details follow):
          svn: E200029: could not begin a transaction
          

          And when I try to update my working copy with "svn update", svn is responding with

          svn: E200029: Couldn't perform atomic initialization
          

          Ann Loraine, could you restart the svn site and reattach the virtual hard drive storing the data like you did for IGBF-3748?

          pkulzer Paige Kulzer (Inactive) added a comment -

          The SVN site is back up and running, and the Hydra vulgaris genome has been pushed to the SVN repo.

          Ready for final review!

          ann.loraine Ann Loraine added a comment (edited)

          I have deployed the latest copy of the quickload repository to:

          RENCI hosting - http://igbquickload-main.bioviz.org/quickload/ (primary)
          UNC Charlotte hosting - http://igbquickload.org/quickload/ (backup)
          To test:

          • launch IGB and visit each new genome version (see above)
          • visit the subdirectories for each genome (by following the links above) and check that there is text describing the genome and datasets visible in IGB itself
          • within the IGB Available Data section, click any "linkout" icons and make sure a Web page opens and that it goes to a place that describes the dataset somehow
          • check that when the datasets load, they look OK - gene models should be boxes with lines connecting them, for instance, and the track labels should be readable and should make sense ("making sense" is subjective, of course! mainly we're looking for problems that could trip up a user and cause confusion.)
          nfreese Nowlan Freese added a comment -

          Tested following instructions above. Everything looks good.

          Closing ticket.


            People

            • Assignee:
              pkulzer Paige Kulzer (Inactive)
              Reporter:
              pkulzer Paige Kulzer (Inactive)
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: