Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Task: Add the Hydra vulgaris genome and annotation to IGB. Current Hydra vulgaris genome version provided by Ensembl: Hydra_105_v3 (Feb 2022).

      Hydra vulgaris (HydraT2T_AEP)(Apr 2024) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_038396675.1/

        Attachments

          Activity

          pkulzer Paige Kulzer (Inactive) created issue -
          pkulzer Paige Kulzer (Inactive) made changes -
          Field Original Value New Value
          Epic Link IGBF-3823 [ 23122 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          Below is an outline of the steps I followed to create the Hydra vulgaris Quickload:
          1. Convert genome .fna to .2bit

          gunzip GCF_038396675.1_HydraT2T_AEP_genomic.fna.gz
          ./faToTwoBit GCF_038396675.1_HydraT2T_AEP_genomic.fna H_vulgaris_Apr_2024.2bit
          

          2. Create genome.txt

          ./twoBitInfo H_vulgaris_Apr_2024.2bit genome.txt
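A quick format check on genome.txt can catch problems before the file goes into the Quickload. twoBitInfo writes one sequence per line as name, tab, length. The sketch below checks a file against that layout. The function name check_chrom_sizes is made up for illustration and is not part of the UCSC tools.

```shell
#!/bin/sh
# check_chrom_sizes FILE
# Hypothetical helper: succeeds only if every line of FILE is
# "name<TAB>length" with an integer length and no repeated names.
check_chrom_sizes() {
  awk -F'\t' '
    NF != 2          { printf "line %d: expected 2 tab-separated fields\n", NR; bad = 1; next }
    $2 !~ /^[0-9]+$/ { printf "line %d: length is not an integer\n", NR; bad = 1 }
    seen[$1]++       { printf "line %d: duplicate name %s\n", NR, $1; bad = 1 }
    END              { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example: check_chrom_sizes genome.txt && echo "genome.txt looks OK"
```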
          

          3. Get gene models from NCBI (.gff), then convert .gff to .bed

          cd ~/Documents/Repos/genomesource
          ./gff3ToBedDetail.py -g ~/Downloads/genomic.gff -b ~/Downloads/H_vulgaris_Apr_2024_refGene.bed
          

          4. Check whether UCSC has any information for this genome using its taxonomy ID (NCBI:txid6087) and, if it does, compare the gene names/IDs to those present in the .bed file created in the previous step

          cd ~/Downloads
          gunzip -c gene2accession.gz | awk -F'\t' '$1 == "6087"' > 6087.gene2accession.txt
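The comparison itself could be sketched as below. This is not the command that was run for this ticket. It assumes that column 4 of the .bed file holds transcript accessions and that column 4 of gene2accession holds RNA_nucleotide_accession.version, so both column numbers are passed in as arguments and should be adjusted to the real files. The function name count_shared_ids is hypothetical.

```shell
#!/bin/sh
# count_shared_ids BED BED_COL G2A G2A_COL
# Hypothetical helper: reports how many distinct IDs in column BED_COL of
# the BED file also occur in column G2A_COL of the filtered gene2accession
# file. Values of "-" in gene2accession mean "missing" and are skipped.
count_shared_ids() {
  awk -F'\t' -v bc="$2" -v gc="$4" '
    NR == FNR    { if ($gc != "-") known[$gc] = 1; next }  # first file: gene2accession
    !seen[$bc]++ { total++; if ($bc in known) shared++ }   # second file: BED
    END          { printf "%d of %d distinct IDs found\n", shared, total }
  ' "$3" "$1"
}

# Example (column numbers are assumptions):
# count_shared_ids H_vulgaris_Apr_2024_refGene.bed 4 6087.gene2accession.txt 4
```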
          

          5. Sort, bgzip, and tabix-index the .bed file

          sort -k1,1 -k2,2n H_vulgaris_Apr_2024_refGene.bed | bgzip > H_vulgaris_Apr_2024_refGene.bed.gz
          tabix -0 -s 1 -b 2 -e 3 H_vulgaris_Apr_2024_refGene.bed.gz
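tabix expects the records for each sequence to be grouped together and ordered by start position, so it can help to confirm the sort order before compressing. The sketch below is a minimal check, assuming the C locale for a reproducible sequence-name order. The function name bed_is_sorted is hypothetical.

```shell
#!/bin/sh
# bed_is_sorted FILE
# Hypothetical helper: succeeds if FILE is ordered by column 1 (sequence
# name), then numerically by column 2 (start). -c only checks and writes
# nothing; -s stops ties on both keys from being reported as disorder.
bed_is_sorted() {
  LC_ALL=C sort -s -c -k1,1 -k2,2n "$1"
}

# Example: bed_is_sorted sorted.bed && bgzip sorted.bed
```

If the file was sorted under a different locale, the check may flag the sequence-name order even though the positions within each sequence are fine. Re-sorting with LC_ALL=C removes the ambiguity.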
          

          6. Sanity check the .bed and .2bit files: add the .2bit file as a reference, then drag and drop the .bed file into IGB. Confirm that gene models are present and labeled correctly, and that the chromosomes are listed in a logical order. Also check that no error messages appear in the Log.
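Part of this check can also be done on the command line before opening IGB. The sketch below assumes genome.txt is name, tab, length and that the BED file is tab-separated with the sequence name in column 1 and the end coordinate in column 3. The function name check_bed_against_genome is hypothetical.

```shell
#!/bin/sh
# check_bed_against_genome GENOME_TXT BED
# Hypothetical helper: reports BED records whose sequence name is missing
# from GENOME_TXT or whose end coordinate runs past the sequence length.
check_bed_against_genome() {
  awk -F'\t' '
    NR == FNR     { size[$1] = $2; next }   # first file: name -> length
    !($1 in size) { printf "BED line %d: unknown sequence %s\n", FNR, $1; bad = 1; next }
    ($3 + 0) > (size[$1] + 0) { printf "BED line %d: feature ends past %s\n", FNR, $1; bad = 1 }
    END           { exit (bad ? 1 : 0) }
  ' "$1" "$2"
}

# Example: check_bed_against_genome genome.txt H_vulgaris_Apr_2024_refGene.bed
```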

          7. Create annots.xml and add _H_vulgaris_ to contents.txt and .htaccess

          cd ~/Documents/Repos/quickload
          svn mkdir H_vulgaris_Apr_2024
          svn cp A_gambiae_Oct_2006/annots.xml H_vulgaris_Apr_2024
          nano H_vulgaris_Apr_2024/annots.xml
          nano contents.txt
          nano .htaccess
          
          pkulzer Paige Kulzer (Inactive) added a comment - - edited

          I've placed a zipped version of the new quickload folder in Google Drive for the reviewer to take a look at:
          Path: research-big-lorainelab > IGB Project Documentation and Plans > IGB Genomes > H_vulgaris.zip
          Link: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

          Question for the reviewer: Do I need to manually edit the HEADER.md file for genomes like this one that were pulled from NCBI rather than UCSC?

          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Sprint Fall 1 [ 202 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Description Task: Add the Hydra vulgaris genome and annotation to IGB.

          Hydra vulgaris (HydraT2T_AEP) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_038396675.1/
          Task: Add the _Hydra vulgaris_ genome and annotation to IGB. Current _Hydra vulgaris_ genome version provided by Ensembl: Hydra_105_v3 (Feb 2022).

          _Hydra vulgaris_ (HydraT2T_AEP)(Apr 2024) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_038396675.1/
          ann.loraine Ann Loraine made changes -
          Sprint Fall 1 [ 202 ] Fall 1, Fall 2 [ 202, 203 ]
          ann.loraine Ann Loraine made changes -
          Rank Ranked higher
          pkulzer Paige Kulzer (Inactive) made changes -
          Rank Ranked higher
          nfreese Nowlan Freese made changes -
          Sprint Fall 1, Fall 2 [ 202, 203 ] Fall 1, Fall 3 [ 202, 204 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 1, Fall 3 [ 202, 204 ] Fall 1, Fall 4 [ 202, 205 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 1, Fall 4 [ 202, 205 ] Fall 1, Fall 5 [ 202, 206 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 1, Fall 5 [ 202, 206 ] Fall 1, Fall 6 [ 202, 207 ]
          nfreese Nowlan Freese made changes -
          Sprint Fall 1, Fall 6 [ 202, 207 ] Fall 1, Fall 7 [ 202, 208 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Rank Ranked higher
          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          I've edited the header to make it specific to NCBI rather than UCSC, added species.txt and synonyms.txt to the zip file, cut down contents.txt to only include this genome, and edited annots.xml to replace UCSC's "refGene" with NCBI's "refSeq".

          Let me know if I've updated all of these files correctly!

          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese added a comment -

          Testing:

          • The synonyms.txt has the wrong date (has Feb_2022 instead of Apr_2024) so it's not working correctly.
          • The 14th column of the bed file has a bunch of %2C which should be commas. Not sure why, but would probably be good to replace.
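A stale version name like the one in synonyms.txt could be caught with a simple text search. The sketch below only checks that each listed file contains the genome version string somewhere. Which files should mention it is an assumption to adjust per genome, and the function name files_missing_version is hypothetical.

```shell
#!/bin/sh
# files_missing_version VERSION FILE...
# Hypothetical helper: prints each FILE that does not contain VERSION as a
# literal string, and fails if any such file was found.
files_missing_version() {
  version=$1
  shift
  rc=0
  for f in "$@"; do
    if ! grep -q -F -- "$version" "$f"; then
      printf '%s does not mention %s\n' "$f" "$version"
      rc=1
    fi
  done
  return "$rc"
}

# Example: files_missing_version H_vulgaris_Apr_2024 synonyms.txt contents.txt
```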
          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status To-Do [ 10305 ] In Progress [ 3 ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          Good catch! I updated the date in synonyms.txt and replaced all instances of "%2C" with a comma. It appears this weird syntax was already present in the GFF file I downloaded from NCBI.

          We also decided to use the common name denoted by NCBI rather than what we find on Google, so I've updated the common name to "Swiftwater hydra" in HEADER.md, contents.txt, and species.txt.

          Ready for review!
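For context, the GFF3 specification requires a comma inside an attribute value to be written as %2C, because an unescaped comma separates multiple values. The encoding therefore comes from the file format rather than from a problem with the NCBI download. The replacement can be limited to one column, as in the sketch below. This is not necessarily the command that was used for this ticket, and the function name decode_bed_column is hypothetical. It also decodes %3B, %3D, and %25, which GFF3 escapes in the same way.

```shell
#!/bin/sh
# decode_bed_column N
# Hypothetical helper: reads tab-separated BED lines on stdin and decodes
# common GFF3 percent-escapes in column N only, leaving other columns as-is.
# %25 (a literal percent sign) is decoded last so that an escaped "%2C"
# in the source text is not decoded twice.
decode_bed_column() {
  awk -v c="$1" 'BEGIN { FS = OFS = "\t" }
    NF >= c {
      gsub(/%2C/, ",", $c)
      gsub(/%3B/, ";", $c)
      gsub(/%3D/, "=", $c)
      gsub(/%25/, "%", $c)
    }
    { print }'
}

# Example: decode_bed_column 14 < in.bed > out.bed
```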

          pkulzer Paige Kulzer (Inactive) made changes -
          Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          nfreese Nowlan Freese made changes -
          Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
          nfreese Nowlan Freese added a comment -

          Testing: everything looks good.

          nfreese Nowlan Freese made changes -
          Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          pkulzer Paige Kulzer (Inactive) added a comment -

          The Subversion repository appears to be down, which is preventing me from pushing this quickload to the SVN site. When I try to check in my changes, svn responds with

          svn: E200029: Commit failed (details follow):
          svn: E200029: could not begin a transaction
          

          And when I try to update my working copy with "svn update", svn is responding with

          svn: E200029: Couldn't perform atomic initialization
          

          Ann Loraine, could you restart the svn site and reattach the virtual hard drive storing the data like you did for IGBF-3748?

          pkulzer Paige Kulzer (Inactive) added a comment -

          The SVN site is back up and running, and the Hydra vulgaris genome has been pushed to the SVN repo.

          Ready for final review!

          pkulzer Paige Kulzer (Inactive) made changes -
          Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
          pkulzer Paige Kulzer (Inactive) made changes -
          Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
          ann.loraine Ann Loraine added a comment - - edited

          I have deployed the latest copy of the quickload repository to:

          RENCI hosting - http://igbquickload-main.bioviz.org/quickload/ (primary)
          UNC Charlotte hosting - http://igbquickload.org/quickload/ (backup)
          To test:

          • launch IGB and visit each new genome version (see above)
          • visit the subdirectories for each genome (by following the links above) and check that there is text describing the genome and datasets visible in IGB itself
          • within IGB's Available Data section, click any "linkout" icons and make sure a Web page opens and that it describes the dataset
          • check that when the datasets load, they look OK: gene models should be boxes with lines connecting them, for instance, and the track labels should be readable and should make sense ("making sense" is subjective, of course; mainly we're looking for problems that could trip up a user and cause confusion)
          nfreese Nowlan Freese made changes -
          Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
          nfreese Nowlan Freese added a comment -

          Tested following instructions above. Everything looks good.

          Closing ticket.

          nfreese Nowlan Freese made changes -
          Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
          nfreese Nowlan Freese made changes -
          Resolution Done [ 10000 ]
          Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

            People

            • Assignee:
              pkulzer Paige Kulzer (Inactive)
              Reporter:
              pkulzer Paige Kulzer (Inactive)
            • Votes:
              0
              Watchers:
              3
