[IGBF-3890] Add Hydra vulgaris genome to IGB - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
2
Epic Link:
Add genomes requested during SDB
Sprint:
Fall 1, Fall 7

Description

Task: Add the Hydra vulgaris genome and annotation to IGB. The current Hydra vulgaris genome version provided by Ensembl is Hydra_105_v3 (Feb 2022).

Hydra vulgaris (HydraT2T_AEP)(Apr 2024) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_038396675.1/

Attachments

Activity

Paige Kulzer (Inactive) added a comment - 12/Sep/24 9:40 AM

Below is an outline of the steps I followed to create the Hydra vulgaris Quickload:
1. Convert genome .fna to .2bit

gunzip GCF_038396675.1_HydraT2T_AEP_genomic.fna.gz
./faToTwoBit GCF_038396675.1_HydraT2T_AEP_genomic.fna H_vulgaris_Apr_2024.2bit

2. Create genome.txt

./twoBitInfo H_vulgaris_Apr_2024.2bit genome.txt
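
twoBitInfo writes one line per sequence: the sequence name, a tab, and the sequence length. A minimal sketch of a check on that file before it goes into the Quickload directory follows; the function name check_genome_txt is hypothetical and not part of the UCSC tools.

```shell
# Summarize a chromosome-sizes file (name<TAB>length per line).
# Prints the sequence count and total length; returns non-zero if any
# line does not have exactly two fields with an integer length.
check_genome_txt() {
  awk -F'\t' '
    NF != 2 || $2 !~ /^[0-9]+$/ {
      printf "bad line %d: %s\n", NR, $0
      bad = 1
      next
    }
    { total += $2 }
    END {
      if (bad) exit 1
      printf "%d sequences, %.0f bases\n", NR, total
    }
  ' "$1"
}
```

Usage: check_genome_txt genome.txt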

3. Get gene models from NCBI (.gff), then convert .gff to .bed

cd ~/Documents/Repos/genomesource
./gff3ToBedDetail.py -g ~/Downloads/genomic.gff -b ~/Downloads/H_vulgaris_Apr_2024_refGene.bed

4. Check if UCSC has any information for this genome using its txid (NCBI:txid6087) and, if it does, compare gene names/IDs to those present in the .bed file created in the previous step

cd ~/Downloads
gunzip -c gene2accession.gz | awk -F'\t' '$1 == "6087"' > 6087.gene2accession.txt

5. Sort, bgzip, and tabix-index the .bed file

sort -k1,1 -k2,2n H_vulgaris_Apr_2024_refGene.bed | bgzip > H_vulgaris_Apr_2024_refGene.bed.gz
tabix -0 -s 1 -b 2 -e 3 H_vulgaris_Apr_2024_refGene.bed.gz
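
tabix expects the records for each sequence to be grouped together and ordered by start position. A rough sketch of a check that can be run on the sorted output before indexing is below. It assumes an uncompressed, tab-delimited BED file with no track or comment lines, and the function name check_bed_order is hypothetical.

```shell
# Return non-zero if a BED file is not grouped by sequence name (column 1)
# with non-decreasing start positions (column 2) within each sequence.
check_bed_order() {
  awk -F'\t' '
    $1 != prev_seq {
      # A sequence name that reappears after another one was seen
      # means its records are split across the file.
      if ($1 in closed) {
        printf "line %d: records for %s are not contiguous\n", NR, $1
        exit 1
      }
      closed[prev_seq] = 1
      prev_seq = $1
      prev_start = -1
    }
    $2 + 0 < prev_start {
      printf "line %d: start %s precedes the previous start\n", NR, $2
      exit 1
    }
    { prev_start = $2 + 0 }
  ' "$1"
}
```

Usage: check_bed_order sorted.bed, where sorted.bed is a hypothetical file holding the output of the sort command before it is piped to bgzip.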

6. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.
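
Part of this check can be done on the command line before opening IGB. The sketch below lists sequence names that appear in the .bed file but not in genome.txt, since a mismatch there would leave gene models without a matching chromosome. It assumes both files are uncompressed and tab-delimited; the function name missing_seqs is hypothetical.

```shell
# Print each sequence name used in the BED file (column 1) that is
# absent from genome.txt (column 1). No output means every name matches.
missing_seqs() {
  # $1 = genome.txt, $2 = BED file
  awk -F'\t' '
    NR == FNR { known[$1] = 1; next }        # first file: record known names
    !($1 in known) && !seen[$1]++ { print $1 }  # second file: report once each
  ' "$1" "$2"
}
```

Usage: missing_seqs genome.txt H_vulgaris_Apr_2024_refGene.bed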

7. Create annots.xml and add _H_vulgaris_ to contents.txt and .htaccess

cd ~/Documents/Repos/quickload
svn mkdir H_vulgaris_Apr_2024
svn cp A_gambiae_Oct_2006/annots.xml H_vulgaris_Apr_2024
nano H_vulgaris_Apr_2024/annots.xml
nano contents.txt
nano .htaccess
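
The contents.txt edit can also be scripted. The sketch below assumes that contents.txt holds one genome per line as a directory name, a tab, and a description; check an existing entry in the file before relying on that. The function name add_contents_entry and the description text in the usage line are hypothetical.

```shell
# Append "<genome directory><TAB><description>" to contents.txt
# unless a line for that directory already exists.
add_contents_entry() {
  # $1 = path to contents.txt, $2 = genome directory name, $3 = description
  if ! cut -f1 "$1" | grep -qxF "$2"; then
    printf '%s\t%s\n' "$2" "$3" >> "$1"
  fi
}
```

Usage: add_contents_entry contents.txt H_vulgaris_Apr_2024 "Hydra vulgaris (Apr 2024)"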

4 older comments

Nowlan Freese added a comment - 13/Dec/24 1:29 PM

Testing: everything looks good.

Paige Kulzer (Inactive) added a comment - 16/Dec/24 9:24 AM

The Subversion repository appears to be down, which is preventing me from pushing this Quickload to the SVN site. When I try to check in my changes, svn responds with

svn: E200029: Commit failed (details follow):
svn: E200029: could not begin a transaction

And when I try to update my working copy with "svn update", svn is responding with

svn: E200029: Couldn't perform atomic initialization

Ann Loraine, could you restart the svn site and reattach the virtual hard drive storing the data like you did for IGBF-3748?

Paige Kulzer (Inactive) added a comment - 17/Dec/24 10:24 AM

The SVN site is back up and running, and the Hydra vulgaris genome has been pushed to the SVN repo.

Ready for final review!

Ann Loraine added a comment - 19/Dec/24 2:39 PM - edited

I have deployed the latest copy of the quickload repository to:

RENCI hosting - http://igbquickload-main.bioviz.org/quickload/ (primary)
UNC Charlotte hosting - http://igbquickload.org/quickload/ (backup)
To test:

launch IGB and visit each new genome version (see above)
visit the subdirectories for each genome (by following the links above) and check that there is text describing the genome and datasets visible in IGB itself
within the IGB Available Data section, click any "linkout" icons and make sure a Web page opens and that it goes to a place that describes the dataset somehow
check that when the datasets load, they look OK - gene models should be boxes with lines connecting them, for instance, and the track labels should be readable and should make sense ("making sense" is subjective, of course! Mainly we're looking for problems that could trip up a user and cause confusion.)
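
Before the manual checks in IGB, the two hosts can be probed from the command line. This is only a sketch: it assumes the genome directory is named H_vulgaris_Apr_2024 (as created with svn mkdir earlier in this ticket) and that annots.xml and genome.txt sit directly inside it. The function names are hypothetical.

```shell
# Print the URLs to probe for one genome directory on both Quickload hosts.
quickload_urls() {
  # $1 = genome directory name
  for base in http://igbquickload-main.bioviz.org/quickload \
              http://igbquickload.org/quickload; do
    for f in annots.xml genome.txt; do
      printf '%s/%s/%s\n' "$base" "$1" "$f"
    done
  done
}

# Fetch each URL and report whether the server returned it successfully.
# -f makes curl fail on HTTP errors, -sS hides progress but keeps errors.
check_quickload() {
  quickload_urls "$1" | while read -r url; do
    if curl -fsS -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Usage: check_quickload H_vulgaris_Apr_2024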

Nowlan Freese added a comment - 20/Dec/24 9:45 AM

Tested following instructions above. Everything looks good.

Closing ticket.

People

Assignee:

Paige Kulzer (Inactive)

Reporter:

Paige Kulzer (Inactive)

Votes:

0

Watchers:

3

Dates

Created:

05/Sep/24 1:51 PM

Updated:

20/Dec/24 9:45 AM

Resolved:

20/Dec/24 9:45 AM