[IGBF-4206] Add jojoba (Simmondsia chinensis) genome to IGB - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 1.5
Epic Link: Improve IGB for users
Sprint: Fall 4, Fall 5

Description

Situation: Dr. Kent Chapman, a professor at the University of North Texas, attended my talk at the SS-ASPB meeting earlier this year. He followed up with me after the conference to request that the jojoba genome be added to IGB. We recently integrated GenArk, a genome-rich data source, with IGB, but GenArk doesn't host this genome yet.

Task: Add the jojoba (Simmondsia chinensis) genome to IGB. Kent co-authored a paper describing this new genome assembly, which I've linked below. The genome assembly and metadata have been deposited in the Beijing Institute of Genomics (BIG) Data Center under accession no. GWHAASQ00000000.

Link to the paper: https://doi.org/10.1126/sciadv.aay3240
Link to the genome/annotations: https://ngdc.cncb.ac.cn/gwh/Assembly/486/show

Activity

Nowlan Freese added a comment - 24/Oct/25 3:17 PM

Tested on IGB 10.1.0 release. Able to load annotations and sequence. Synonyms appear correctly. Linkout goes to Simmondsia chinensis quickload page.

Closing ticket.

Ann Loraine added a comment - 23/Oct/25 2:08 PM

Updated data is deployed to Quickload servers hosted at RENCI and UNC Charlotte.
Ready for final testing.

Nowlan Freese added a comment - 14/Oct/25 10:21 AM - edited

Looks good, able to load annotations and sequence. Synonyms show up correctly.

Ann Loraine - Ready for deployment to Quickload servers.

Paige Kulzer (Inactive) added a comment - 13/Oct/25 10:34 AM

The Simmondsia chinensis genome has been pushed to the SVN repo.

Ready for final review!

Paige Kulzer (Inactive) added a comment - 06/Oct/25 9:52 AM

I've updated the annots.xml file locally and tested that change. I'm able to load sequence in IGB now.

Ann Loraine, could you please restart the SVN server? I will check in my changes once that's done.

Nowlan Freese added a comment - 02/Oct/25 3:21 PM

Testing the S. chinensis Quickload found in the Google Drive link.

The only issue I could find was that the annots.xml "name" field for the reference sequence file points at a gzipped FASTA file on the web, and I was unable to load the sequence in IGB. It should point at the 2bit file.

Synonyms/species look good.
Gene models look good.
Header markdown looks good.
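
A check like the one below could catch the annots.xml problem described above before review. This is only a minimal sketch, not part of IGB or the genomesource repo: the function name, the suffix list, and the sample XML are hypothetical, and it assumes nothing about the annots.xml layout beyond entries carrying a "name" attribute.

```python
import xml.etree.ElementTree as ET

# Suffixes treated as "gzipped FASTA" (an assumption made for this sketch).
GZIPPED_FASTA_SUFFIXES = (".fa.gz", ".fasta.gz", ".fna.gz")


def find_gzipped_fasta_names(annots_xml_text):
    """Return every `name` attribute value that ends in a gzipped-FASTA suffix."""
    root = ET.fromstring(annots_xml_text)
    flagged = []
    # Walk all elements so the check does not depend on exact tag names.
    for element in root.iter():
        name = element.get("name")
        if name and name.lower().endswith(GZIPPED_FASTA_SUFFIXES):
            flagged.append(name)
    return flagged


# Hypothetical annots.xml content, for illustration only.
SAMPLE = """<files>
  <file name="https://example.org/genome.fasta.gz" title="Reference sequence"/>
  <file name="S_chinensis_Apr_2019.bed.gz" title="Gene models"/>
</files>"""

if __name__ == "__main__":
    for name in find_gzipped_fasta_names(SAMPLE):
        print("Points at gzipped FASTA, expected a .2bit file:", name)
```

Any flagged entry would then be edited by hand to point at the .2bit file instead.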

Paige Kulzer (Inactive) added a comment - 01/Oct/25 11:18 AM - edited

I attempted to add more information to the BED file by converting it to BED14 format with the ucscToBedDetail.py script from the genomesource repo. Here's the command I used:

~/Documents/Repos/genomesource/ucscToBedDetail.py -a ../../3999.gene2accession.txt -g ../../3999.gene_info.txt ~/Documents/Repos/quickload/S_chinensis_Apr_2019/S_chinensis_Apr_2019.bed.gz S_chinensis.bed

I ran into an error here that at first looked like a problem with the script but turned out to be a Python version issue. To switch to an earlier version of Python, I used pyenv, a tool available through Homebrew:

pyenv global 2.7.18

Then I was able to run the script without issue.

No additional information was available for the BED file, so the .zip file from my earlier comment should still contain the right files to be tested.
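
To confirm whether a conversion like this actually added anything, the column counts of the output can be inspected. The sketch below is illustrative only and makes two assumptions that are not confirmed by the script itself: that the extra detail lands in columns 13 and 14, and that empty strings or "NA" mark missing values. Both function names are hypothetical.

```python
from collections import Counter


def count_bed_columns(lines):
    """Count how many data lines have each number of tab-separated columns."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the header/comment lines BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        counts[len(line.split("\t"))] += 1
    return counts


def count_rows_with_details(lines, placeholders=("", "NA")):
    """Count 14-column rows whose last two fields are not all placeholders.

    The placeholder values are an assumption for this sketch.
    """
    detailed = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 14 and any(f not in placeholders for f in fields[12:14]):
            detailed += 1
    return detailed


if __name__ == "__main__":
    import sys

    # Usage: python check_bed_columns.py S_chinensis.bed
    with open(sys.argv[1]) as handle:
        rows = handle.readlines()
    print("Column counts:", dict(count_bed_columns(rows)))
    print("Rows with extra details:", count_rows_with_details(rows))
```

If every row reports 14 columns but none carry details, the BED12 file already contains all the available information.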

Paige Kulzer (Inactive) added a comment - 01/Oct/25 9:54 AM

I've placed a zipped version of the new quickload folder in Google Drive for the reviewer to take a look at:
Path: research-big-lorainelab > IGB Project Documentation and Plans > IGB Genomes > S_chinensis.zip
Link: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

Ready for review! Please note that I'm not confident in the annots.xml file as this is the first time I've added a genome that's not from NCBI or UCSC.

Paige Kulzer (Inactive) added a comment - 01/Oct/25 9:46 AM

Below is an outline of the steps I followed to create the Simmondsia chinensis Quickload:
1. Convert genome FASTA to .2bit

./faToTwoBit GWHAASQ00000000.genome.fasta S_chinensis_Apr_2019.2bit

2. Create genome.txt

./twoBitInfo S_chinensis_Apr_2019.2bit genome.txt

3. Convert gene models from .gff to .bed

~/Documents/Repos/genomesource/gff3ToBedDetail.py -g GWHAASQ00000000.gff -b S_chinensis_Apr_2019.bed

4. Sort, bgzip, and tabix-index the .bed file

sort -k1,1 -k2,2n S_chinensis_Apr_2019.bed | bgzip > S_chinensis_Apr_2019.bed.gz
tabix -0 -s 1 -b 2 -e 3 S_chinensis_Apr_2019.bed.gz

5. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.

6. Create annots.xml and HEADER.md, and add S_chinensis to contents.txt, .htaccess, synonyms.txt, and species.txt

mkdir S_chinensis_Apr_2019
cp V_cardui_Feb_2021/annots.xml S_chinensis_Apr_2019
cp H_vulgaris_Apr_2024/HEADER.md S_chinensis_Apr_2019
nano contents.txt 
nano .htaccess 
nano synonyms.txt 
nano species.txt
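
The manual sanity check in step 5 could be supplemented with a scripted one. The following is a rough sketch rather than an existing tool: it assumes genome.txt holds one "name<TAB>size" pair per line, as twoBitInfo writes it, and that the BED file is read uncompressed; the function names are hypothetical. It reports features on sequences missing from genome.txt, features outside a sequence's bounds, and rows that break the sort order from step 4.

```python
def read_chrom_sizes(lines):
    """Parse twoBitInfo-style lines (name, size) into a dict of name -> length."""
    sizes = {}
    for line in lines:
        if line.strip():
            name, size = line.split()[:2]
            sizes[name] = int(size)
    return sizes


def check_bed(bed_lines, sizes):
    """Return a list of problems found in the BED rows (empty if none)."""
    problems = []
    previous = None
    for number, line in enumerate(bed_lines, start=1):
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in sizes:
            problems.append(f"line {number}: {chrom} is not in genome.txt")
        elif not 0 <= start <= end <= sizes[chrom]:
            problems.append(f"line {number}: {chrom}:{start}-{end} is out of range")
        # Approximates `sort -k1,1 -k2,2n` under the C locale:
        # by sequence name, then by numeric start position.
        if previous is not None and (chrom, start) < previous:
            problems.append(f"line {number}: not sorted by sequence and start")
        previous = (chrom, start)
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_quickload.py genome.txt S_chinensis_Apr_2019.bed
    with open(sys.argv[1]) as genome, open(sys.argv[2]) as bed:
        for problem in check_bed(bed, read_chrom_sizes(genome)):
            print(problem)
```

An empty report does not replace loading the files in IGB, since it says nothing about labels or how the gene models render.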

Paige Kulzer (Inactive) added a comment - 08/May/25 10:03 AM

Kent's email from March 30th, 2025:

Paige, Thank you again for your excellent presentation about your integrated genomics resources. If you could include jojoba (Simmondsia chinensis) in your database and web interface, that might be useful.

I think everything is referenced here in this paper and/or accessible online or through supplemental materials. If you need something else, let me know.

https://www.science.org/doi/10.1126/sciadv.aay3240

Best,
Kent

Kent D. Chapman, Ph.D.
Regents Professor of Biochemistry
Member, BioDiscovery Institute

University of North Texas
Department of Biological Sciences
1155 Union Circle #305220
Denton, TX 76203-5017
+1-940-565-2969 (office)
+1-940-300-6961 (cell)
https://bdi.unt.edu/kent-chapman

People

Assignee: Paige Kulzer (Inactive)
Reporter: Paige Kulzer (Inactive)
Votes: 0
Watchers: 3

Dates

Created: 08/May/25 10:01 AM
Updated: 24/Oct/25 3:17 PM
Resolved: 24/Oct/25 3:17 PM