  1. IGB
  2. IGBF-4206

Add jojoba (Simmondsia chinensis) genome to IGB

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: Dr. Kent Chapman, a professor at the University of North Texas, attended my talk at the SS-ASPB meeting earlier this year. He followed up with me after the conference to request that the jojoba genome be added to IGB. We recently integrated GenArk, a genome-rich data source, with IGB, but GenArk does not yet host this genome either.

      Task: Add the jojoba (Simmondsia chinensis) genome to IGB. Kent co-authored a paper describing this new genome assembly, which I've linked below. The genome assembly and metadata have been deposited in the Beijing Institute of Genomics (BIG) Data Center under accession no. GWHAASQ00000000.

      Link to the paper: https://doi.org/10.1126/sciadv.aay3240
      Link to the genome/annotations: https://ngdc.cncb.ac.cn/gwh/Assembly/486/show

          Activity

          nfreese Nowlan Freese added a comment -

          Tested on IGB 10.1.0 release. Able to load annotations and sequence. Synonyms appear correctly. Linkout goes to Simmondsia chinensis quickload page.

          Closing ticket.

          ann.loraine Ann Loraine added a comment -

          Updated data is deployed to Quickload servers hosted at RENCI and UNC Charlotte.
          Ready for final testing.

          nfreese Nowlan Freese added a comment - - edited

          Looks good, able to load annotations and sequence. Synonyms show up correctly.

          Ann Loraine - Ready for deployment to Quickload servers.

          pkulzer Paige Kulzer (Inactive) added a comment -

          The Simmondsia chinensis genome has been pushed to the SVN repo.

          Ready for final review!

          pkulzer Paige Kulzer (Inactive) added a comment -

          I've updated the annots.xml file locally and tested that change. I'm able to load sequence in IGB now.

          Ann Loraine, could you please restart the svn server? I will check in my changes once that's done.

          nfreese Nowlan Freese added a comment -

          Testing the S. chinensis quickload found in the Google Drive link.

          The only issue I found is that the annots.xml "name" field for the reference sequence file points at a gzipped FASTA file on the web, so I was unable to load the sequence in IGB. It should point at the 2bit file.

          Synonyms/species look good.
          Gene models look good.
          Header markdown looks good.
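
The reference-sequence issue described in this comment could be caught before review with a small check. The sketch below lists any "name" values in annots.xml that end in a FASTA extension. The function name `fasta_names_in_annots` is hypothetical, and the sketch assumes each name attribute is written in double quotes on the same line as its value.

```shell
# Print every name="..." value in an annots.xml file that points at a FASTA
# file (plain or gzipped). Hypothetical helper; assumes double-quoted attributes.
fasta_names_in_annots() {
    grep -o 'name="[^"]*"' "$1" \
        | sed 's/^name="//; s/"$//' \
        | grep -E '\.(fa|fasta|fna)(\.gz)?$'
}
```

Running `fasta_names_in_annots S_chinensis_Apr_2019/annots.xml` and getting no output would suggest that no entry still points at a FASTA file. It does not confirm that the 2bit entry itself is correct.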

          pkulzer Paige Kulzer (Inactive) added a comment - - edited

          I attempted to add more information to the BED file by converting it to BED14 format via the ucscToBedDetail.py script from the GenomeSource Repo. Here's the command I used:

          ~/Documents/Repos/genomesource/ucscToBedDetail.py -a ../../3999.gene2accession.txt -g ../../3999.gene_info.txt ~/Documents/Repos/quickload/S_chinensis_Apr_2019/S_chinensis_Apr_2019.bed.gz S_chinensis.bed
          

          I ran into an error here that at first looked like a problem with the script but turned out to be a Python version issue. To switch to an earlier version of Python, I used pyenv, a tool available through Homebrew:

          pyenv global 2.7.18
          

          Then I was able to run the script without issue.
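
Because `pyenv global` changes the default interpreter for every shell, a narrower alternative is `pyenv local 2.7.18` (which writes a `.python-version` file in the current directory) or `pyenv shell 2.7.18` (current shell only; this needs pyenv's shell integration). A guard like the sketch below could also fail fast when the wrong interpreter is active. `py_major` and `require_python2` are hypothetical helper names, and the sketch assumes the script runs with whatever `python` is first on PATH.

```shell
# Extract the major version number from a "Python X.Y.Z" banner string.
py_major() {
    printf '%s\n' "$1" | sed -n 's/^Python \([0-9][0-9]*\)\..*/\1/p'
}

# Succeed only if the `python` on PATH reports major version 2.
# Python 2 prints its version banner on stderr, hence the 2>&1.
require_python2() {
    banner=$(python --version 2>&1) || return 1
    [ "$(py_major "$banner")" = "2" ]
}
```

Usage might look like `require_python2 || echo "switch to Python 2 first" >&2` before invoking the conversion script.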

          No additional information was available for the BED file, so the .zip file from my earlier comment should still contain the right files to test.
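
One way to confirm what a conversion produced is to count the tab-separated fields per line; a BED14 file should report a single count of 14. The helper name `bed_field_counts` is hypothetical, and the sketch assumes a bgzip- or gzip-compressed, tab-delimited file with no track or comment lines.

```shell
# Print each distinct tab-separated column count found in a gzipped BED file,
# one per line, in ascending numeric order.
bed_field_counts() {
    gzip -dc "$1" | awk -F'\t' '{ print NF }' | sort -n -u
}
```

More than one line of output would mean the file mixes rows of different widths, which is worth investigating before loading it in IGB.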

          pkulzer Paige Kulzer (Inactive) added a comment -

          I've placed a zipped version of the new quickload folder in Google Drive for the reviewer to take a look at:
          Path: research-big-lorainelab > IGB Project Documentation and Plans > IGB Genomes > S_chinensis.zip
          Link: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

          Ready for review! Please note that I'm not confident in the annots.xml file as this is the first time I've added a genome that's not from NCBI or UCSC.

          pkulzer Paige Kulzer (Inactive) added a comment -

          Below is an outline of the steps I followed to create the Simmondsia chinensis Quickload:
          1. Convert genome FASTA to .2bit

          ./faToTwoBit GWHAASQ00000000.genome.fasta S_chinensis_Apr_2019.2bit
          

          2. Create genome.txt

          ./twoBitInfo S_chinensis_Apr_2019.2bit genome.txt 
          

          3. Convert gene models from .gff to .bed

          ~/Documents/Repos/genomesource/gff3ToBedDetail.py -g GWHAASQ00000000.gff -b S_chinensis_Apr_2019.bed
          

          4. Sort, bgzip, and tabix-index the .bed file

          sort -k1,1 -k2,2n S_chinensis_Apr_2019.bed | bgzip > S_chinensis_Apr_2019.bed.gz
          tabix -0 -s 1 -b 2 -e 3 S_chinensis_Apr_2019.bed.gz
          

          5. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.

          6. Create annots.xml and HEADER.md from existing Quickload folders, then add _S_chinensis_ to contents.txt, .htaccess, synonyms.txt, and species.txt

          mkdir S_chinensis_Apr_2019
          cp V_cardui_Feb_2021/annots.xml S_chinensis_Apr_2019
          cp H_vulgaris_Apr_2024/HEADER.md S_chinensis_Apr_2019
          nano contents.txt 
          nano .htaccess 
          nano synonyms.txt 
          nano species.txt 
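
Part of the sanity check in step 5 can be scripted before opening IGB. The sketch below uses two hypothetical helpers: `bed_is_sorted` re-checks the sort order used in step 4, and `bed_chroms_missing` lists sequence names that appear in the BED file but not in genome.txt. It assumes genome.txt holds tab-separated name and size columns, as written by twoBitInfo.

```shell
# Succeed if the gzipped BED file is ordered by sequence name, then start.
bed_is_sorted() {
    gzip -dc "$1" | sort -c -k1,1 -k2,2n
}

# Print each sequence name used in the BED file that genome.txt does not list.
bed_chroms_missing() {
    gzip -dc "$1" | awk -F'\t' -v genome="$2" '
        BEGIN {
            # Load the known sequence names from column 1 of genome.txt.
            while ((getline line < genome) > 0) {
                split(line, f, "\t")
                known[f[1]] = 1
            }
        }
        !($1 in known) && !seen[$1]++ { print $1 }
    '
}
```

For example, `bed_is_sorted S_chinensis_Apr_2019.bed.gz && bed_chroms_missing S_chinensis_Apr_2019.bed.gz genome.txt` printing nothing would suggest the BED file is sorted and uses only sequence names present in the 2bit file. It does not replace the visual check of gene models in IGB.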
          
          pkulzer Paige Kulzer (Inactive) added a comment -

          Kent's email from March 30th, 2025:

          Paige, Thank you again for your excellent presentation about your integrated genomics resources. If you could include jojoba (Simmondsia chinensis) in your database and web interface, that might be useful.

          I think everything is referenced here in this paper and/or accessible online or through supplemental materials. If you need something else, let me know.

          https://www.science.org/doi/10.1126/sciadv.aay3240

          Best,
          Kent

          Kent D. Chapman, Ph.D.
          Regents Professor of Biochemistry
          Member, BioDiscovery Institute

          University of North Texas
          Department of Biological Sciences
          1155 Union Circle #305220
          Denton, TX 76203-5017
          +1-940-565-2969 (office)
          +1-940-300-6961 (cell)
          https://bdi.unt.edu/kent-chapman


            People

            • Assignee:
              pkulzer Paige Kulzer (Inactive)
              Reporter:
              pkulzer Paige Kulzer (Inactive)