  IGB / IGBF-3135

Add new tomato genome and annotations to IGB Quickload repository

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

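      Source publication, "Graph pangenome captures missing heritability and empowers tomato breeding" (Nature, 2022): https://www.nature.com/articles/s41586-022-04808-9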

      The new assembly, called SL5.0, includes gene model annotations that document or predict alternatively spliced forms.

      For this task, obtain the sequence and annotations and add them to IGB Quickload as a new version of the tomato genome.

      Data:

        Attachments

        1. ITAG5.0-1.png (140 kB)
        2. ITAG5.0-2.png (36 kB)
        3. ITAG5.0-4.png (116 kB)
        4. ITAG5.0-5.png (66 kB)
        5. ProtAnnot.paxml (39 kB)


            Activity

            Ann Loraine added a comment (edited):

            Obtaining a copy of the IGB Quickload Subversion repository, which holds the genome sequences and annotations, after installing Subversion with Homebrew on my Apple Mac.
            The command to check out the repository is:

            svn --username=aloraine co https://svn.bioviz.org/repos/genomes/quickload
            
            Ann Loraine added a comment:

            FYI to [~RobertReid]: Hi! I am setting up the new tomato genome release in IGB.

            Ann Loraine added a comment:

            Name for the new tomato genome:

            • S_lycopersicum_Jun_2022

            Provider genome assembly alias: SL5.0

            Name for the new tomato annotations:

            • ITAG5.0
            Ann Loraine added a comment (edited):

            The genome FASTA file names its chromosomes with plain numbers, 0 through 12.
            The protein-coding gene annotation file uses the same sequence names.
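            Annotation records are tied to genome sequences by name, so a quick check that every sequence name used in the GFF3 file also appears among the FASTA headers can catch mismatches before upload. This is a minimal, hypothetical helper (not part of the genomesource code); it assumes the first word of each FASTA header is the sequence name.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_seq_names(path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def gff3_seqids(path):
    """Return the set of seqids (column 1) used by feature lines in a GFF3 file."""
    seqids = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section: no more feature lines
            if not line.strip() or line.startswith("#"):
                continue
            seqids.add(line.split("\t", 1)[0])
    return seqids


def compare_names(fasta_path, gff_path):
    """List annotated seqids missing from the genome, and genome sequences with no annotations."""
    in_fasta = fasta_seq_names(fasta_path)
    in_gff = gff3_seqids(gff_path)
    return {
        "missing_from_fasta": sorted(in_gff - in_fasta),
        "without_annotations": sorted(in_fasta - in_gff),
    }
```

            Usage, with a hypothetical FASTA file name: compare_names("S_lycopersicum_Jun_2022.fa", "SL5.0.gff3"). An empty "missing_from_fasta" list means every annotated sequence exists in the genome.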

            Ann Loraine added a comment:

            None of the files I looked at contain functional descriptions of the new gene models. Sent this email to the authors:

            to: huangsanwen@caas.cn, doug@qgg.au.dk, zhiwu.zhang@wsu.edu, shizhong.xu@ucr.edu, thomas.staedler@usys.ethz.ch, zf25@cornell.edu,

            cc: RobertReid@uncc.edu, nfreese@uncc.edu, zhouyao@caas.cn, zhangzhiyangcs@163.com, baozhigui@caas.cn

            Dear Drs. Huang, Speed, Xu, Städler, Fei, Mueller, and Zhang,

            Congratulations on your new publication "Graph pangenome captures missing heritability and empowers tomato breeding."

            I am very excited to use your new reference assembly (SL5.0) in my RNA-Seq data analysis projects!

            I am writing to ask your help:

            I am looking for the Gene Ontology annotations mentioned in the article, and also for human-friendly text descriptions of the new gene models, if available.

            I looked on the Sol Genomics Web site and in the Supplemental Data files available from the journal Web site, but could not find any GO annotations or text descriptions of each gene (or mRNA transcript).

            Would you be able to send these to me?

            The text descriptions of gene models would be especially valuable for interpreting results from RNA-Seq and other experiments that produce big lists of "significant" genes.

            If you can provide descriptions for each model, I could include them in Integrated Genome Browser, a software tool my lab develops (https://bioviz.org). This software lets users quickly explore and analyze data from RNA-Seq, ChIP-Seq, and other genome-focused experiments.

            In IGB, if a user clicks on a gene model, they can see a text description of the gene model, if the annotation provider (yourself) has provided this.

            Users can even do keyword searches of the descriptions, as shown in the attached screen capture image from SL4.0. Including a short description of each gene model makes it much easier to understand and explore the results from RNA-Seq and other similar experiments.

            So, I hope you will be able to provide both a short text description for each gene model, along with the Gene Ontology annotations!

            Warm regards,

            Ann Loraine, Ph.D.
            Professor, Bioinformatics & Genomics
            Genome Visualization Lab
            College of Computing & Informatics
            University of North Carolina, Charlotte
            https://lorainelab.org
            https://bioviz.org

            Reply from Yao Zhou:

            Dear Ann,
            We have now released the GO annotation of SL5.0 on our website (http://solomics.agis.org.cn/tomato/ftp/GO/). Unfortunately, we do not have text descriptions of the new gene models.
            Naama, would you please help us upload the GO annotation to SGN website? Thanks!
            Best regards,
            Yao

            Ann Loraine added a comment (edited):

            Proceeding with importing the new gene models into IGB. Will create text descriptions of gene models in another ticket.

            Ann Loraine added a comment:

            Converting the GFF3 file to BED-detail format with:

            gff3ToBedDetail.py -g SL5.0.gff3 -b S_lycopersicum_Jun_2022.bed
            

            Code source: https://bitbucket.org/lorainelab/genomesource/src/master/
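            The genomesource script is not reproduced here. As a rough, hypothetical illustration of what such a conversion involves, this sketch groups exon and CDS records under their parent mRNA and emits one BED12 line per transcript, followed by two extra columns (gene ID and description). The extra-column layout is an assumption modeled on UCSC's bedDetail convention, and reading the description from a "Note" attribute is also an assumption; the real gff3ToBedDetail.py may differ on both.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def gff3_to_bed_detail(lines):
    """Yield one BED12+2 line per mRNA: BED12 fields, then gene ID and description."""
    mrnas = {}                 # mRNA ID -> (seqid, start0, end, strand, gene ID, note)
    exons = defaultdict(list)  # mRNA ID -> [(start0, end), ...]
    cds = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        seqid, _src, ftype, start, end, _score, strand, _phase, col9 = line.rstrip("\r\n").split("\t")
        attrs = parse_attributes(col9)
        # GFF3 is 1-based, end-inclusive; BED is 0-based, half-open.
        start0, end = int(start) - 1, int(end)
        if ftype == "mRNA":
            mrnas[attrs["ID"]] = (seqid, start0, end, strand,
                                  attrs.get("Parent", ""), attrs.get("Note", ""))
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                target[parent].append((start0, end))
    for mrna_id, (seqid, start0, end, strand, gene, note) in mrnas.items():
        # Fall back to a single block spanning the mRNA if it has no exon records.
        blocks = sorted(exons.get(mrna_id) or [(start0, end)])
        chrom_start, chrom_end = blocks[0][0], max(b[1] for b in blocks)
        coding = cds.get(mrna_id)
        if coding:
            thick_start = min(c[0] for c in coding)
            thick_end = max(c[1] for c in coding)
        else:
            thick_start = thick_end = chrom_start  # non-coding: empty thick region
        sizes = ",".join(str(e - s) for s, e in blocks) + ","
        starts = ",".join(str(s - chrom_start) for s, e in blocks) + ","
        yield "\t".join(map(str, [
            seqid, chrom_start, chrom_end, mrna_id, 0, strand,
            thick_start, thick_end, 0, len(blocks), sizes, starts, gene, note,
        ]))
```

            Usage: iterate over gff3_to_bed_detail(open("SL5.0.gff3")) and write each yielded line to the output BED file. Only start coordinates are shifted by one, because the half-open BED end equals the inclusive GFF3 end.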

            Ann Loraine added a comment:

            Added the new files to the SVN repository with commit message:

            IGBF-3135 Add SL5.0 tomato genome

            Ann Loraine added a comment (edited):

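            Need to deploy new data to:

            • http://igbquickload.org/quickload (hosted behind the UNC Charlotte firewall): DONE
            • http://lorainelab-quickload.scidas.org/quickload/ (hosted at RENCI): DONE
            • https://quickload.bioviz.org (hosted at AWS): DONE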

            Discovered a problem with the "data" directory on quickload.bioviz.org: it seems to have come unmounted. The "data" directory is the mounted volume vol-00695402d76f72259, which the AWS console lists as attached to the EC2 host. Rebooting the EC2 instance did not help. Unmounting the volume in the console, re-attaching it, and rebooting also did not help. Remounting it by hand with "mount -t ext4 /dev/sdf /data" worked: I could then change into /data/quickload and run "svn up" as the root user.
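            A guard like the following could keep a future "svn up" from writing into an empty /data directory on the root volume if the data volume comes unmounted again. This is a hypothetical sketch, not part of the current deployment; the paths are the ones from this comment, and os.path.ismount only confirms that something is mounted at /data, not that it is the right volume.

```python
import os
import subprocess


def update_quickload(data_root="/data", checkout="/data/quickload", dry_run=False):
    """Run 'svn up' in the Quickload checkout only if the data volume is mounted."""
    if not os.path.ismount(data_root):
        raise RuntimeError(f"{data_root} is not a mount point; remount the volume first")
    if not os.path.isdir(checkout):
        raise RuntimeError(f"{checkout} does not exist")
    cmd = ["svn", "up"]
    if dry_run:
        return cmd  # report what would run without touching the working copy
    subprocess.run(cmd, cwd=checkout, check=True)
    return cmd
```

            Failing loudly here seems preferable to letting "svn up" recreate a full checkout on the small root volume, which would hide the unmounted volume until the disk fills.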

            To reach the igbquickload.org host, I log in to the UNC Charlotte VPN, then run the alias "quickload" (ssh -J aloraine@cci-moss -p 1657 aloraine@igbquickload.org), which jumps through cci-moss.

            Ann Loraine added a comment (edited):

            To test:

            • Open IGB.
            • Click the tomato image on the start screen to load the latest genome version.
            • Confirm that SL5.0 loads, with gene models.
            • Hover the mouse over the "i" in the Data Access Panel. Check that the message makes sense.
            • Click the "i" in the Data Access Panel. Check that the SL5.0 data directory on the Quickload host opens. Read the text and check that it makes sense.
            • Navigate one directory up from that SL5.0 data directory. Check that S_lycopersicum_Jun_2022 has a description in the file and folder listing.

            Also:

            • Check that IGB Genomes link on BioViz.org Web site shows the new genome version and that IGB can open this when you click the version shown.
            • Check that BioViz Connect also shows the new genome version as an option for annotating files.
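            Part of this checklist could be scripted. A hypothetical sketch, assuming the standard Quickload layout in which the site's top-level contents.txt lists one genome version per line, followed by a tab and its description, and assuming that description is the one shown in the parent-directory listing:

```python
from urllib.request import urlopen


def parse_contents(text):
    """Map genome version -> description from a Quickload contents.txt body."""
    versions = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, _, description = line.partition("\t")
        versions[name.strip()] = description.strip()
    return versions


def check_release(base_url, version, expected_description=None):
    """Fetch contents.txt from a Quickload site and check that a genome version is listed."""
    with urlopen(base_url.rstrip("/") + "/contents.txt", timeout=30) as response:
        versions = parse_contents(response.read().decode("utf-8"))
    if version not in versions:
        return False
    return expected_description is None or versions[version] == expected_description
```

            Usage: check_release("http://igbquickload.org/quickload", "S_lycopersicum_Jun_2022", "Solanum lycopersicum (Jun 2022) tomato (SL5.0)"), repeated for each deployment host. This only checks the listing; the in-IGB steps still need a person.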
            Nowlan Freese added a comment:
            • Clicked tomato image in IGB release 9.1.8
            • Tomato SL5.0 genome was loaded with ITAG5.0 gene models.
            • Gene models loaded for all chromosomes.
            • Genome version shows as S_lycopersicum_Jun_2022.
            • Chromosomes are named 0-12. This seems a little odd, because previous genome versions used names of the form SL#.0ch##.
            • "i" icon reads as ITAG5.0 gene models, SL5.0 genome assembly (looks good).
            • Clicking "i" icon opens the Quickload web page. Text makes sense.
            • Parent directory shows "Solanum lycopersicum (Jun 2022) tomato (SL5.0)" for S_lycopersicum_Jun_2022/
            • IGB Genomes includes new tomato genome version, correct genome in IGB is opened when clicked.
            • BioViz Connect shows the new genome version as an option under metadata.

            No issues encountered. Closing ticket.

            Ann Loraine added a comment:

            Asked YZ for mapping between ITAG 4.0 and ITAG 5.0 names:

            Thank you very much Yao Zhou! I appreciate it! I downloaded the GO annotations just now and am starting to use them.

            I have a quick follow-up question, which I hope will not be too much trouble!

            Do you have a spreadsheet that maps your new ITAG 5.0 names onto the older ITAG 4.0 (SL4.0 assembly) names?

            Until new and improved gene descriptions are available for ITAG 5.0, I was thinking that using the old gene descriptions, where possible, would be an OK temporary substitute. So having a quick and easy way to perform the mapping would be super helpful.

            Also, I wanted to let you know that I have released your wonderful new genome assembly and gene models in Integrated Genome Browser. Your new data looks great!

            If, at some point, you and/or your colleagues would like a quick tour of the browser, I would be more than happy to show you.

            Warmest regards,

            Ann

            Ann Loraine added a comment:

            Answer from YZ:

            Dear Ann,
            Yes, we do have tables for the conversion between different versions. They are now available on our website (http://solomics.agis.org.cn/tomato/ftp/ID_convert/).
            Thanks so much for the integration! I would be happy to forward it to my colleagues when it is available.
            Best regards,
            Yao
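            With the conversion tables in hand, carrying the ITAG4.0 descriptions over to ITAG5.0 models is essentially a table join. This hypothetical sketch assumes a conversion table whose first two tab-separated columns are an ITAG5.0 ID and a corresponding ITAG4.0 ID, plus a separate two-column file of ITAG4.0 IDs and descriptions; the actual layout of the files at the ID_convert URL should be checked first. Where one new model maps to several old ones, distinct descriptions are joined with "; ".

```python
import csv
from collections import defaultdict


def read_two_columns(path, skip_header=False):
    """Yield (key, value) pairs from the first two columns of a tab-separated file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if skip_header:
            next(reader, None)
        for row in reader:
            # Skip blank lines and rows with an empty key or value.
            if len(row) >= 2 and row[0].strip() and row[1].strip():
                yield row[0].strip(), row[1].strip()


def transfer_descriptions(id_map_pairs, old_descriptions):
    """Give each new (ITAG5.0) ID the description(s) of its mapped old (ITAG4.0) IDs."""
    new_to_old = defaultdict(list)
    for new_id, old_id in id_map_pairs:
        new_to_old[new_id].append(old_id)
    result = {}
    for new_id, old_ids in new_to_old.items():
        texts = [old_descriptions[o] for o in old_ids if o in old_descriptions]
        # Deduplicate while keeping order; several old IDs may share one description.
        unique = list(dict.fromkeys(texts))
        if unique:
            result[new_id] = "; ".join(unique)
    return result
```

            Usage, with hypothetical file names: transfer_descriptions(read_two_columns("ITAG5.0_to_ITAG4.0.tsv", skip_header=True), dict(read_two_columns("ITAG4.0_descriptions.tsv"))). New models with no mapped description are simply left out, so they can be flagged for the separate description ticket.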


              People

              • Assignee: Ann Loraine
              • Reporter: Ann Loraine
              • Votes: 0
              • Watchers: 2
