IGB / IGBF-3145

Transfer SL4.0 gene descriptions to SL5.0 annotations

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 0.25
    • Sprint: Summer 5 2022 July 18, Summer 6 2022 Aug 1, Fall 1 2022 Aug 15, Fall 2 2022 Sep 5, Fall 3 2022 Sep 26, Fall 4 2022 Oct 10, Fall 5 2022 Oct 24, Fall 6 2022 Nov 7

      Description

      Git repository for this sub-project: https://bitbucket.org/hotpollen/splicing-analysis/src/master/

      Use the gene mapping table to add gene descriptions to the new bed-detail annotations file for the SL5.0 genome assembly release.

      Specifically:

      The 4th column of this file contains a transcript identifier and the 13th column contains a gene identifier. The final (14th) column currently contains "NA" for "Not Available"; we would like it to contain a description of the gene instead. Since SL5.0 descriptions are not available, we would like to insert the description of each gene's counterpart from the SL4.0 annotations.

      To map genes from SL4 onto SL5, you need a mapping file. The people who made SL5.0 gave us that mapping file back in July 2022:

      Dear Ann,
      Yes, we do have tables for the conversion between different versions. They are now available on our website (http://solomics.agis.org.cn/tomato/ftp/ID_convert/).
      Thanks so much for the integration! I would be happy to forward it to my colleagues when it is available.
      Best regards,
      Yao

      Mapping files needed are also available in: /nobackup/tomato_genome/alt_splicing/mappingfiles
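
The column-14 replacement described above could be sketched roughly as follows. This is only an illustration under assumptions the issue does not confirm: the function name `add_descriptions`, the two-column layout of the mapping table (SL5 gene ID, then SL4 gene ID) and the two-column layout of the SL4 description table (SL4 gene ID, then description) are all hypothetical, and the real mapping files may be organized differently.

```shell
#!/bin/bash
# Sketch only. Assumed (hypothetical) tab-separated inputs:
#   $1 map file  : SL5 gene ID <TAB> SL4 gene ID
#   $2 desc file : SL4 gene ID <TAB> SL4 description
#   $3 bed file  : 14-column bed-detail, gene ID in column 13, "NA" in column 14
# Prints the bed-detail lines with column 14 filled in where a mapping exists.
add_descriptions() {
  local map="$1" desc="$2" bed="$3"
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { sl5_to_sl4[$1] = $2; next }   # load SL5 -> SL4 mapping
    FILENAME == ARGV[2] { sl4_desc[$1] = $2; next }     # load SL4 descriptions
    {
      # Replace column 14 only when both lookups succeed; otherwise keep "NA".
      if (($13 in sl5_to_sl4) && (sl5_to_sl4[$13] in sl4_desc))
        $14 = sl4_desc[sl5_to_sl4[$13]]
      print
    }
  ' "$map" "$desc" "$bed"
}
```

Genes without an SL4 counterpart keep "NA" in this sketch, so unmapped rows remain easy to find afterwards.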

            Activity

            Ann Loraine (ann.loraine) added a comment (edited):

            Update:

            • Aligned SL4 gene models onto SL5 genome assembly using blat.sh:
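            #!/bin/bash
            # Align SL4 cDNA gene models onto the SL5 genome assembly.
            G=S_lycopersicum_Jun_2022
            D=$G.2bit                                 # target genome in 2bit format
            Q=S_lycopersicum_Sep_2019_models_cDNA.fa  # query: SL4 gene model cDNAs
            PSL=SL42SL5.psl                           # alignment output
            MI=15000                                  # maximum intron size
            # -noTrimA keeps trailing poly-A; -noHead omits the PSL header;
            # -minIdentity=95 sets the minimum percent identity; -dots=100 prints progress.
            blat -noTrimA -maxIntron=$MI -noHead -minIdentity=95 -dots=100 "$D" "$Q" "$PSL"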
            
            • Sorted and tabix-indexed PSL output file with:
            sort -k14,14 -k16,16n SL42SL5.psl | bgzip -c > SL42SL5.psl.gz
            tabix -s 14 -b 16 -e 17 SL42SL5.psl.gz
            
            • Added output and code to repository - bitbucket.org/hotpollen/splicing-analysis.git
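
Before indexing, it can help to confirm that the PSL file really is grouped by target name (column 14) and ordered by target start (column 16), since tabix expects position-sorted input. A minimal sketch of such a check, assuming a headerless PSL file as produced with -noHead; the function name `check_psl_sorted` is hypothetical and not part of the repository:

```shell
#!/bin/bash
# Sketch only: exits 0 when each target name (column 14) forms one contiguous
# block and target starts (column 16) never decrease within a block.
check_psl_sorted() {
  awk '
    $14 != prev_name {
      if ($14 in seen) {
        print "target " $14 " is not contiguous (line " NR ")"
        bad = 1; exit
      }
      seen[$14] = 1; prev_name = $14; prev_start = -1
    }
    $16 + 0 < prev_start {
      print "target start out of order at line " NR
      bad = 1; exit
    }
    { prev_start = $16 + 0 }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, `check_psl_sorted SL42SL5.psl && echo ok` could be run on an uncompressed, sorted copy before compressing it with bgzip.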
            Ann Loraine (ann.loraine) added a comment:

            Reference: https://bitbucket.org/lorainelab/affyprobesetsforigb/src/master/ (documents how to make a tabix-indexed file from PSL blat output)

            Ann Loraine (ann.loraine) added a comment:

            Update:

            Modifying SL5 description field to include SL4 locus identifier, for example:

            hexokinase-1 protein (AHRD V3.3 *** AT1G05205.1)

            becomes:

            hexokinase-1 protein (AHRD V3.3 *** AT1G05205.1) ITAG4.0:Solyc07g052420.3
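
The change to the description field shown above amounts to appending an "ITAG4.0:" tag followed by the SL4 locus identifier. A small sketch of that formatting step; the helper name `tag_description` is hypothetical, and the code actually used lives in the repository:

```shell
#!/bin/bash
# Sketch only: append the SL4 locus ID to a description string
# in the "ITAG4.0:<locus ID>" form shown in the example above.
tag_description() {
  local description="$1" sl4_id="$2"
  printf '%s ITAG4.0:%s\n' "$description" "$sl4_id"
}
```

Called with the description and identifier from the example, this reproduces the "becomes" line.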

            Ann Loraine (ann.loraine) added a comment (edited):

            Added new file to svn repository after creating tabix-indexed file with:

            sort -k1,1 -k2,2n ~/src/splicing-analysis2/DescriptionMapping/output/S_lycopersicum_Jun_2022.bed | bgzip > S_lycopersicum_Jun_2022.bed.gz 
            tabix -s 1 -b 2 -e 3 S_lycopersicum_Jun_2022.bed.gz
            

            svn repo info:

            Browse by visiting https://svn.bioviz.org/viewvc/

            svn repo URLs:

            URL: https://svn.bioviz.org/repos/genomes/quickload/S_lycopersicum_Jun_2022
            Repository Root: https://svn.bioviz.org/repos/genomes

            To check out the repo as a read-only user, enter user name "guest" and password "guest".

            Ann Loraine (ann.loraine) added a comment (edited):

            Updated quickload sites on RENCI and UNCC hosting.

            At UNCC, logged in with:

            ssh -J aloraine@cci-jump.uncc.edu -p 1657 aloraine@igbquickload.org
            

            At RENCI, logged in with:

            ssh -J aloraine@hop.renci.org aloraine@lorainelab-quickload.scidas.org
            

              People

              • Assignee: Molly Davis (Mdavis4290)
              • Reporter: Ann Loraine (ann.loraine)