Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Task: Add the Dama dama (fallow deer) genome and annotation to IGB. Current Dama dama genome version provided by NCBI RefSeq: GCF_033118175.1.

      Dama dama (ASM3311817v1)(Nov 2023) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_033118175.1/

            Activity

            pkulzer Paige Kulzer (Inactive) added a comment -

            Below is an outline of the steps I followed to create the Dama dama Quickload:
            1. Convert genome .fna to .2bit

            ./faToTwoBit GCF_033118175.1_ASM3311817v1_genomic.fna D_dama_Nov_2023.2bit
            

            2. Create genome.txt and check to make sure it's ordered in a way that makes sense

            ./twoBitInfo D_dama_Nov_2023.2bit genome.txt
            cat genome.txt
            

            3. Convert annotations .gff to .bed

            ~/Documents/Repos/genomesource/gff3ToBedDetail.py -g genomic.gff -b D_dama_Nov_2023_ncbiRefSeq.bed
            

            4. Sort, bgzip-compress, and tabix-index the .bed file

            sort -k1,1 -k2,2n D_dama_Nov_2023_ncbiRefSeq.bed | bgzip > D_dama_Nov_2023_ncbiRefSeq.bed.gz
            tabix -0 -s 1 -b 2 -e 3 D_dama_Nov_2023_ncbiRefSeq.bed.gz
            

            5. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.

            6. Make a new genome directory and create annots.xml

            cd ~/Documents/Repos/quickload
            
            mkdir D_dama_Nov_2023
            cd D_dama_Nov_2023
            
            cp ../C_teleta_Jan_2013/annots.xml .
            nano annots.xml
            

            7. Add Dama dama to contents.txt and .htaccess

            cd ..
            nano contents.txt
            nano .htaccess
            

            8. Add Dama dama to species.txt and synonyms.txt

            nano species.txt
            nano synonyms.txt
            

            Link to a zipped copy of the quickload on Google Drive: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link
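Part of the sanity check in step 5 can also be scripted before opening IGB. The following is a minimal sketch, assuming genome.txt has the two-column name/size layout written by twoBitInfo and that the BED file is tab-separated. The helper name `missing_chroms` is hypothetical and is not part of IGB or the UCSC tools.

```shell
# Print each sequence name that appears in the BED file but not in genome.txt.
# Usage: missing_chroms genome.txt annotations.bed
missing_chroms() {
  awk -F '\t' '
    NR == FNR { known[$1] = 1; next }            # first file: genome.txt (name<TAB>size)
    !($1 in known) && !seen[$1]++ { print $1 }   # second file: report each unknown name once
  ' "$1" "$2"
}
```

For example, `missing_chroms genome.txt D_dama_Nov_2023_ncbiRefSeq.bed` prints nothing when every annotated sequence is present in the .2bit file.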

            nfreese Nowlan Freese added a comment - - edited

            Testing on Mac
            Able to load sequence, annotations appear on chromosomes.

            A few small issues:

            • Check HEADER.md - there is a line that refers to H_vulgaris_Apr_2024.2bit
            • I'm seeing some additional text that should probably be removed or replaced in the bed file. For example, I would not expect to see the "gene-" or the "rna-" before the title/id/name. I'm also seeing the %2C in the description of some genes, which should be replaced with a comma.
              • title: gene-SORL1
              • id: rna-XM_061146202.1
              • name: rna-XM_061146202.1
            • There was no contents.txt, species.txt, or synonyms.txt in the zipped copy on Google Drive so I was not able to check that the species name was appearing as expected in the dropdown.
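A check for this kind of leftover GFF3 text could be run on the BED file before it is uploaded. This is only a sketch, assuming a tab-separated file in which the residue shows up either as a `gene-` or `rna-` prefix at the start of a field or as a percent-encoded character such as `%2C`. The function name `find_gff_residue` is hypothetical.

```shell
# Report the line number and the first offending field of every BED line
# that still contains a "gene-"/"rna-" prefix or a percent-encoded character.
# Usage: find_gff_residue annotations.bed
find_gff_residue() {
  awk -F '\t' '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^(gene|rna)-/ || $i ~ /%[0-9A-Fa-f][0-9A-Fa-f]/) {
        print FNR "\t" $i
        break   # one report per line is enough
      }
    }
  }' "$1"
}
```

An empty result means no field matched either pattern.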
            pkulzer Paige Kulzer (Inactive) added a comment -

            Thanks for catching all of this, Dr. Freese! I will update the documentation I've created as part of IGBF-4010 so that I make sure to check for issues like this in the future.

            I updated HEADER.md to remove any mention of the Hydra vulgaris Quickload. I also removed all erroneous instances of "gene-", "rna-", and "%2C" from the BED file. These updated files, along with contents.txt, .htaccess, species.txt, and synonyms.txt, have been zipped and re-uploaded as D_Dama.zip on Google Drive.

            nfreese Nowlan Freese added a comment - - edited
            • rna- and gene- have been removed
            • HEADER.md looks good
            • species.txt and contents.txt look good

            Paige Kulzer - I vote we replace %2C with commas instead of removing them entirely. Without the comma it sounds like one long sentence for some of the descriptions.

            pkulzer Paige Kulzer (Inactive) added a comment -

            Sure thing! I've marked that in the documentation for myself and have made that change in the BED file. The new zip file is up on Google Drive.
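The ticket does not record the commands used for this cleanup. One way it could be done is sketched below, assuming the BED detail file is tab-separated, that `gene-` and `rna-` only ever occur as unwanted prefixes at the start of a field, and that `%2C` is the only percent-encoded sequence that needs decoding. The function name `clean_bed` is hypothetical.

```shell
# Read a tab-separated BED file on stdin and write a cleaned copy to stdout:
#   - strip a leading "gene-" or "rna-" from every field
#   - replace each literal "%2C" with a comma
clean_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 1; i <= NF; i++) {
      sub(/^(gene|rna)-/, "", $i)
      gsub(/%2C/, ",", $i)
    }
    print
  }'
}
```

It would be run on the uncompressed file, for example `clean_bed < D_dama_Nov_2023_ncbiRefSeq.bed > cleaned.bed` (the output name is a placeholder), after which the sort, bgzip, and tabix commands from step 4 need to be repeated on the cleaned file.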

            nfreese Nowlan Freese added a comment -

            Looks good, ready for pull request.

            pkulzer Paige Kulzer (Inactive) added a comment -

            The Dama dama genome has been pushed to the SVN repo. Ready for final review!

            nfreese Nowlan Freese added a comment -

            Tested using my local copy of the SVN repo.

            Gene models load correctly, able to load sequence, synonyms/species look good, HEADER.md looks good.

            Ready to be deployed.

            ann.loraine Ann Loraine added a comment -

            Revision 225 of the svn repository is now deployed. Ready for final testing.

            Request: Please check that the Web browser interface to the genome assembly directory on quickload makes sense and that all links work.

            nfreese Nowlan Freese added a comment - - edited

            Tested on Mac with main branch installer.

            • Dama dama appears in the Species dropdown with the correct synonyms.
            • Gene models and the sequence load (2bit located at UCSC).
            • Quickload page loads, links all work.

            Closing ticket


              People

              • Assignee:
                pkulzer Paige Kulzer (Inactive)
                Reporter:
                pkulzer Paige Kulzer (Inactive)
