Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Task: Add the Dama dama (fallow deer) genome and annotation to IGB. Current Dama dama genome version provided by NCBI RefSeq: GCF_033118175.1.

      Dama dama (ASM3311817v1)(Nov 2023) - https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_033118175.1/

            Activity

            pkulzer Paige Kulzer created issue -
            pkulzer Paige Kulzer made changes -
            Field Original Value New Value
            Epic Link IGBF-3823 [ 23122 ]
            pkulzer Paige Kulzer made changes -
            Sprint Winter 1 [ 209 ]
            pkulzer Paige Kulzer made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer added a comment -

            Below is an outline of the steps I followed to create the Dama dama Quickload:
            1. Convert genome .fna to .2bit

            ./faToTwoBit GCF_033118175.1_ASM3311817v1_genomic.fna D_dama_Nov_2023.2bit
            

            2. Create genome.txt and check to make sure it's ordered in a way that makes sense

            ./twoBitInfo D_dama_Nov_2023.2bit genome.txt
            cat genome.txt
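
For an assembly with many scaffolds, reading the whole of genome.txt with cat can be slow going. As a rough sketch of one way to review it, the helper below ranks sequences by length so the largest ones can be compared against what is expected for the assembly. The function name `largest_sequences` is hypothetical, and the sketch assumes genome.txt has the two-column layout (sequence name, length) that twoBitInfo writes.

```shell
# Hypothetical helper: list the N largest sequences in a twoBitInfo-style
# genome.txt (column 1 = sequence name, column 2 = length in bases).
largest_sequences() {
  # $1 = path to genome.txt, $2 = number of rows to show (default 10)
  sort -k2,2nr "$1" | head -n "${2:-10}"
}

# Example call (not run here): largest_sequences genome.txt 25
```

This only shows a size-ranked view; whether the original order of genome.txt "makes sense" is still a judgment call for the person building the Quickload.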
            

            3. Convert annotations .gff to .bed

            ~/Documents/Repos/genomesource/gff3ToBedDetail.py -g genomic.gff -b D_dama_Nov_2023_ncbiRefSeq.bed
            

            4. Sort, bgzip, and tabix-index the .bed file

            sort -k1,1 -k2,2n D_dama_Nov_2023_ncbiRefSeq.bed | bgzip > D_dama_Nov_2023_ncbiRefSeq.bed.gz
            tabix -0 -s 1 -b 2 -e 3 D_dama_Nov_2023_ncbiRefSeq.bed.gz
            

            5. Sanity check the .bed and .2bit files - Add the .2bit file as a reference, then drag/drop the .bed file into IGB. Confirm that gene models are present, labeled correctly, and the chromosomes listed are in a logical order. Also check that no error messages are present in the Log.
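
Part of this sanity check can also be done from the command line before opening IGB. The sketch below is an assumption, not a step from the original workflow: it reports any sequence name used in the BED file that is absent from genome.txt, which would suggest the annotation and the .2bit file do not match. The function name `bed_names_missing_from_genome` is hypothetical, and both files are assumed to be tab-delimited with the sequence name in column 1.

```shell
# Hypothetical helper: print each sequence name that appears in column 1 of a
# BED file but not in column 1 of genome.txt. No output means every name matched.
bed_names_missing_from_genome() {
  # $1 = genome.txt, $2 = uncompressed BED file
  awk -F '\t' '
    NR == FNR { known[$1] = 1; next }          # first file: remember genome names
    /^(#|track|browser)/ { next }              # skip optional BED header lines
    !($1 in known) && !seen[$1]++ { print $1 } # report each unknown name once
  ' "$1" "$2"
}

# Example call (not run here):
# bed_names_missing_from_genome genome.txt D_dama_Nov_2023_ncbiRefSeq.bed
```

This does not replace the visual check in IGB; it only covers the name-matching part.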

            6. Make a new genome directory and create annots.xml

            cd ~/Documents/Repos/quickload
            
            mkdir D_dama_Nov_2023
            cd D_dama_Nov_2023
            
            cp ../C_teleta_Jan_2013/annots.xml .
            nano annots.xml
            

            7. Add Dama dama to contents.txt and .htaccess

            cd ..
            nano contents.txt
            nano .htaccess
            

            8. Add Dama dama to species.txt and synonyms.txt

            nano species.txt
            nano synonyms.txt
            

            Link to a zipped copy of the quickload on Google Drive: https://drive.google.com/drive/folders/1bFRx4PqldxNf400n7Vr9SD_dNeNmtpvk?usp=drive_link

            pkulzer Paige Kulzer made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
            pkulzer Paige Kulzer made changes -
            Link This issue relates to IGBF-4010 [ IGBF-4010 ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1 [ 209 ] Winter 1, Spring 1 [ 209, 210 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese added a comment - - edited

            Testing on Mac
            Able to load sequence, annotations appear on chromosomes.

            A few small issues:

            • Check HEADER.md - there is a line that refers to H_vulgaris_Apr_2024.2bit
            • I'm seeing some additional text that should probably be removed or replaced in the bed file. For example, I would not expect to see the "gene-" or the "rna-" before the title/id/name. I'm also seeing the %2C in the description of some genes, which should be replaced with a comma.
              • title: gene-SORL1
              • id: rna-XM_061146202.1
              • name: rna-XM_061146202.1
            • There was no contents.txt, species.txt, or synonyms.txt in the zipped copy on Google Drive so I was not able to check that the species name was appearing as expected in the dropdown.
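
A missing-file problem like the last one could be caught before zipping with a quick check along these lines. This is only a sketch: the function name `missing_quickload_files` is hypothetical, and the directory layout (shared files in the Quickload root, annots.xml and genome.txt inside the genome directory) is an assumption based on the steps listed earlier in this ticket.

```shell
# Hypothetical helper: report which expected Quickload files are absent.
# Prints one relative path per missing file; prints nothing if all are present.
missing_quickload_files() {
  # $1 = Quickload root directory, $2 = genome directory name
  for f in contents.txt species.txt synonyms.txt .htaccess \
           "$2/annots.xml" "$2/genome.txt"; do
    [ -e "$1/$f" ] || printf '%s\n' "$f"
  done
}

# Example call (not run here):
# missing_quickload_files ~/Documents/Repos/quickload D_dama_Nov_2023
```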
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            pkulzer Paige Kulzer made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer added a comment -

            Thanks for catching all of this, Dr. Freese! I will update the documentation I've created as part of IGBF-4010 so that I make sure to check for issues like this in the future.

            I updated HEADER.md to remove any mention of the Hydra vulgaris Quickload. I also removed all erroneous instances of "gene-", "rna-", and "%2C" from the BED file. These updated files, along with contents.txt, .htaccess, species.txt, and synonyms.txt, have been zipped and re-uploaded as D_Dama.zip on Google Drive.

            pkulzer Paige Kulzer made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese added a comment - - edited
            • rna- and gene- have been removed
            • HEADER.md looks good
            • species.txt and contents.txt look good

            Paige Kulzer - I vote we replace %2C with commas instead of removing them entirely. Without the comma it sounds like one long sentence for some of the descriptions.

            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] To-Do [ 10305 ]
            pkulzer Paige Kulzer made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer added a comment -

            Sure thing! I've marked that in the documentation for myself and have made that change in the BED file. The new zip file is up on Google Drive.
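
The ticket does not record the exact commands used for this clean-up. As a minimal sketch of one way it could be scripted, the filter below strips a leading "gene-" or "rna-" from each tab-delimited field and replaces "%2C" with a comma. The function name `clean_bed_fields` is hypothetical, and the sketch assumes the prefixes only ever need removing at the start of a field.

```shell
# Hypothetical helper: clean GFF3 residue from a tab-delimited BED file.
# Reads the named file(s), or standard input if none are given.
clean_bed_fields() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 1; i <= NF; i++) {
      sub(/^(gene|rna)-/, "", $i)   # drop an ID prefix at the start of a field
      gsub(/%2C/, ",", $i)          # decode percent-encoded commas
    }
    print
  }' "$@"
}

# Example call (not run here):
# clean_bed_fields D_dama_Nov_2023_ncbiRefSeq.bed > cleaned.bed
```

Because this changes the uncompressed BED, the bgzip and tabix step would need to be repeated on the cleaned file afterwards.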

            pkulzer Paige Kulzer made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese added a comment -

            Looks good, ready for pull request.

            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            pkulzer Paige Kulzer added a comment -

            The Dama dama genome has been pushed to the SVN repo. Ready for final review!

            pkulzer Paige Kulzer made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            pkulzer Paige Kulzer made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            pkulzer Paige Kulzer made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            pkulzer Paige Kulzer made changes -
            Assignee Paige Kulzer [ pkulzer ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1, Spring 1 [ 209, 210 ] Winter 1, Spring 1, Spring 2 [ 209, 210, 211 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese made changes -
            Status Post-merge Testing In Progress [ 10003 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Sprint Winter 1, Spring 1, Spring 2 [ 209, 210, 211 ] Winter 1, Spring 1, Spring 2, Spring 3 [ 209, 210, 211, 212 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            nfreese Nowlan Freese added a comment -

            Tested using my local copy of the SVN repo.

            Gene models load correctly, able to load sequence, synonyms/species look good, HEADER.md looks good.

            Ready to be deployed.

            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Revision 225 of the svn repository is now deployed. Ready for final testing.

            Request: Please check that the Web browser interface to the genome assembly directory on quickload makes sense and that all links work.

            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese added a comment - - edited

            Tested on Mac with main branch installer.

            • Dama dama appears in the Species dropdown with the correct synonyms.
            • Gene models and the sequence load (2bit located at UCSC).
            • Quickload page loads, links all work.

            Closing ticket

            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                pkulzer Paige Kulzer
                Reporter:
                pkulzer Paige Kulzer
              • Votes:
                0
                Watchers:
                3