IGB / IGBF-4130

Remove older 2bit files and point at UCSC hosted 2bit files in annots.xml

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: We are hosting our Subversion repository on our own EC2 instance. The repository's large size makes it expensive to host.

      Task: Remove the 2bit file from the svn repository for older "legacy" genome versions, such as older versions of the human or mouse genomes. Then use the annots.xml "name" attribute to point to that same file hosted on the UCSC Genome Web (link: https://hgdownload.soe.ucsc.edu/gbdb/) and set "reference" to true (see https://wiki.bioviz.org/confluence/display/igbman/About+annots.xml for more details).
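
The annots.xml change described in the task can be sketched in Python. This is a minimal illustration, not the procedure used for this ticket. It assumes the 2bit entry is a `<file>` element under a `<files>` root, and the sample file names and the UCSC URL are hypothetical placeholders. The only attributes it touches are "name" and "reference", as the task describes.

```python
import xml.etree.ElementTree as ET


def point_to_remote_2bit(annots_xml: str, local_name: str, remote_url: str) -> str:
    """Return annots.xml text with one <file> entry repointed to a remote 2bit URL."""
    root = ET.fromstring(annots_xml)
    for entry in root.iter("file"):
        if entry.get("name") == local_name:
            # Replace the local file name with the remote URL and mark the
            # entry as the reference sequence.
            entry.set("name", remote_url)
            entry.set("reference", "true")
            break
    else:
        raise ValueError(f"no <file> entry named {local_name!r}")
    return ET.tostring(root, encoding="unicode")


# Hypothetical annots.xml content; real files carry more entries and attributes.
SAMPLE = """<files>
  <file name="H_sapiens_Example.2bit" title="Reference sequence"/>
  <file name="genes.bed.gz" title="Genes"/>
</files>"""

# Placeholder URL: substitute the real path under https://hgdownload.soe.ucsc.edu/gbdb/
REMOTE_URL = "https://hgdownload.soe.ucsc.edu/gbdb/EXAMPLE/EXAMPLE.2bit"

UPDATED = point_to_remote_2bit(SAMPLE, "H_sapiens_Example.2bit", REMOTE_URL)
print(UPDATED)
```

The local 2bit file would then be removed from the working copy and the change committed.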

        Attachments

          Issue Links

            Activity

            nfreese Nowlan Freese created issue -
            nfreese Nowlan Freese made changes -
            Field Original Value New Value
            Epic Link IGBF-2323 [ 18477 ]
            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-4030 [ IGBF-4030 ]
            nfreese Nowlan Freese made changes -
            Link This issue relates to IGBF-2537 [ IGBF-2537 ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Attachment 2bit_sizes.txt [ 18646 ]
            pkulzer Paige Kulzer (Inactive) added a comment - - edited

            See attached for a document (2bit_sizes.txt) created by Dr. Loraine, which provides more information about this task. It also contains a running list of genomes that have already been worked on as of this comment.

            My task will be to modify the annots.xml file and remove the 2bit file for each genome I've added from NCBI whose 2bit file is 100 MB or larger.
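
One way to build the to-do list of 2bit files that are 100 MB or larger is to scan a local checkout. This is a rough sketch, assuming a working copy on disk. The function name and the reading of "100 MB" as 100 * 1024 * 1024 bytes are assumptions and are not taken from 2bit_sizes.txt.

```python
from pathlib import Path

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
THRESHOLD_BYTES = 100 * 1024 * 1024


def large_2bit_files(checkout_root, threshold=THRESHOLD_BYTES):
    """Return (size_in_bytes, relative_path) for each .2bit file at or above threshold."""
    root = Path(checkout_root)
    hits = []
    for path in root.rglob("*.2bit"):
        relative = path.relative_to(root)
        # Skip anything inside Subversion's administrative directory.
        if ".svn" in relative.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size >= threshold:
            hits.append((size, relative.as_posix()))
    # Largest files first, since they save the most space when removed.
    return sorted(hits, reverse=True)
```

Calling `large_2bit_files("quickload")` on a checkout directory named `quickload` (a hypothetical path) would return the candidates, largest first.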

            pkulzer Paige Kulzer (Inactive) made changes -
            Assignee Paige Kulzer [ pkulzer ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Description Situation: We are hosting our [subversion repository|https://svn.bioviz.org/viewvc/genomes/quickload/] on our own EC2. The larger size of the repository is expensive.

            Task: Remove the 2bit file from the svn repository for older "legacy" genome versions, such as older versions of the human or mouse genomes. Then use the annots.xml "name" attribute to point to that same file hosted on the UCSC Genome Web (link: [https://hgdownload.soe.ucsc.edu/gbdb/|https://hgdownload.soe.ucsc.edu/gbdb/]) and set "index" to true (see [https://wiki.bioviz.org/confluence/display/igbman/About+annots.xml|https://wiki.bioviz.org/confluence/display/igbman/About+annots.xml] for more details).
            Situation: We are hosting our [subversion repository|https://svn.bioviz.org/viewvc/genomes/quickload/] on our own EC2. The larger size of the repository is expensive.

            Task: Remove the 2bit file from the svn repository for older "legacy" genome versions, such as older versions of the human or mouse genomes. Then use the annots.xml "name" attribute to point to that same file hosted on the UCSC Genome Web (link: [https://hgdownload.soe.ucsc.edu/gbdb/|https://hgdownload.soe.ucsc.edu/gbdb/]) and set "reference" to true (see [https://wiki.bioviz.org/confluence/display/igbman/About+annots.xml|https://wiki.bioviz.org/confluence/display/igbman/About+annots.xml] for more details).
            pkulzer Paige Kulzer (Inactive) made changes -
            Sprint Spring 4 [ 213 ] Spring 3 [ 212 ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer (Inactive) added a comment -

            To test that I'm modifying annots.xml files correctly, I modified annots.xml for just one genome (Aedes albopictus), deleted the corresponding 2bit file, then pushed those changes to the svn repo (revision 224). I then disabled IGB Quickload as a data provider in IGB and added my local copy of the svn repo as a data provider, opened the A. albopictus genome, zoomed in, and clicked Load Sequence. The reference sequence loaded successfully which means that I've correctly modified annots.xml!

            I'll now modify the remaining annots.xml files on my to-do list, remove those corresponding 2bit files from the repo, and commit those revisions all at once.

            pkulzer Paige Kulzer (Inactive) made changes -
            Attachment 2bit_sizes_3_7_2025.txt [ 18649 ]
            pkulzer Paige Kulzer (Inactive) added a comment -

            I've finished removing the 2bit files I added from NCBI. Our working list has been updated (see attached; 2bit_sizes_3_7_2025.txt).

            FYI, it looks like the remaining genomes with 2bit files not hosted in bigzips are hosted in GenArk (https://hgdownload.soe.ucsc.edu/hubs/).

            pkulzer Paige Kulzer (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Status of the svn repository:

            • As of Paige Kulzer's changes, the repository is at revision 225.
            • As of today, a fresh checkout of revision 225 creates a local directory of size 20 GB. Checked out to my local machine using:

            svn co -r 225 https://svn.bioviz.org/repos/genomes/quickload quickload.r225

            • An svn dump file of r225 is 10 GB. Made on the svn host svn.bioviz.org using:

            svnadmin dump -r 225:HEAD /svn/genomes > genomes.r225.dump

            • An svn dump file of the entire repository is 32 GB. Made on the svn host svn.bioviz.org using:

            svnadmin dump /svn/genomes > genomes.all.dump
            
            ann.loraine Ann Loraine made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            ann.loraine Ann Loraine made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine added a comment - - edited

            To test:

            • De-activate all data providers except for Quickload "main"
            • Visit each of the genome versions listed in attached file with an "x" at the front of the line (sorry, there are a lot!)
            • Choose a location in the genome and zoom in
            • Click "Load sequence" and check for errors in the log
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Assignee Pranav Bhatia [ pbhatia1 ]
            ann.loraine Ann Loraine made changes -
            Sprint Spring 3 [ 212 ] Spring 3, Spring 4 [ 212, 213 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            ann.loraine Ann Loraine added a comment - - edited

            I have a suggestion for the testing task!

            What do you think about automating this testing task?

            I can think of a few different ways it could be done, some more complicated than others.

            One kind of interesting and fun way might be to write a simple "IGB App" (we could call it "2bit Checker") that a tester could install into IGB. It would try to get a listing of all genomes offered within IGB where we are using the "file" tag to point to an externally hosted 2bit file. (I'm not sure if the IGB API perfectly supports this, but that's OK. We could use XML parsing code within the App instead.) Once the App has the list of files to check, it would try to retrieve a bit of sequence from each of the referenced files and report the outcome of each attempt.

            One possible value of doing this would be that we would have yet another example of an IGB App and we would perhaps quickly find out if any of our 2bit files are corrupted or unavailable.

            We could create a simple IGB App called that would attempt to retrieve a bit of genome sequence for every genome version within a
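
The checking logic behind the "2bit Checker" idea could be prototyped outside IGB before writing an App. The following is a standalone Python sketch, not an IGB App, and it does not use the IGB API. It assumes remote 2bit files appear as `<file>` entries whose "name" is an http(s) URL ending in `.2bit`. It also simplifies "retrieve a bit of sequence" to fetching the 16-byte 2bit header with an HTTP Range request and checking the format's signature, so it would catch unavailable or obviously invalid files but not every form of corruption. All function names are hypothetical.

```python
import struct
import urllib.request
import xml.etree.ElementTree as ET

TWOBIT_SIGNATURE = 0x1A412743  # magic number defined by the UCSC 2bit format


def remote_2bit_urls(annots_xml):
    """List <file> "name" attributes that are http(s) URLs ending in .2bit."""
    root = ET.fromstring(annots_xml)
    names = (entry.get("name", "") for entry in root.iter("file"))
    return [
        name
        for name in names
        if name.startswith(("http://", "https://")) and name.endswith(".2bit")
    ]


def parse_2bit_header(first_16_bytes):
    """Return the sequence count from a 2bit header, or raise ValueError."""
    if len(first_16_bytes) < 16:
        raise ValueError("header is shorter than 16 bytes")
    for byte_order in ("<", ">"):  # a file may be written in either byte order
        signature, version, seq_count, _reserved = struct.unpack(
            byte_order + "4I", first_16_bytes[:16]
        )
        if signature == TWOBIT_SIGNATURE:
            if version != 0:
                raise ValueError(f"unexpected 2bit version {version}")
            return seq_count
    raise ValueError("2bit signature not found")


def check_url(url, timeout=30):
    """Fetch only the header of a remote 2bit file and report the outcome."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-15"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            count = parse_2bit_header(response.read(16))
            return f"OK: {count} sequences"
    except (OSError, ValueError) as error:
        # URLError and HTTPError are subclasses of OSError.
        return f"FAILED: {error}"
```

A tester could loop over `remote_2bit_urls(...)` for each genome's annots.xml and print `check_url(url)` for every result.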

            nfreese Nowlan Freese made changes -
            Assignee Pranav Bhatia [ pbhatia1 ]
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese added a comment - - edited

            Tested following instructions above on Mac on IGB 10.1.0 release.

            All genome versions marked with "x" were able to load the sequence except for the two below:

            • x 617M ./E_caballus_Sep_2007/E_caballus_Sep_2007.2bit
            • x 577M ./A_melanoleuca_Dec_2009/A_melanoleuca_Dec_2009.2bit

            My guess is that the URL to the 2bit file is incorrect for both of these genomes.
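
When working through the list, the marked lines can be turned into the annots.xml URLs to inspect. This is a small sketch, assuming every marked line follows the `x SIZE ./GENOME/FILE.2bit` layout of the two lines above and that each genome's annots.xml lives at `http://igbquickload-main.bioviz.org/quickload/GENOME/annots.xml`. The names `ListedGenome` and `parse_marked_line` are made up for illustration.

```python
from typing import NamedTuple, Optional

QUICKLOAD_BASE = "http://igbquickload-main.bioviz.org/quickload"


class ListedGenome(NamedTuple):
    size: str
    genome_version: str
    annots_url: str


def parse_marked_line(line: str) -> Optional[ListedGenome]:
    """Parse one 'x SIZE ./GENOME/FILE.2bit' line; return None for any other line."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "x" or not fields[2].endswith(".2bit"):
        return None
    # Drop the leading "." so the first remaining part is the genome directory.
    path_parts = [part for part in fields[2].split("/") if part not in ("", ".")]
    if len(path_parts) < 2:
        return None
    genome_version = path_parts[0]
    annots_url = f"{QUICKLOAD_BASE}/{genome_version}/annots.xml"
    return ListedGenome(fields[1], genome_version, annots_url)
```

Lines without the leading "x" are skipped, matching the instruction to visit only the marked genome versions.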

            E_caballus annots: http://igbquickload-main.bioviz.org/quickload/E_caballus_Sep_2007/annots.xml
            A_melanoleuca annots: http://igbquickload-main.bioviz.org/quickload/A_melanoleuca_Dec_2009/annots.xml
            nfreese Nowlan Freese made changes -
            Status Post-merge Testing In Progress [ 10003 ] Merged Needs Testing [ 10002 ]
            nfreese Nowlan Freese made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            nfreese Nowlan Freese made changes -
            Status Post-merge Testing In Progress [ 10003 ] To-Do [ 10305 ]
            pkulzer Paige Kulzer (Inactive) added a comment -

            Attn: Ann Loraine, the SVN server is down again which is blocking work on this ticket.

            pkulzer Paige Kulzer (Inactive) made changes -
            Assignee Paige Kulzer [ pkulzer ]
            ann.loraine Ann Loraine added a comment -

            Sorry for the block. The server should now be available.

            Attn: Paige Kulzer

            ann.loraine Ann Loraine made changes -
            Sprint Spring 3, Spring 4 [ 212, 213 ] Spring 3, Spring 4, Spring 5 [ 212, 213, 214 ]
            ann.loraine Ann Loraine made changes -
            Rank Ranked higher
            pkulzer Paige Kulzer (Inactive) made changes -
            Status To-Do [ 10305 ] In Progress [ 3 ]
            pkulzer Paige Kulzer (Inactive) added a comment -

            The URLs to the 2bit files were, in fact, incorrect for both of these genomes. I've fixed both URLs and tested these changes with IGB 10.1.0 and my local quickload repository. Both reference sequences load successfully, with no errors in the log. Ready for testing!

            pkulzer Paige Kulzer (Inactive) made changes -
            Status In Progress [ 3 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Assignee Paige Kulzer [ pkulzer ] Nowlan Freese [ nfreese ]
            nfreese Nowlan Freese made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            nfreese Nowlan Freese added a comment - - edited

            SVN is down so I just tested the new URLs as custom genomes.

            • E_caballus_Sep_2007 - working correctly
            • A_melanoleuca_Dec_2009 - kind of working. Paige Kulzer - the URL looks good, but when I try to view the genome in IGB it pauses for a while, then IGB starts to load the sequence for every single chromosome, which it shouldn't do. Not sure why this is happening, but it may not be an issue if the genome is accessed through a Quickload. Could you try it on your machine? If it looks good, then let's deploy the changes.
            nfreese Nowlan Freese made changes -
            Assignee Nowlan Freese [ nfreese ] Paige Kulzer [ pkulzer ]
            nfreese Nowlan Freese made changes -
            Status First Level Review in Progress [ 10301 ] Needs 1st Level Review [ 10005 ]
            pkulzer Paige Kulzer (Inactive) added a comment -

            It took much longer to load the A_melanoleuca genome than it did the E_caballus genome (1.600 min vs 151.2 ms), but I'm not seeing IGB attempt to load sequences. I don't see anything out of the ordinary with the annots.xml file that might be causing this.

            pkulzer Paige Kulzer (Inactive) added a comment -

            Reviewed with Dr. Freese - ready for deployment!

            pkulzer Paige Kulzer (Inactive) made changes -
            Status Needs 1st Level Review [ 10005 ] First Level Review in Progress [ 10301 ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Status First Level Review in Progress [ 10301 ] Ready for Pull Request [ 10304 ]
            pkulzer Paige Kulzer (Inactive) made changes -
            Assignee Paige Kulzer [ pkulzer ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment - - edited

            Thanks Paige Kulzer and Nowlan Freese!

            I have deployed the new version of the repository and also made full and "thin" backups of this latest version, which is revision 229.

            Backups are available on-line here:

            https://data.bioviz.org/svnbackup/

            ann.loraine Ann Loraine made changes -
            Status Ready for Pull Request [ 10304 ] Pull Request Submitted [ 10101 ]
            ann.loraine Ann Loraine made changes -
            Status Pull Request Submitted [ 10101 ] Reviewing Pull Request [ 10303 ]
            ann.loraine Ann Loraine made changes -
            Status Reviewing Pull Request [ 10303 ] Merged Needs Testing [ 10002 ]
            ann.loraine Ann Loraine made changes -
            Status Merged Needs Testing [ 10002 ] Post-merge Testing In Progress [ 10003 ]
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status Post-merge Testing In Progress [ 10003 ] Closed [ 6 ]

              People

              • Assignee:
                ann.loraine Ann Loraine
                Reporter:
                nfreese Nowlan Freese
              • Votes:
                0
                Watchers:
                3
