Details
- Type: Task
- Status: Needs 1st Level Review
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 2
- Epic Link:
- Sprint: Fall 3, Fall 4
Description
Situation: A new version of the human genome is available, referred to as telomere-to-telomere (T2T) or hs1. UCSC provides this genome (link), and IGB currently pulls it in through the UCSC REST API. As part of IGBF-3902, IGB now lists the hs1 genome under the "Homo sapiens" Species dropdown, but a Quickload for this genome still needs to be made.
Tasks: Create a bed14 annotation file for this new genome, then add both the annotation file and the 2bit file (link) to IGB Quickload.
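A quick sanity check on the generated annotation file could look like the sketch below. It assumes that "bed14" means 14 tab-separated columns per line, with start and end in columns 2 and 3 as in standard BED. The function name `check_bed14_line` and the example row are hypothetical and not taken from IGB.

```python
def check_bed14_line(line: str) -> list[str]:
    """Split one annotation line and confirm it has 14 tab-separated fields.

    Assumption: "bed14" means 14 tab-delimited columns, with the usual BED
    start/end coordinates in columns 2 and 3.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, found {len(fields)}")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"bad coordinates: start={start}, end={end}")
    return fields


# Made-up row for illustration only; not real hs1 annotation data.
example_row = "\t".join([
    "chr1", "100", "200", "tx1", "0", "+", "100", "200",
    "0", "1", "100,", "0,", "GENE1", "example description",
])
fields = check_bed14_line(example_row)
print(len(fields))
```

Running this over every line of the file before uploading would catch truncated rows or stray space-delimited lines early.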
As per discussion with Nowlan Freese after scrum, we would like to do these two things:
This is to allow this assembly to exist alongside the traditional hg38, hg19, etc. assemblies. We want to do this because most people are still using only hg38 and hg19, not this new assembly just yet.
The version should be something like: H_sapiens_T2T_MMM_YYYY
Note that we need to be very careful that our version dates map correctly onto UCSC patch releases, or whatever mechanism UCSC uses to track how the sequence itself (and all its constituent contigs) changes over time.
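The H_sapiens_T2T_MMM_YYYY pattern could be generated from a release date so that the name and the date never drift apart. This is only a sketch: the helper name `quickload_version_name` is hypothetical, and the date used below is a placeholder that would need to be checked against the actual UCSC release before use.

```python
from datetime import date

# Fixed English month abbreviations, so the result does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def quickload_version_name(species: str, tag: str, release: date) -> str:
    """Build a name following the <species>_<tag>_<MMM>_<YYYY> pattern."""
    return f"{species}_{tag}_{MONTHS[release.month - 1]}_{release.year}"


# Placeholder date for illustration; confirm against the UCSC release first.
name = quickload_version_name("H_sapiens", "T2T", date(2022, 1, 1))
print(name)  # H_sapiens_T2T_Jan_2022
```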
Adding the 2bit file to Quickload will enable IGB to cache the genome file locally instead of always using the JSON REST API to retrieve sequence data. Also, retrieving data from a 2bit file may be faster than getting sequence data from the JSON REST API.
Testing: Make sure that IGB can still retrieve sequence data from the JSON REST API in case the URL of the 2bit file changes or the UCSC website fails to serve it.
To do this, a tester can edit the "file" tag to point to a bogus location. If everything works as intended, the user won't notice that the 2bit file is missing, and IGB will simply fall back to getting data from the UCSC JSON API.
However, note that the "load priority" numbers affect which data source IGB retrieves sequence data from, so check them when interpreting the test result.
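The expected fallback behavior can be illustrated with a small sketch. This is not IGB's implementation: the function `fetch_sequence`, the two stand-in fetchers, and the rule that a lower "load priority" number is tried first are all assumptions made for illustration.

```python
from typing import Callable

# A fetcher takes (chrom, start, end) and returns sequence text.
Fetcher = Callable[[str, int, int], str]


def fetch_sequence(sources: list[tuple[int, str, Fetcher]],
                   chrom: str, start: int, end: int) -> tuple[str, str]:
    """Try each source in priority order; return (source name, sequence).

    Assumption: a lower priority number means the source is tried first.
    """
    errors = []
    for _priority, name, fetch in sorted(sources, key=lambda s: s[0]):
        try:
            return name, fetch(chrom, start, end)
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sequence sources failed: " + "; ".join(errors))


def broken_2bit(chrom: str, start: int, end: int) -> str:
    # Stand-in for a "file" tag that points to a bogus location.
    raise OSError("2bit file not found")


def fake_rest_api(chrom: str, start: int, end: int) -> str:
    # Stand-in for the UCSC JSON REST API; returns dummy bases.
    return "ACGT"[: end - start]


used, seq = fetch_sequence(
    [(0, "2bit", broken_2bit), (1, "rest", fake_rest_api)], "chr1", 0, 4
)
print(used, seq)
```

In the test described above, a pass corresponds to the second source being used without any error reaching the user.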