Details
Type: Task
Status: To-Do
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Description
Situation: NCBI RefSeq assemblies identify each sequence record with a unique RefSeq accession number. The format of a RefSeq accession number is: a two-letter prefix, an underscore (_), six or nine digits, a dot (.), and a version number. We have a long-term goal of integrating NCBI genomes into IGB, and several have already been added to the IGB Quickload repository. For these genomes, I've noticed that the Current Genome panel in IGB displays the RefSeq accession numbers instead of the chromosome names/numbers.
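To make the format concrete, here is a minimal Python sketch that recognizes an accession as described above (two letters, underscore, six or nine digits, dot, version). The function name `parse_refseq_accession` is made up for illustration, and the pattern only encodes the description in this ticket; real-world accessions may include variants it does not cover.

```python
import re

# Pattern built only from the format described in this ticket:
# two-letter prefix, "_", six or nine digits, ".", version number.
REFSEQ_ACCESSION = re.compile(r"^([A-Z]{2})_(\d{6}(?:\d{3})?)\.(\d+)$")


def parse_refseq_accession(name):
    """Return (prefix, number, version) if name looks like a RefSeq
    accession, otherwise None. Hypothetical helper, not part of IGB."""
    match = REFSEQ_ACCESSION.match(name)
    if match is None:
        return None
    prefix, number, version = match.groups()
    return prefix, number, int(version)
```

A check like this could be one way to tell whether a sequence name in a Quickload is a RefSeq accession or an ordinary chromosome name such as chr1.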
Since this is a labeling system unique to NCBI, I suggest that we investigate a way to display chromosomes for NCBI genomes in IGB as chr1, chr2, etc., to maintain consistency across all IGB genomes.
Simply modifying the genomes.txt file in each of these Quickloads seems like the most straightforward approach. However, if a user aligns their data to an NCBI genome and then tries to view that aligned data in IGB, the sequence names in their files would presumably still be RefSeq accession numbers, so any renaming we do in the Quickload files might prevent the data from loading against the renamed chromosomes.
Task: Investigate NCBI's API and determine what can be returned from the NCBI Genomes page.
NCBI provides a metadata table with each of its RefSeq assemblies that connects these RefSeq accession numbers with chromosome numbers (see the "Chromosomes" section of this page, for example: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_033118175.1/). Dig into the NCBI API and see if metadata from that table can be requested.
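As a starting point for the investigation, the sketch below shows one way the request might look in Python. Everything specific here is an assumption to verify against NCBI's API documentation: the NCBI Datasets v2 REST endpoint path (`/genome/accession/{accession}/sequence_reports`), the JSON field names (`reports`, `refseq_accession`, `chr_name`, `role`), and the `assembled-molecule` role value. The helper names are made up for illustration.

```python
import json
import urllib.request

# ASSUMPTION: base URL and path of the NCBI Datasets v2 REST API.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def fetch_sequence_reports(assembly_accession):
    """Request the per-sequence metadata for one assembly (network call)."""
    url = f"{BASE_URL}/genome/accession/{assembly_accession}/sequence_reports"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def accession_to_chr_name(payload):
    """Map RefSeq accession -> chrN-style name from a response payload.

    ASSUMPTION: each entry in payload["reports"] carries the fields
    "refseq_accession", "chr_name", and "role".
    """
    mapping = {}
    for report in payload.get("reports", []):
        accession = report.get("refseq_accession")
        name = report.get("chr_name")
        # Keep only assembled chromosomes; skip unplaced scaffolds etc.
        if not accession or not name or report.get("role") != "assembled-molecule":
            continue
        mapping[accession] = name if name.startswith("chr") else f"chr{name}"
    return mapping


if __name__ == "__main__":
    # Assembly accession taken from the example page linked above.
    payload = fetch_sequence_reports("GCF_033118175.1")
    for accession, chr_name in accession_to_chr_name(payload).items():
        print(accession, chr_name, sep="\t")
```

Parsing is kept separate from fetching so the mapping logic can be checked without network access. If the response turns out to be paginated, the fetch step would also need to follow the page token; that is one of the things to confirm during the investigation.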