Details
Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: 10.0.0 Major Release
Labels: None
Story Points: 5
Sprint: Spring 7, Spring 9, Summer 1, Summer 2, Summer 3, Summer 4, Summer 5
Description
Bob Goldstein, a faculty member at UNC Chapel Hill, is coming to Charlotte to give a seminar on Friday, April 5, at 2:30.
His research Web site: https://goldsteinlab.weebly.com/
His tardigrades site: http://tardigrades.bio.unc.edu/
Ann Loraine signed up for a 1:30 meeting with him in the third-floor conference room. (We might use my office instead, depending on whether the wall repairs are finished; long story!)
Our goal is to talk with him about the tardigrade genome project and how we can represent tardigrades (water bears) in IGB.
Water bears are an emerging model system for studying how animals survive in extreme environments.
According to the UCSC Genome Browser, three tardigrade genome assemblies are now available.
Bob Goldstein's tardigrades web site (http://tardigrades.bio.unc.edu/) mentions that they are "developing the water bear Hypsibius exemplaris as a new model for studying evo-devo and resistance to extremes."
Their tardigrade site has a "links" page that includes a BLAST search link hosted at NCBI. That page allows users to search the genome assembly "Hypsibius dujardini GenBank assembly GCA_002082055.1".
UCSC also appears to feature this same genome assembly on its site, calling it "nHd_3.1 Apr. 2017 H.dujardini (Z151 2017 tardigrades) (GCA_002082055.1)".
See:
For this task:
- Get the reference 2bit genome sequence file for the above genome assembly
- Investigate and obtain reference gene model annotations for the above assembly
- Add the reference genome assembly to IGB Quickload main
- Add the reference gene model annotations to IGB Quickload main
- Review Goldstein lab publications to identify foundational RNA-Seq data for this organism
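A minimal sketch of what the two "Add ... to IGB Quickload main" steps could produce, assuming the usual QuickLoad layout: a top-level contents.txt with one tab-separated line per genome version, and a per-version directory holding genome.txt (sequence names and lengths) and annots.xml (one <file> element per annotation file). The helper name, version name, and file names are hypothetical, and Quickload main may use extra annots.xml attributes or conventions (including where the 2bit file goes) that this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_quickload_version(root, version, description, seq_lengths, annotations):
    """Write a minimal QuickLoad genome-version entry (assumed layout, hypothetical helper).

    root        -- QuickLoad root directory
    version     -- genome version directory name, e.g. a hypothetical "H_dujardini_Apr_2017"
    description -- human-readable label for contents.txt
    seq_lengths -- dict mapping sequence name -> length in bp
    annotations -- list of (file_name, title) pairs for annots.xml
    """
    root = Path(root)
    version_dir = root / version
    version_dir.mkdir(parents=True, exist_ok=True)

    # contents.txt: one tab-separated line per genome version.
    # Appends, so calling this twice for the same version duplicates the line.
    with open(root / "contents.txt", "a", encoding="utf-8") as fh:
        fh.write(f"{version}\t{description}\n")

    # genome.txt: sequence name and length, tab-separated, one per line.
    with open(version_dir / "genome.txt", "w", encoding="utf-8") as fh:
        for name, length in seq_lengths.items():
            fh.write(f"{name}\t{length}\n")

    # annots.xml: a <files> root with one <file> element per annotation track.
    files = ET.Element("files")
    for file_name, title in annotations:
        ET.SubElement(files, "file", name=file_name, title=title)
    ET.ElementTree(files).write(
        version_dir / "annots.xml", encoding="utf-8", xml_declaration=True
    )
    return version_dir
```

The sequence lengths would come from the assembly's FASTA (or 2bit) file; keeping them as a plain dict argument leaves the sketch independent of how they are computed.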
Let's aim to get the tardigrade data added to IGB QL before Friday so that we can:
- show Dr. Goldstein an IGB demo with the tardigrade genome data
- ask Dr. Goldstein which RNA-Seq data sets are most relevant to his research
- tentatively schedule an online demo of IGB with members of Dr. Goldstein's lab
For the R. varieornatus tardigrade:
Paper: https://www.nature.com/articles/ncomms12808
Website: http://kumamushi.org/database.html
NCBI: https://www.ncbi.nlm.nih.gov/nuccore/BDGG00000000.1
Note: UCSC does have the R. varieornatus genome in the browser, but I cannot find it in the Table Browser, the UCSC API, or UCSC Genome Downloads.
UCSC Gene Models: https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=2077278096_b5zQo29khGhNnPNEOOamtx0cKoXg&db=hub_2790993_GCA_001949185.1&c=BDGG01000001.1&g=hub_2790993_ncbiGene
The NCBI FTP directory that UCSC cites as its source: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/949/185/GCA_001949185.1_Rvar_4.0/
I pulled the following files:
GCA_001949185.1_Rvar_4.0_genomic.gff.gz
GCA_001949185.1_Rvar_4.0_genomic.fna.gz
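These paths follow NCBI's directory scheme, in which the nine digits of the accession are split into three directories (001/949/185). A small sketch that rebuilds the directory and file URLs from an accession and assembly name, which could be reused for other assemblies; ncbi_assembly_urls is a hypothetical helper, and the _genomic.fna.gz / _genomic.gff.gz suffixes are taken from the two file names above.

```python
import re

# Base taken from the FTP URL above.
NCBI_ALL = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all"


def ncbi_assembly_urls(accession, asm_name, base=NCBI_ALL):
    """Build NCBI FTP URLs for an assembly (hypothetical helper).

    accession -- e.g. "GCA_001949185.1"
    asm_name  -- e.g. "Rvar_4.0"
    """
    m = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.(\d+)", accession)
    if m is None:
        raise ValueError(f"unexpected accession format: {accession}")
    prefix, a, b, c, _version = m.groups()
    stem = f"{accession}_{asm_name}"
    directory = f"{base}/{prefix}/{a}/{b}/{c}/{stem}"
    return {
        "dir": directory,
        "fasta": f"{directory}/{stem}_genomic.fna.gz",
        "gff": f"{directory}/{stem}_genomic.gff.gz",
    }
```

For example, ncbi_assembly_urls("GCA_001949185.1", "Rvar_4.0")["fasta"] ends with /GCA/001/949/185/GCA_001949185.1_Rvar_4.0/GCA_001949185.1_Rvar_4.0_genomic.fna.gz, matching the file listed above.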
I was able to load the files in IGB and they look correct.
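To back up the visual check in IGB, a sketch that confirms every GFF feature sits on a sequence present in the FASTA and within that sequence's length. It assumes the first whitespace-delimited token of each FASTA header matches column 1 of the GFF (the usual convention for NCBI files); all helper names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed (.gz) text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def fasta_lengths(lines):
    """Map each FASTA record's first header token to its sequence length."""
    lengths, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_gff_against_fasta(gff_lines, lengths):
    """Return (seqids missing from the FASTA, features outside their sequence)."""
    missing, out_of_range = set(), []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a GFF feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqid not in lengths:
            missing.add(seqid)
        elif start < 1 or end > lengths[seqid] or start > end:
            out_of_range.append((seqid, start, end))
    return missing, out_of_range
```

Usage: open the downloaded .fna.gz with open_text and pass the handle to fasta_lengths, then open the .gff.gz the same way and pass it with the lengths to check_gff_against_fasta. Two empty results mean every feature lands on a known sequence and within its bounds.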
NCBI also has this page: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_001949185.1/
The "Genome sequences (FASTA)" and "Annotation features (GFF)" files offered there are the same as those on the FTP site.
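One way to confirm that two downloads are identical is to hash their decompressed content, so differences in gzip headers (which can embed a timestamp) do not cause false mismatches. A sketch with hypothetical helper names, assuming each file was saved either uncompressed or gzip-compressed with a .gz suffix.

```python
import gzip
import hashlib


def content_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file's decompressed content (gzip-aware)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    digest = hashlib.md5()
    with opener(path, "rb") as fh:
        # Read in chunks so multi-gigabyte genome files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hold byte-identical content after decompression."""
    return content_md5(path_a) == content_md5(path_b)
```

Hashing decompressed bytes lets a gzipped FTP file be compared directly against an uncompressed copy from the datasets page.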