[IGBF-3330] Add mm39 (GRCm39) mouse genome to IGB - JIRA UNCC

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
2
Epic Link:
Improve IGB for users
Sprint:
Summer 2 2023 May 29, Summer 3 2023 June 12, Summer 4 2023 June 26

Description

Situation: A user has asked if we have the GRCm39 (mm39) mouse genome in IGB. We do not support it through Quickloads, but it is available through the UCSC DAS data source.

Task: Add the GRCm39 (mm39) genome and annotation to IGB.

Attachments

IGB_Mus-musculus.pdf
385 kB
26/Jun/23 2:48 PM

Issue Links

relates to

IGBF-1617 Update human genome annotation

Closed

IGBF-3362 Add rn7 rat genome to IGB

Closed

IGBF-3361 Add galGal5 and galGal6 chicken genome to IGB

To-Do

IGBF-3363 Investigate psl off by one error

To-Do

IGBF-753 Add new zebrafish genome

Closed

IGBF-2952 Add UCSC dm6 fruit fly genome and annotation to IGB

Closed

Activity

Nowlan Freese created issue - 02/May/23 8:49 AM

Nowlan Freese made changes - 02/May/23 8:49 AM

Field	Original Value	New Value
Epic Link		IGBF-1765 [ 17855 ]

Nowlan Freese made changes - 02/May/23 8:52 AM

Epic Link

IGBF-1765 [ 17855 ]

IGBF-1478 [ 17563 ]

Nowlan Freese made changes - 02/May/23 8:52 AM

Link

This issue relates to ~~IGBF-753~~ [ ~~IGBF-753~~ ]

Nowlan Freese made changes - 02/May/23 8:52 AM

Epic Link

IGBF-1478 [ 17563 ]

IGBF-1395 [ 17470 ]

Nowlan Freese made changes - 02/May/23 8:53 AM

Epic Link

IGBF-1395 [ 17470 ]

IGBF-1478 [ 17563 ]

Nowlan Freese made changes - 02/May/23 8:54 AM

Link

This issue relates to ~~IGBF-2952~~ [ ~~IGBF-2952~~ ]

Ann Loraine added a comment - 15/May/23 10:07 AM

For this, add a new genome "setup" to our subversion (like git) data repository.

Molly Davis made changes - 16/May/23 10:05 AM

Assignee

Molly Davis [ molly ]

Molly Davis made changes - 16/May/23 10:10 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Molly Davis made changes - 17/May/23 2:36 PM

Status

In Progress [ 3 ]

To-Do [ 10305 ]

Ann Loraine made changes - 25/May/23 11:44 AM

Sprint

Summer 1 2023 May 15 [ 170 ]

Ann Loraine made changes - 25/May/23 11:44 AM

Assignee

Molly Davis [ molly ]

Ann Loraine made changes - 25/May/23 11:44 AM

Epic Link

IGBF-1478 [ 17563 ]

IGBF-1765 [ 17855 ]

Ann Loraine made changes - 25/May/23 11:44 AM

Sprint

Summer 2 2023 May 29 [ 171 ]

Nowlan Freese made changes - 05/Jun/23 11:55 AM

Status

To-Do [ 10305 ]

In Progress [ 3 ]

Nowlan Freese made changes - 05/Jun/23 11:55 AM

Assignee

Nowlan Freese [ nfreese ]

Nowlan Freese added a comment - 06/Jun/23 4:41 PM - edited

Downloaded the Jun. 2020 GRCm39/mm39 mm39.2bit file from https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/

I had to use ftp to get the gene2accession.gz file. Running gunzip -c directly against the ftp URL did not work, so I downloaded the whole file. The Mus_musculus.gene_info file downloaded with no issues, but it is small.

ftp ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
get gene2accession.gz

NCBI:txid10090 for mouse:

gunzip -c gene2accession.gz | grep '^10090\t' > ~/Desktop/jiraIssues/3330/10090.gene2accession.txt
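
The filtering step above can also be sketched in Python, which avoids relying on how a given grep implementation interprets \t inside a pattern (this may differ between grep versions). This is only a minimal sketch: the function names are hypothetical, and it assumes the taxonomy ID is the first tab-separated field of each row, as the grep pattern above implies.

```python
import gzip
from typing import Iterable, Iterator

MOUSE_TAX_ID = "10090"  # NCBI:txid10090 for mouse, as noted above


def filter_by_tax_id(lines: Iterable[str], tax_id: str = MOUSE_TAX_ID) -> Iterator[str]:
    """Yield only the lines whose first tab-separated field equals tax_id."""
    prefix = tax_id + "\t"  # the trailing tab prevents matching e.g. 100901
    for line in lines:
        if line.startswith(prefix):
            yield line


def filter_gene2accession(src_gz: str, dest_txt: str, tax_id: str = MOUSE_TAX_ID) -> int:
    """Stream a gzipped gene2accession file, write matching rows, return the row count."""
    kept = 0
    with gzip.open(src_gz, "rt") as src, open(dest_txt, "w") as dest:
        for line in filter_by_tax_id(src, tax_id):
            dest.write(line)
            kept += 1
    return kept
```

For example, filter_gene2accession("gene2accession.gz", "10090.gene2accession.txt") would write the mouse rows without decompressing the whole file to disk first.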

I pulled the refGene and ncbiRefSeq files as bed files from the UCSC Table Browser.
I then created bed14 files for the refGene and ncbiRefSeq files.

ucscToBedDetail.py -a 10090.gene2accession.txt -g Mus_musculus.gene_info mm39_refGene.bed.gz M_musculus_Jun_2020_refGene.bed
ucscToBedDetail.py -a 10090.gene2accession.txt -g Mus_musculus.gene_info mm39_ncbiRefSeq.bed.gz M_musculus_Jun_2020_ncbiRefSeq.bed

I then sorted the two bed files and tabix indexed them. tabix (version 1.17) reported many errors about coordinates being zero and asked whether I had forgotten the -0 flag, which indicates zero-based coordinates. I don't remember this being an issue before, but since bed files are 0-based I added the flag.

sort -k1,1 -k2,2n M_musculus_Jun_2020_refGene.bed | bgzip > M_musculus_Jun_2020_refGene.bed.gz
tabix -0 -s 1 -b 2 -e 3 M_musculus_Jun_2020_refGene.bed.gz

sort -k1,1 -k2,2n M_musculus_Jun_2020_ncbiRefSeq.bed | bgzip > M_musculus_Jun_2020_ncbiRefSeq.bed.gz
tabix -0 -s 1 -b 2 -e 3 M_musculus_Jun_2020_ncbiRefSeq.bed.gz
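
To illustrate why a start coordinate of zero is valid in a bed file but not in a 1-based format, here is a small sketch of the two conventions, together with a sort key that mirrors sort -k1,1 -k2,2n (chromosome name as text, then start as a number). The class and function names are hypothetical and are not part of tabix or IGB.

```python
from typing import NamedTuple, Tuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_bed_line(line: str) -> BedInterval:
    """Read the first three tab-separated columns of a bed line."""
    fields = line.rstrip("\n").split("\t")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


def to_one_based(interval: BedInterval) -> Tuple[str, int, int]:
    """Convert 0-based half-open to 1-based closed: the start shifts by one, the end is unchanged."""
    return (interval.chrom, interval.start + 1, interval.end)


def bed_sort_key(interval: BedInterval) -> Tuple[str, int]:
    """Order by chromosome name (as text), then by numeric start."""
    return (interval.chrom, interval.start)
```

A feature that begins at the first base of a chromosome has start 0 in bed coordinates and start 1 in 1-based coordinates, so a zero is only meaningful when the indexer is told the file is zero-based.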

I pulled the all_mrna and all_est tables as psl files by setting the output format to "all fields from selected table". Once downloaded, I used the following code:

gunzip -c M_musculus_Jun_2020_all_est.psl.gz | grep -v bin | cut -f2- > M_musculus_Jun_2020_all_est.psl
sort -k14,14 -k16,16n M_musculus_Jun_2020_all_est.psl > sorted.psl
mv sorted.psl M_musculus_Jun_2020_all_est.psl
bgzip M_musculus_Jun_2020_all_est.psl
tabix -s 14 -b 16 -0 M_musculus_Jun_2020_all_est.psl.gz

gunzip -c M_musculus_Jun_2020_all_mrna.psl.gz | grep -v bin | cut -f2- > M_musculus_Jun_2020_all_mrna.psl
sort -k14,14 -k16,16n M_musculus_Jun_2020_all_mrna.psl > sorted.psl
mv sorted.psl M_musculus_Jun_2020_all_mrna.psl
bgzip M_musculus_Jun_2020_all_mrna.psl
tabix -s 14 -b 16 -0 M_musculus_Jun_2020_all_mrna.psl.gz
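
The psl cleanup above (drop the header, drop the leading bin column, then sort by target name and target start) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the header line begins with '#'. It skips only that line, whereas grep -v bin drops every line containing the text "bin", which could also remove alignment rows.

```python
from typing import Iterable, List

T_NAME = 13   # 0-based index of psl column 14 (tName), after the bin column is removed
T_START = 15  # 0-based index of psl column 16 (tStart), after the bin column is removed


def strip_bin_column(lines: Iterable[str]) -> List[List[str]]:
    """Drop the header line and the leading bin column from an all-fields psl dump."""
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header (assumed to start with '#') or blank line
        fields = line.rstrip("\n").split("\t")
        rows.append(fields[1:])  # same effect as `cut -f2-`
    return rows


def sort_psl_rows(rows: List[List[str]]) -> List[List[str]]:
    """Sort by target name as text, then target start as a number (like sort -k14,14 -k16,16n)."""
    return sorted(rows, key=lambda r: (r[T_NAME], int(r[T_START])))
```

Sorting the start column numerically matters: as text, "200" would sort before "30", which would leave the file out of coordinate order.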

Nowlan Freese added a comment - 07/Jun/23 10:04 AM - edited

I have placed the mm39 quickload in CyVerse for testing.

To test:
In IGB,

Add https://data.cyverse.org/dav-anon/iplant/home/nowlanf/Mouse_2020/quickload as a new Data Source in IGB (IntegratedGenomeBrowser > Settings > Data Sources > Add...).
Select the Mus musculus Species and the M_musculus_Jun_2020 Genome Version.
The RefGene track should appear and should automatically load data.
Navigate to: chr1:36,405,677-36,408,789
Click Load Sequence.
Residues (ATCG) should load (may take a little while, CyVerse can be slow).
Check that there are no errors in the log.
Under Available Data in the Data Access tab, click the checkboxes for mRNA, EST, and NCBI RefSeq.
Click Load Data.
Gene annotations for each track should load.
Check that there are no errors in the log.
Check that the UCSC (DAS) folder appears under Available Data in the Data Access tab.
Click the checkbox for ncbiRefSeqCurated.
Click Load Data.
Check that there are no errors in the log.

Nowlan Freese made changes - 07/Jun/23 1:13 PM

Link

This issue relates to ~~IGBF-1617~~ [ ~~IGBF-1617~~ ]

Nowlan Freese added a comment - 08/Jun/23 11:01 AM - edited

Other genomes that should be updated based on UCSC:
rat - ~~IGBF-3362~~
chicken - IGBF-3361

Nowlan Freese made changes - 08/Jun/23 11:03 AM

Link

This issue relates to IGBF-3361 [ IGBF-3361 ]

Nowlan Freese made changes - 08/Jun/23 11:04 AM

Link

This issue relates to ~~IGBF-3362~~ [ ~~IGBF-3362~~ ]

Nowlan Freese added a comment - 08/Jun/23 11:47 AM - edited

I noticed that the psl files (EST and mRNA) have an off-by-one discrepancy. For positive strand alignments, the listed end position is +1 compared to where the annotation actually ends. For example, if an mRNA on the positive strand is reported as starting at 100 and ending at 150, IGB draws the annotation over bases 100 - 149. For negative strand alignments, the start reported by IGB is +1 compared to where the annotation is actually drawn. This indicates that the tEnd column (https://genome.ucsc.edu/FAQ/FAQformat.html#format2) is effectively +1 compared to what IGB draws. Note that the annotations appear to be drawn in the correct location, i.e. intron/exon boundaries line up correctly. The only issue is that the start or end reported by IGB is off by one, depending on strand.

This issue is present in older psl files (M_musculus_Dec_2011) and on older versions of IGB (tested 9.0.2).

As this issue is not new or specific to this ticket, I have created IGBF-3363 to address it.
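
For reference, the 100/150 example above matches what a 0-based, half-open interval looks like, which is the convention UCSC uses for start and end columns in its tables. The sketch below only illustrates that arithmetic; the function names are hypothetical, and whether this convention fully accounts for the values IGB reports is left to IGBF-3363.

```python
from typing import Tuple


def covered_bases(t_start: int, t_end: int) -> range:
    """0-based positions covered by the half-open interval [t_start, t_end)."""
    return range(t_start, t_end)


def one_based_closed(t_start: int, t_end: int) -> Tuple[int, int]:
    """The same interval written as 1-based, inclusive coordinates."""
    return (t_start + 1, t_end)


example = covered_bases(100, 150)
first_base, last_base = example[0], example[-1]  # last covered base is t_end - 1
```

Under this convention the end value is exclusive, so an interval listed as 100 to 150 spans 50 bases and its last covered 0-based position is 149.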

Nowlan Freese made changes - 08/Jun/23 11:51 AM

Link

This issue relates to IGBF-3363 [ IGBF-3363 ]

Nowlan Freese made changes - 08/Jun/23 11:56 AM

Assignee

Nowlan Freese [ nfreese ]

Nowlan Freese made changes - 08/Jun/23 11:56 AM

Status

In Progress [ 3 ]

Needs 1st Level Review [ 10005 ]

Ann Loraine made changes - 12/Jun/23 11:14 AM

Sprint

Summer 2 2023 May 29 [ 171 ]

Summer 2 2023 May 29, Summer 3 2023 June 12 [ 171, 172 ]

Ann Loraine made changes - 12/Jun/23 11:14 AM

Rank

Ranked higher

Molly Davis made changes - 23/Jun/23 2:28 PM

Assignee

Molly Davis [ molly ]

Ann Loraine made changes - 26/Jun/23 9:59 AM

Sprint

Summer 2 2023 May 29, Summer 3 2023 June 12 [ 171, 172 ]

Summer 2 2023 May 29, Summer 3 2023 June 12, Summer 4 2023 June 26 [ 171, 172, 173 ]

Ann Loraine made changes - 26/Jun/23 9:59 AM

Rank

Ranked higher

Molly Davis made changes - 26/Jun/23 10:09 AM

Status

Needs 1st Level Review [ 10005 ]

First Level Review in Progress [ 10301 ]

Molly Davis made changes - 26/Jun/23 2:48 PM

Attachment

IGB_Mus-musculus.pdf [ 17908 ]

Molly Davis added a comment - 26/Jun/23 2:49 PM - edited

Review: I was able to get through each step of the instructions and ran into no errors. The following is an image of my screen after finishing.

IGB_Mus-musculus.pdf

If the image is correct, then the ticket is ready to move forward!

Molly Davis made changes - 26/Jun/23 2:49 PM

Assignee

Molly Davis [ molly ]

Nowlan Freese [ nfreese ]

Molly Davis made changes - 26/Jun/23 2:49 PM

Status

First Level Review in Progress [ 10301 ]

Needs 1st Level Review [ 10005 ]

Nowlan Freese made changes - 27/Jun/23 10:29 AM

Status

Needs 1st Level Review [ 10005 ]

First Level Review in Progress [ 10301 ]

Nowlan Freese made changes - 27/Jun/23 10:29 AM

Status

First Level Review in Progress [ 10301 ]

Ready for Pull Request [ 10304 ]

Nowlan Freese added a comment - 29/Jun/23 2:48 PM - edited

Notes from pushing changes to subversion repository: https://svn.bioviz.org/viewvc/genomes/quickload/M_musculus_Jun_2020/

Comment for subversion update:
IGBF-3330: Add Mus musculus Jun 2020 genome;Jun 29, 2023

Need to update the following files:
contents.txt -> M_musculus_Jun_2020 Mus musculus (Jun 2020) mouse (GRCm39/mm39)
.htaccess -> AddDescription "Mus musculus (Jun 2020) mouse (GRCm39/mm39)" M_musculus_Jun_2020
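
The two entries above follow a fixed pattern, so they can be generated from the genome version name and its description. This is a small sketch with hypothetical function names; it assumes contents.txt separates the two fields with a tab, which is not visible in the text above.

```python
GENOME_VERSION = "M_musculus_Jun_2020"
DESCRIPTION = "Mus musculus (Jun 2020) mouse (GRCm39/mm39)"


def contents_line(genome_version: str, description: str) -> str:
    """Build a contents.txt entry (tab separator is an assumption)."""
    return f"{genome_version}\t{description}"


def htaccess_line(genome_version: str, description: str) -> str:
    """Build the AddDescription entry in the form shown above."""
    return f'AddDescription "{description}" {genome_version}'


if __name__ == "__main__":
    print(contents_line(GENOME_VERSION, DESCRIPTION))
    print(htaccess_line(GENOME_VERSION, DESCRIPTION))
```

Generating both lines from the same two values keeps the folder name and description consistent between the two files.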

Basic steps for checking in files and adding the new mouse folder with files:

Modify the contents.txt and .htaccess files

Drag the mouse folder into the quickload folder
then:

svn status
svn ci -m "IGBF-3330: Add Mus musculus Jun 2020 genome;Jun 29, 2023" quickload/.htaccess
svn ci -m "IGBF-3330: Add Mus musculus Jun 2020 genome;Jun 29, 2023" quickload/contents.txt
svn add quickload/M_musculus_Jun_2020
svn ci -m "IGBF-3330: Add Mus musculus Jun 2020 genome;Jun 29, 2023"

Nowlan Freese added a comment - 29/Jun/23 2:48 PM

New mouse genome has been added to subversion repository.

Next step, deploy to quickload site.

Nowlan Freese made changes - 29/Jun/23 2:49 PM

Assignee

Nowlan Freese [ nfreese ]

Nowlan Freese made changes - 29/Jun/23 2:49 PM

Status

Ready for Pull Request [ 10304 ]

Pull Request Submitted [ 10101 ]

Nowlan Freese made changes - 30/Jun/23 10:45 AM

Assignee

Ann Loraine [ aloraine ]

Ann Loraine made changes - 07/Jul/23 3:23 PM

Status

Pull Request Submitted [ 10101 ]

Reviewing Pull Request [ 10303 ]

Ann Loraine made changes - 07/Jul/23 3:23 PM

Status

Reviewing Pull Request [ 10303 ]

Merged Needs Testing [ 10002 ]

Ann Loraine added a comment - 07/Jul/23 3:24 PM - edited

Data are deployed. For details on where the data are deployed, and how to test, see
linked ticket titled "Add rn7 rat genome to IGB".

Ann Loraine made changes - 07/Jul/23 3:25 PM

Assignee

Ann Loraine [ aloraine ]

Nowlan Freese [ nfreese ]

Nowlan Freese added a comment - 07/Jul/23 4:28 PM

Tested on Mac on IGB release (9.1.10).
Able to load all data (RefGene, NCBI RefSeq, mRNA, EST, sequence file) on scidas and igbquickload.org

Closing ticket.

Nowlan Freese made changes - 07/Jul/23 4:29 PM

Status

Merged Needs Testing [ 10002 ]

Post-merge Testing In Progress [ 10003 ]

Nowlan Freese made changes - 07/Jul/23 4:29 PM

Resolution		Done [ 10000 ]
Status	Post-merge Testing In Progress [ 10003 ]	Closed [ 6 ]

People

Assignee:

Nowlan Freese

Reporter:

Nowlan Freese

Votes:

0

Watchers:

3

Dates

Created:

02/May/23 8:49 AM

Updated:

31/Aug/23 12:57 PM

Resolved:

07/Jul/23 4:29 PM