Details
Type: Task
Status: Closed
Priority: Minor
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link:
Sprint: Summer 4 2023 June 26
Description
Situation: The most recent rat genome (rn7) is not currently available in IGB.
Task: Add the rn7 rat genome to IGB: https://hgdownload.soe.ucsc.edu/downloads.html#rat
Issue Links
relates to: IGBF-3330 Add mm39 (GRCm39) mouse genome to IGB (Closed)
Activity
Nowlan Freese added a comment (edited):
I downloaded the 2bit file for the Nov. 2020 rat genome assembly (mRatBN7.2/rn7) from https://hgdownload.soe.ucsc.edu/goldenPath/rn7/bigZips/
I pulled the rat records from the NCBI gene2accession file using taxonomy ID (txid) 10116:
gunzip -c gene2accession.gz | grep $'^10116\t' > ~/Desktop/jiraIssues/3362/10116.gene2accession.txt
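The taxonomy filter above depends on the pattern matching a real tab character after 10116, which varies with the shell and grep implementation. A minimal sketch of an equivalent filter that compares the first tab-separated field directly is shown here; the function name filter_by_taxid is hypothetical and is not part of the original workflow.

```shell
# Hypothetical helper (not from the original ticket): keep only rows whose
# first tab-separated field equals the given NCBI taxonomy ID.
filter_by_taxid() {
    # $1 = taxonomy ID; tab-delimited rows are read from stdin
    awk -F'\t' -v taxid="$1" '$1 == taxid'
}

# Possible usage with the file names from the comment above:
# gunzip -c gene2accession.gz | filter_by_taxid 10116 > 10116.gene2accession.txt
```

Because the comparison is on the whole first field, IDs that merely start with 10116 (for example 101160) are not matched.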
I created BED 14 files for the UCSC refGene and NCBI RefSeq (ncbiRefSeq) tracks:
ucscToBedDetail.py -a 10116.gene2accession.txt -g Rattus_norvegicus.gene_info R_norvegicus_Nov_2020_refGene.bed.gz R_norvegicus_Nov_2020_refGene.bed
ucscToBedDetail.py -a 10116.gene2accession.txt -g Rattus_norvegicus.gene_info R_norvegicus_Nov_2020_ncbiRefSeq.bed.gz R_norvegicus_Nov_2020_ncbiRefSeq.bed
I then sorted, bgzipped, and tabix-indexed the two BED files:
sort -k1,1 -k2,2n R_norvegicus_Nov_2020_refGene.bed | bgzip > R_norvegicus_Nov_2020_refGene.bed.gz
tabix -0 -s 1 -b 2 -e 3 R_norvegicus_Nov_2020_refGene.bed.gz
sort -k1,1 -k2,2n R_norvegicus_Nov_2020_ncbiRefSeq.bed | bgzip > R_norvegicus_Nov_2020_ncbiRefSeq.bed.gz
tabix -0 -s 1 -b 2 -e 3 R_norvegicus_Nov_2020_ncbiRefSeq.bed.gz
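tabix expects its input to be sorted by sequence name and then by start position, so it can be useful to confirm the order of the uncompressed BED file before compressing and indexing it. The following is a rough sketch of such a check using the same sort keys as the commands above; the function name bed_is_sorted is hypothetical and was not part of the original steps.

```shell
# Hypothetical check (not from the original ticket): exit status 0 only if the
# file is already ordered by column 1 (chromosome), then column 2 (start, numeric).
bed_is_sorted() {
    # $1 = path to an uncompressed BED file
    sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Possible usage before running bgzip and tabix:
# bed_is_sorted R_norvegicus_Nov_2020_refGene.bed && echo "sorted"
```

Running the check in the same locale as the original sort avoids false alarms caused by locale-dependent ordering of chromosome names.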
Next I sorted, bgzipped, and tabix-indexed the PSL files for all_est and all_mrna:
gunzip -c R_norvegicus_Nov_2020_all_est.psl.gz | grep -v bin | cut -f2- > R_norvegicus_Nov_2020_all_est.psl
sort -k14,14 -k16,16n R_norvegicus_Nov_2020_all_est.psl > sorted.psl
mv sorted.psl R_norvegicus_Nov_2020_all_est.psl
bgzip R_norvegicus_Nov_2020_all_est.psl
tabix -s 14 -b 16 -0 R_norvegicus_Nov_2020_all_est.psl.gz
gunzip -c R_norvegicus_Nov_2020_all_mrna.psl.gz | grep -v bin | cut -f2- > R_norvegicus_Nov_2020_all_mrna.psl
sort -k14,14 -k16,16n R_norvegicus_Nov_2020_all_mrna.psl > sorted.psl
mv sorted.psl R_norvegicus_Nov_2020_all_mrna.psl
bgzip R_norvegicus_Nov_2020_all_mrna.psl
tabix -s 14 -b 16 -0 R_norvegicus_Nov_2020_all_mrna.psl.gz
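The EST and mRNA steps above are identical apart from the file name, so the text-processing part could be factored into one reusable filter. This is only a sketch under stated assumptions: the function name psl_strip_bin_and_sort is hypothetical, and it assumes the table dump has a single header row whose first field is bin or #bin, followed by rows with a leading bin column.

```shell
# Hypothetical helper (not from the original ticket): read a UCSC PSL table
# dump on stdin, drop the assumed header row and the leading "bin" column,
# then sort by target name (column 14) and target start (column 16, numeric).
psl_strip_bin_and_sort() {
    tab=$(printf '\t')
    awk -F'\t' 'NR == 1 && ($1 == "bin" || $1 == "#bin") { next } { print }' |
        cut -f2- |
        sort -t "$tab" -k14,14 -k16,16n
}

# Possible usage, writing to a new file name so the download is not overwritten:
# gunzip -c R_norvegicus_Nov_2020_all_est.psl.gz | psl_strip_bin_and_sort > all_est.sorted.psl
```

Unlike grep -v bin, the awk step only skips the first row, so data rows that happen to contain the text "bin" are kept. Passing an explicit tab delimiter to sort keeps the column numbers aligned with the ones given to tabix.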
Tested on Mac with the IGB release version (9.1.10).
I was able to load all data (RefGene, NCBI RefSeq, mRNA, EST, sequence file) on scidas and igbquickload.org.
Closing ticket.