Details
- Type: Task
- Status: Closed
- Priority: Major
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Story Points: 1
- Epic Link:
- Sprint: Fall 7
Description
Task: Add the Vanessa cardui genome and annotation to IGB.
Link to genome on NCBI: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_905220365.1/
Attachments
Issue Links
- relates to: IGBF-4038 Improve documentation description quickload files for painted lady genome assembly (Closed)
Activity
Below is an outline of the steps I followed to create this Quickload:
1. Use wget to obtain the .2bit file from UCSC's track hub directory, then rename it
wget https://hgdownload.soe.ucsc.edu/hubs/GCF/905/220/365/GCF_905220365.1/GCF_905220365.1.2bit
mv GCF_905220365.1.2bit ilVanCard2.1.2bit
2. Create genome.txt, then check that the chromosomes are ordered logically (i.e., numerically)
./twoBitInfo ilVanCard2.1.2bit genome.txt
cat genome.txt
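The ordering check in step 2 was done by eye. As a rough sketch, it could also be scripted. The helper name chroms_in_order is hypothetical, and it assumes a sort that supports version ordering (the V key modifier, as in GNU sort) and a genome.txt with the sequence name in the first column:

```shell
# Hypothetical helper (not part of the original steps): succeed only if the
# first column of a "name<TAB>size" file is already in natural (version) order,
# e.g. chr1, chr2, chr10 rather than chr1, chr10, chr2.
# Assumption: the local sort supports the V (version sort) key modifier.
chroms_in_order() {
    # -C checks sortedness silently and reports the result via the exit status
    sort -C -k1,1V "$1"
}

# Example usage:
#   chroms_in_order genome.txt && echo "order OK" || echo "needs reordering"
```

This only reports whether the file is in order. Any reordering would still be done by hand.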
3. Use Vanessa cardui's taxID (171605) to get the information needed from gene2accession.gz and gene_info.gz to create the BED14 file in a later step
gunzip -c gene2accession.gz | grep '^171605\t' > 171605.gene2accession.txt
gunzip -c gene_info.gz | grep '^171605\t' > 171605.gene_info.txt
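The '\t' in the grep pattern for step 3 may not be treated as a tab by every grep implementation. A possible alternative, shown only as a sketch, is to compare the first column exactly with awk. The helper name filter_by_taxid is hypothetical, and it assumes the tax ID is the first tab-delimited column, as the grep pattern implies:

```shell
# Hypothetical helper (not part of the original steps): print the rows of a
# gzipped, tab-delimited file whose first column equals the given tax ID.
# Assumption: the tax ID is in column 1, as the grep pattern in step 3 implies.
filter_by_taxid() {
    # $1 = tax ID, $2 = gzipped input file
    gunzip -c "$2" | awk -F '\t' -v taxid="$1" '$1 == taxid'
}

# Example usage:
#   filter_by_taxid 171605 gene2accession.gz > 171605.gene2accession.txt
#   filter_by_taxid 171605 gene_info.gz > 171605.gene_info.txt
```

Matching the whole column means a longer ID that merely starts with 171605 is not picked up.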
4. Download the RefSeqAll BED file from UCSC's table browser (Link: https://genome.ucsc.edu/cgi-bin/hgTables), then create the BED14 file using the following code:
cd ~/Documents/Repos/genomesource/
./ucscToBedDetail.py -a ~/Downloads/171605.gene2accession.txt -g ~/Downloads/171605.gene_info.txt ~/Downloads/V_cardui_ncbiRefSeq.bed.gz ~/Downloads/V_cardui_Feb_2021_ncbiRefSeq.bed
5. Sort, gzip, and tabix the BED14 file
cd ~/Downloads/
sort -k1,1 -k2,2n V_cardui_Feb_2021_ncbiRefSeq.bed | bgzip > V_cardui_Feb_2021_ncbiRefSeq.bed.gz
tabix -0 -s 1 -b 2 -e 3 V_cardui_Feb_2021_ncbiRefSeq.bed.gz
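Before loading the file into IGB, a quick structural check of the BED14 output from steps 4 and 5 might catch obvious problems. This is only a sketch. The helper name check_bed14 is hypothetical, and it assumes every line should have exactly 14 tab-separated fields with a start coordinate no greater than the end coordinate:

```shell
# Hypothetical helper (not part of the original steps): read BED lines from
# standard input and fail if any line does not have exactly 14 tab-separated
# fields, or if its start (column 2) is greater than its end (column 3).
check_bed14() {
    awk -F '\t' '
        NF != 14 || $2 + 0 > $3 + 0 {
            printf "line %d looks malformed\n", NR > "/dev/stderr"
            bad = 1
            exit
        }
        END { exit bad + 0 }
    '
}

# Example usage:
#   gunzip -c V_cardui_Feb_2021_ncbiRefSeq.bed.gz | check_bed14 && echo "BED14 looks OK"
```

It does not replace the visual check in IGB. It only confirms the column layout.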
6. Sanity-check the 2bit and BED files: add the 2bit file as a reference, then drag and drop the BED files into IGB. Confirm that gene models are present and labeled correctly, and that no error messages appear in the Log.
7. Create a new directory in the quickload repo, then create annots.xml
cd ~/Documents/Repos/quickload/
svn mkdir V_cardui_Feb_2021
svn cp A_gambiae_Feb_2003/annots.xml V_cardui_Feb_2021
nano V_cardui_Feb_2021/annots.xml
8. Add V_cardui_Feb_2021 to contents.txt and .htaccess
nano contents.txt
V_cardui_Feb_2021 Vanessa cardui (Feb 2021) painted lady (ilVanCard2.1)
nano .htaccess
AddDescription "Vanessa cardui (Feb 2021) painted lady (ilVanCard2.1)" V_cardui_Feb_2021
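The two entries in step 8 were added by hand in nano. A non-interactive version could look roughly like the sketch below. The helper name register_genome is hypothetical, and the tab between the directory name and the description in contents.txt is an assumption about that file's layout:

```shell
# Hypothetical helper (not part of the original steps): add a genome directory
# to contents.txt and .htaccess in the current directory, skipping entries
# that are already present.
# Assumption: contents.txt lines are "directory<TAB>description".
register_genome() {
    dir=$1
    desc=$2
    # Append to contents.txt unless column 1 already lists this directory
    awk -F '\t' -v d="$dir" '$1 == d { found = 1 } END { exit !found }' contents.txt 2>/dev/null ||
        printf '%s\t%s\n' "$dir" "$desc" >> contents.txt
    # Append the AddDescription line unless that exact line already exists
    line="AddDescription \"$desc\" $dir"
    grep -qxF "$line" .htaccess 2>/dev/null ||
        printf '%s\n' "$line" >> .htaccess
}

# Example usage (run from the quickload repo root):
#   register_genome V_cardui_Feb_2021 "Vanessa cardui (Feb 2021) painted lady (ilVanCard2.1)"
```

Running it twice leaves a single entry in each file, which makes it safe to re-run.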
9. Create HEADER.md
../genomesource/writeQuickLoadHeaderUCSC.py V_cardui_Feb_2021 > V_cardui_Feb_2021/HEADER.md
I created the recommended ticket for improving the genome assembly documentation. Closing this one now.