[IGBF-4010] Create documentation for adding NCBI genomes to IGB Quickload - JIRA UNCC

Details

Type: Documentation
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels: None
Story Points: 2
Epic Link: Improve IGB for users
Sprint: Winter 1, Spring 1

Description

Situation: We currently have documentation for adding new genomes to IGB Quickload using UCSC as a data source. However, now that we have worked hard on the IGB-UCSC integration, future requests for adding new genomes to IGB will have to be completed using other data sources such as NCBI. Because NCBI is an entirely new data source, and the process for downloading data and naming files will look different, we need to create new documentation.

Task:

Read over current documentation for adding new genomes to IGB Quickload (https://docs.google.com/document/d/1WQO_HWhpfUBntsNSaQ-jdrVoR6jJcQ7ewJ_HFvpYsYA/edit?usp=sharing).
Using our current documentation as a guide, create a new document in Google Drive, create section outlines, and transfer over as much relevant documentation as possible.
Include a section that discusses how to name files depending on the assembly type (e.g., "ncbiRefSeq" when dealing with a RefSeq assembly, "genBank" when dealing with a GenBank (GCA) assembly).
Include a section with an example annots.xml file that discusses how to format the description attribute depending on the assembly type (e.g., "NCBI GenBank [GenBank (GCA) assembly] [Assembly] ([Assembly date in MMM. DD, YYYY format])" when dealing with a GenBank (GCA) assembly), as well as the title attribute.
Include an example HEADER.md file that has been remade for genomes coming from NCBI rather than UCSC.
Include a section for creating species.txt and synonyms.txt.
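
The naming and description conventions named in the task list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual tooling: the function names, the hard-coded month abbreviations, the zero-padded day, and the placeholder accession and assembly name are all hypothetical. Only the "ncbiRefSeq" and "genBank" tags and the GenBank description pattern come from this ticket. The `GCF_` prefix for RefSeq assemblies is standard NCBI accession convention.

```python
from datetime import date

# Month abbreviations for the "MMM. DD, YYYY" format. Hard-coded (an assumption)
# so the output does not depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def file_tag(accession: str) -> str:
    """Return the file-name tag for an NCBI assembly accession."""
    if accession.startswith("GCF_"):   # RefSeq assembly
        return "ncbiRefSeq"
    if accession.startswith("GCA_"):   # GenBank assembly
        return "genBank"
    raise ValueError(f"Unrecognized assembly accession: {accession}")


def genbank_description(accession: str, assembly_name: str, released: date) -> str:
    """Build an annots.xml description string for a GenBank (GCA) assembly."""
    # Zero-padding the day is an assumption based on the "DD" in the pattern.
    stamp = f"{MONTHS[released.month - 1]}. {released.day:02d}, {released.year}"
    return f"NCBI GenBank {accession} {assembly_name} ({stamp})"


# Placeholder values, not a real assembly.
print(file_tag("GCA_000000000.1"))
print(genbank_description("GCA_000000000.1", "ExampleAsm_v1", date(2025, 1, 8)))
```

The description pattern for RefSeq assemblies is not spelled out in this ticket, so the sketch covers only the GenBank case.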

Attachments

Issue Links

relates to

IGBF-4018 Add Dama dama genome to IGB

Closed

Activity

Paige Kulzer added a comment - 21/Jan/25 3:42 PM

These are great suggestions!

I've added some more instructions to the Tabix-index gene model file section for manually checking the BED file for any erroneous text. I also removed mention of archiving files from that section. Finally, in the Deploy to IGB Quickload section, I specified that ALL modified files (e.g., species.txt, synonyms.txt) should be included in the zipped Quickload folder for a reviewer to test.

Closing this ticket!

Nowlan Freese added a comment - 21/Jan/25 2:30 PM

Overall looks good. I had a couple of thoughts. Up to Paige Kulzer to decide if changes should be made. Otherwise I think the ticket can be closed.

Convert annotation features -> Just a note: it doesn't seem like we need the steps for adding the 13th and 14th columns to the BED file, as we did when pulling annotations from UCSC. I checked the GFF for human hg38 from NCBI, and the data for the 13th and 14th columns appears to be present in the file. This is great and saves us a good amount of time.
Tabix-index gene model file -> I don't archive the original BED12 file. I don't think there's a reason to save it?
Based on testing of IGBF-4018, I'm wondering if we should include some text on checking the converted BED file for things like %2C.
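
As one possible way to run the %2C check suggested above, here is a minimal Python sketch that flags percent-encoded sequences left over in a converted BED file. It assumes the check runs on the plain-text BED file before compression and indexing. The function name and the sample lines are hypothetical and do not come from the documentation.

```python
import re
from urllib.parse import unquote

# Matches percent-encoded bytes such as %2C (comma) or %3B (semicolon).
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def find_percent_escapes(lines):
    """Yield (line_number, escape, decoded_text) for each percent-encoded sequence."""
    for number, line in enumerate(lines, start=1):
        for match in PERCENT_ESCAPE.finditer(line):
            yield number, match.group(), unquote(match.group())


# Made-up sample lines; in practice, pass an open file handle instead.
sample = [
    "chr1\t100\t200\tgeneA\t0\t+",
    "chr1\t300\t400\tkinase%2C putative\t0\t-",
]
for number, escape, decoded in find_percent_escapes(sample):
    print(f"line {number}: {escape} decodes to {decoded!r}")
```

Any line it reports can then be inspected by hand to decide whether the encoded character should be replaced.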

Paige Kulzer added a comment - 14/Jan/25 11:16 AM - edited

To test the new documentation, I followed it to add the Dama dama genome to IGB Quickload as part of IGBF-4018.

For review, please look over the documentation I've made and check the following:

The order of steps makes sense
The images are clear and easy to read, and add to the reader's comprehension of the task
The naming system for genomes with RefSeq vs GenBank annotations is clearly defined and consistent
Code is all formatted consistently
The new Dama dama quickload contains all of the necessary files and works as expected when added to IGB (i.e., was created successfully by following this documentation)

Paige Kulzer added a comment - 08/Jan/25 3:32 PM - edited

Here's a link to the new documentation: How we add new genomes to IGB Quickload using NCBI as a data source

