[IGBF-1298] species.txt vs synonyms.txt - JIRA UNCC

Details

Type: Documentation
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Labels:
None

Story Points:
0.5
Epic Link:
Document
Sprint:
Summer 2018 Part 2

Description

This page is unclear about the difference between species.txt vs synonyms.txt.

https://wiki.transvar.org/display/igbman/Use+synonyms.txt+to+link+genome+version+names+to+each+other

It also points to a file in the repo that is not there.

Attachments

Issue Links

relates to

IGBF-1262 Create release testing module for synonyms

Closed

IGBF-1229 file location of synonyms.txt in user guide

Closed

Activity

Ann Loraine added a comment - 15/Jun/18 11:38 AM

Update as required.

Ann Loraine added a comment - 05/Jul/18 2:09 AM

See https://wiki.transvar.org/display/igbman/Use+species.txt+to+link+species+names+to+IGB+genome+names

Ann Loraine added a comment - 05/Jul/18 2:54 PM

Look at Mason Meyer's recent comments. He recently wrote about this, possibly as a comment on one of the issues. If you can't find it, call or text.

Ivory Blakley (Inactive) added a comment - 05/Jul/18 2:57 PM

I think you are thinking of Mason's comment on issue IGBF-1262.

That comment mentions species.txt once and highlights how it is distinct from chromosome.txt, but it does not clarify the difference between species.txt and synonyms.txt.

Ivory Blakley (Inactive) added a comment - 06/Jul/18 11:22 AM

In IGB, data is displayed relative to a genome version. The term "genome" might refer to an individual's genome (John Smith's genome) or the aggregate genetic pool of a species (Homo sapiens genome), but a genome version refers to an exact set of sequences (H_sapiens_Dec_2013), usually associated with a particular publication, institution and/or time. In IGB, the naming convention for genome versions is <first letter of genus>_<species>_<month published>_<year published>.
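To make the naming convention concrete, here is a minimal Python sketch that splits a genome version name into its parts. It is only an illustration, not IGB code. The function name split_genome_version is hypothetical, and the pattern assumes what the H_sapiens_Dec_2013 example suggests: underscores between parts, a lowercase species name, a three-letter month abbreviation, and a four-digit year.

```python
import re

# Assumed pattern, inferred from the example H_sapiens_Dec_2013:
# <genus initial>_<species>_<three-letter month>_<four-digit year>
GENOME_VERSION = re.compile(
    r"^(?P<genus>[A-Z])_(?P<species>[a-z]+)_(?P<month>[A-Z][a-z]{2})_(?P<year>\d{4})$"
)


def split_genome_version(name):
    """Return the parts of a genome version name, or None if it does not fit."""
    match = GENOME_VERSION.match(name)
    return match.groupdict() if match else None


print(split_genome_version("H_sapiens_Dec_2013"))
print(split_genome_version("Homo sapiens"))
```

A name such as "Homo sapiens" does not fit the pattern, which is the kind of name that needs a synonym entry.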

A genome version is an exact thing. It can have at most one Latin species name and at most one common name. It can have any number of synonyms, that is, other names for the same genome version. These are specified in a document called species.txt, which has this tab-delimited format:
Column 1: binomial (Latin) name for the species
Column 2: common name for the species
Column 3: IGB-friendly genome version name prefix (e.g., H_sapiens or A_gambiae)
Column 4, 5, etc. (optional): genome version names (synonyms)

In this format, column 3 is the key (identity) column. Columns 1 and 2 may have repeats (and we should expect a repeat in one to also be a repeat in the other). Columns 4 onward should not include any repeats.
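The column rules above can be sketched as a small Python parser. This is a hedged illustration, not IGB's actual reader: the names SpeciesRow and parse_species are hypothetical, and the sample rows (including the common names) are made up for the example. It treats column 3 as the unique key and rejects repeated values in columns 4 onward.

```python
from typing import NamedTuple, Tuple


class SpeciesRow(NamedTuple):
    latin_name: str             # column 1
    common_name: str            # column 2
    prefix: str                 # column 3, the key column
    synonyms: Tuple[str, ...]   # columns 4 onward (optional)


def parse_species(text):
    """Parse species.txt-style text into a dict keyed by column 3."""
    rows = {}
    seen_synonyms = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns: {line!r}")
        latin, common, key = fields[:3]
        synonyms = tuple(field for field in fields[3:] if field)
        if key in rows:
            raise ValueError(f"repeated key in column 3: {key!r}")
        for synonym in synonyms:
            if synonym in seen_synonyms:
                raise ValueError(f"repeated synonym: {synonym!r}")
            seen_synonyms.add(synonym)
        rows[key] = SpeciesRow(latin, common, key, synonyms)
    return rows


# Illustrative sample only; not copied from a real species.txt.
SAMPLE = (
    "Homo sapiens\tHuman\tH_sapiens\tH_sapiens_Dec_2013\tH_sapiens_Feb_2009\n"
    "Anopheles gambiae\tMosquito\tA_gambiae\n"
)

species = parse_species(SAMPLE)
print(species["H_sapiens"].synonyms)
```

Columns 1 and 2 are allowed to repeat, so the sketch does not check them.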

As a supplement to the species.txt file, IGB also reads a synonyms.txt file, which has the same format minus the first two columns. The synonyms file has the genome version name in the first column (what was column 3 in species.txt), followed by any number of synonyms.

If you have data representing reads that were aligned to the H_sapiens_Dec_2013 genome, you'll want to display the data with the H_sapiens_Dec_2013 sequences, not the H_sapiens_Feb_2009 sequences. If you are setting up a quickload site, you would put the files in a folder called "H_sapiens_Dec_2013" and include that folder name in the contents.txt file. Suppose instead that you are sending the data from Galaxy, and you are constrained to using whatever name Galaxy uses to refer to that genome version, perhaps "Homo sapiens". In that case, you create a synonyms file to link "Homo sapiens" data from Galaxy to the "H_sapiens_Dec_2013" genome version in IGB, like this:
H_sapiens_Dec_2013<tab>Homo sapiens

That will tell IGB that any data from "Homo sapiens" should be shown in the H_sapiens_Dec_2013 genome. IGB will still use its existing Latin name and common name for the H_sapiens_Dec_2013 genome version.
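The lookup described here can be sketched in a few lines of Python. This is an assumption about the behavior, not IGB's implementation, and the function names parse_synonyms and resolve are hypothetical. Each line's first column is treated as the genome version name, and every other name on the line maps back to it.

```python
def parse_synonyms(text):
    """Map every name on a synonyms.txt-style line to its first-column name."""
    lookup = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t") if field.strip()]
        if not fields:
            continue  # skip blank lines
        genome_version = fields[0]
        for name in fields:
            lookup[name] = genome_version
    return lookup


def resolve(name, lookup):
    """Return the genome version for a name, or the name itself if unknown."""
    return lookup.get(name, name)


# The single line from the example above, with a real tab character.
SYNONYMS = "H_sapiens_Dec_2013\tHomo sapiens\n"

lookup = parse_synonyms(SYNONYMS)
print(resolve("Homo sapiens", lookup))
```

Returning an unknown name unchanged is a design choice for this sketch; a real reader might instead report the name as unrecognized.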
