Details

    • Type: Documentation
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Story Points:
      0.5
    • Epic Link:
    • Sprint:
      Summer 2018 Part 2

      Description

      This page is unclear about the difference between species.txt and synonyms.txt.

      https://wiki.transvar.org/display/igbman/Use+synonyms.txt+to+link+genome+version+names+to+each+other

      It also points to a file in the repo that is not there.

            Activity

            ann.loraine Ann Loraine added a comment -

            Update as required.

            ann.loraine Ann Loraine added a comment -

            See https://wiki.transvar.org/display/igbman/Use+species.txt+to+link+species+names+to+IGB+genome+names

            ann.loraine Ann Loraine added a comment -

            Look at Mason Meyer's recent comments. He recently wrote about this, possibly as a comment on one of the issues. If you can't find it, call or text.

            ieclabau Ivory Blakley (Inactive) added a comment -

            I think you are thinking of Mason's comment on issue: IGBF-1262

            That comment mentions the species.txt document once and highlights how it is distinct from chromosome.txt, but it does not clarify the difference between species.txt and synonyms.txt.

            ieclabau Ivory Blakley (Inactive) added a comment -

            In IGB, data is displayed relative to a genome version. The term "genome" might refer to an individual's genome (John Smith's genome) or to the aggregate genetic pool of a species (the Homo sapiens genome), but a genome version refers to an exact set of sequences (H_sapiens_Dec_2013), usually associated with a particular publication, institution, and/or time. In IGB, the naming convention for genome versions is <first letter of genus>_<species>_<month published>_<year published>.
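
            As a rough illustration of the naming convention, the sketch below splits a name such as H_sapiens_Dec_2013 into its parts. It assumes the parts are joined by underscores (as in that example), that the month is a three-letter abbreviation, and that the species part is a single lowercase word. The function name split_genome_version is hypothetical and is not part of IGB.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: <genus initial>_<species>_<three-letter month>_<four-digit year>
GENOME_VERSION = re.compile(r"^([A-Z])_([a-z]+)_([A-Z][a-z]{2})_(\d{4})$")


def split_genome_version(name: str) -> Optional[Tuple[str, str, str, int]]:
    """Return (genus initial, species, month, year), or None if the name does not fit."""
    match = GENOME_VERSION.match(name)
    if match is None:
        return None
    genus_initial, species, month, year = match.groups()
    return genus_initial, species, month, int(year)


print(split_genome_version("H_sapiens_Dec_2013"))
print(split_genome_version("Homo sapiens"))
```

            Names that do not follow the assumed pattern (for example, a free-text name coming from another tool) return None, which is the case that synonyms are meant to handle.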

            A genome version is an exact thing. It can have at most one Latin species name and at most one common name. It can have any number of synonyms, that is, other names for the same genome version. These are specified in a document called species.txt, which has this tab-delimited format:
            Column 1: binomial (Latin) name for the species
            Column 2: common name for the species
            Column 3: IGB-friendly genome version name prefix (e.g., H_sapiens or A_gambiae)
            Columns 4, 5, etc. (optional): genome version names (synonyms)

            In this format, column 3 is the key (identity) column. Columns 1 and 2 may have repeats (and we should expect a repeat in one to also be a repeat in the other). Columns 4 onward should not include any repeats.
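
            The column rules above can be sketched as a small parser. This is only an illustration of the format as described in this comment, not IGB's own loader. The names SpeciesEntry and parse_species_txt, and the sample rows (including the common names and the placeholder synonyms), are made up for the example.

```python
from typing import Dict, List, NamedTuple


class SpeciesEntry(NamedTuple):
    latin_name: str       # column 1
    common_name: str      # column 2
    prefix: str           # column 3, the key column
    synonyms: List[str]   # columns 4 onward (optional)


def parse_species_txt(text: str) -> Dict[str, SpeciesEntry]:
    """Parse tab-delimited species.txt content, keyed by column 3."""
    entries: Dict[str, SpeciesEntry] = {}
    seen_synonyms = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns: {line!r}")
        latin, common, prefix = fields[:3]
        synonyms = [s for s in fields[3:] if s]
        if prefix in entries:
            raise ValueError(f"column 3 value repeated: {prefix}")
        for synonym in synonyms:
            if synonym in seen_synonyms:
                raise ValueError(f"synonym repeated: {synonym}")
            seen_synonyms.add(synonym)
        entries[prefix] = SpeciesEntry(latin, common, prefix, synonyms)
    return entries


# Hypothetical sample rows, for illustration only.
SAMPLE = (
    "Homo sapiens\tHuman\tH_sapiens\texample_synonym_a\texample_synonym_b\n"
    "Anopheles gambiae\tMosquito\tA_gambiae\n"
)

entries = parse_species_txt(SAMPLE)
print(entries["H_sapiens"].latin_name)
print(entries["A_gambiae"].synonyms)
```

            Keying the dictionary on column 3 mirrors its role as the identity column; repeats in columns 1 and 2 are allowed, so they are not checked.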

            As a supplement to the species.txt file, IGB also reads a synonyms.txt file, which has the same format minus the first two columns. The synonyms file lists the genome version name in the first column (what was column 3 in species.txt), followed by any number of synonyms.

            If you have data representing reads that were aligned to the H_sapiens_Dec_2013 genome, you'll want to display the data with the H_sapiens_Dec_2013 sequences, not the H_sapiens_Feb_2009 sequences. If you are setting up a quickload site, you would put the files in a folder called "H_sapiens_Dec_2013" and include that folder name in the contents.txt file. Suppose instead that you are sending the data from Galaxy and are constrained to using whatever name Galaxy uses for that genome version, perhaps "Homo sapiens". In that case, you create a synonyms file to link "Homo sapiens" data from Galaxy to the "H_sapiens_Dec_2013" genome version in IGB, like this:
            H_sapiens_Dec_2013<tab>Homo sapiens

            That will tell IGB that any data from "Homo sapiens" should be shown in the H_sapiens_Dec_2013 genome. IGB will still use its existing Latin name and common name for the H_sapiens_Dec_2013 genome version.
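
            The lookup that the synonyms line sets up can be sketched as follows. This is a minimal illustration of the idea, not how IGB implements it. The function names load_synonyms and resolve are hypothetical.

```python
from typing import Dict


def load_synonyms(text: str) -> Dict[str, str]:
    """Map every name on a synonyms.txt line to the first name on that line."""
    lookup: Dict[str, str] = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if not fields[0]:
            continue  # skip blank lines
        genome_version = fields[0]
        lookup[genome_version] = genome_version
        for synonym in fields[1:]:
            if synonym:
                lookup[synonym] = genome_version
    return lookup


def resolve(name: str, lookup: Dict[str, str]) -> str:
    """Return the genome version for a name, or the name itself if unknown."""
    return lookup.get(name, name)


# The single line from the example above.
lookup = load_synonyms("H_sapiens_Dec_2013\tHomo sapiens\n")
print(resolve("Homo sapiens", lookup))
```

            Returning the input unchanged for unknown names is a choice made for this sketch; a real loader might instead report that no matching genome version exists.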


              People

              • Assignee:
                ieclabau Ivory Blakley (Inactive)
                Reporter:
                ieclabau Ivory Blakley (Inactive)
              • Votes:
                0
                Watchers:
                2
