Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Task: Investigate the Ensembl API at https://rest.ensembl.org/ to determine if the API will return data that can be used by IGB, such as genomic sequence and gene annotations.

        Attachments

          Activity

          nfreese Nowlan Freese added a comment - Sequence: https://rest.ensembl.org/documentation/info/sequence_id
          nfreese Nowlan Freese added a comment - - edited

          Lists all available species, their aliases, available adaptor groups and data release: https://rest.ensembl.org/documentation/info/species
          Example: https://rest.ensembl.org/info/species

          jsirigin Jaya Sravani Sirigineedi added a comment - - edited

          Read the documentation, went through several of the endpoints, and found the following useful information:

          GET info/divisions
          Get the list of all Ensembl divisions for which information is available: https://rest.ensembl.org/documentation/info/info_divisions
          Example: https://rest.ensembl.org/info/divisions?content-type=application/json

          GET info/species
          Lists all available species, their aliases, available adaptor groups, and data release.
          To get the species for a particular Ensembl division, specify the optional division parameter.
          API to get the list of all species: https://rest.ensembl.org/documentation/info/species
          Example: https://rest.ensembl.org/info/species?content-type=application/json
          https://rest.ensembl.org/info/species?content-type=application/json;division=EnsemblPlants (default division is EnsemblVertebrates)
          Note: No version info present in the response
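A minimal sketch of how the two example URLs above could be built, assuming the `;` separator used in the ticket's examples. The function name `species_url` and the constant `BASE_URL` are hypothetical helpers for illustration, not part of IGB or of any Ensembl client library.

```python
from urllib.parse import quote

BASE_URL = "https://rest.ensembl.org"  # server named in the ticket


def species_url(division=None):
    """Build the info/species URL, optionally restricted to one division.

    Hypothetical helper: when no division is given, the parameter is
    omitted and the server applies its default division.
    """
    params = [("content-type", "application/json")]
    if division is not None:
        params.append(("division", division))
    # Join with ';' to match the separator used in the ticket's examples.
    query = ";".join(f"{key}={quote(value)}" for key, value in params)
    return f"{BASE_URL}/info/species?{query}"


print(species_url())
print(species_url("EnsemblPlants"))
```

The division names themselves would come from the info/divisions response rather than being hard-coded.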

          GET info/assembly/:species
          API to get the list of chromosomes for a selected species: https://rest.ensembl.org/documentation/info/assembly_info
          Example: https://rest.ensembl.org/info/assembly/homo_sapiens?content-type=application/json
          Note: the response has three arrays. One is top_level_region, an array of objects that each have a name, length, and coord_system. This looks like the list of chromosomes with their names and lengths, but coord_system takes different values, including scaffold and chromosome, which raises the question of whether all of the entries should be treated as chromosomes. It is also unclear what the other two arrays describe.
          Additional note: a karyotype is defined as an individual’s complete set of chromosomes, so can the karyotype in the response be considered the list of chromosomes for that species?
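One way to explore the coord_system question is to tally the values in top_level_region and then filter by them. The sketch below assumes the response has already been parsed into a dict; the field names come from the note above, while the function names and the lengths in the sample data are made up for illustration.

```python
from collections import Counter


def coord_system_counts(assembly):
    """Count how many top-level regions use each coord_system value."""
    return Counter(region["coord_system"] for region in assembly["top_level_region"])


def region_names(assembly, coord_system):
    """Return the names of the top-level regions with the given coord_system."""
    return [
        region["name"]
        for region in assembly["top_level_region"]
        if region["coord_system"] == coord_system
    ]


# Illustrative sample only: the lengths are placeholders, not real values.
sample_assembly = {
    "top_level_region": [
        {"name": "X", "length": 1000, "coord_system": "chromosome"},
        {"name": "KI270757.1", "length": 500, "coord_system": "scaffold"},
    ]
}

print(coord_system_counts(sample_assembly))
print(region_names(sample_assembly, "chromosome"))
```

Running the tally against a real response would show how many entries are scaffolds and how many are chromosomes, which may help decide whether to show all of them.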

          GET info/genomes/:genome_name
          Find information about a given genome
          API to get more information about the genome: https://rest.ensembl.org/documentation/info/info_genome
          Example: https://rest.ensembl.org/info/genomes/homo_sapiens?content-type=application/json
          Note: genebuild in the response looks similar to genomeVersionName

          GET info/assembly/:species/:region_name
          Returns information about the specified top-level sequence region for the given species: https://rest.ensembl.org/documentation/info/assembly_stats
          Example: https://rest.ensembl.org/info/assembly/homo_sapiens/X?content-type=application/json
          https://rest.ensembl.org/info/assembly/homo_sapiens/KI270757.1?content-type=application/json
          Note: both regions in the examples are taken from the assembly info response (GET info/assembly/:species). In the first response the is_chromosome variable is 1 (true), whereas in the second it is 0 (false). Does this imply that the coord_system variable in the assembly info response indicates whether a region should be considered a chromosome?

          jsirigin Jaya Sravani Sirigineedi added a comment - - edited

          Information about other APIs:

          GET sequence/id/:id
          Request multiple types of sequence by stable identifier. Supports feature masking and expand options.
          API to get the sequence: https://rest.ensembl.org/documentation/info/sequence_id
          Example: https://rest.ensembl.org/sequence/id/GENSCAN00000000001?object_type=predictiontranscript;db_type=core;content-type=application/json;type=protein;species=homo_sapiens
          Note: the id in the path is an Ensembl stable ID. It is not yet clear where we can get this from.

          GET sequence/region/:species/:region
          Returns the genomic sequence of the specified region of the given species. Supports feature masking and expand options.
          API to get the sequence for a specified region: https://rest.ensembl.org/documentation/info/sequence_region
          Example: https://rest.ensembl.org/sequence/region/human/X:1000..26000:-1?content-type=application/json
          Note: the maximum sequence length that can be requested in one call is 10000000

          GET alignment/region/:species/:region
          Retrieves genomic alignments as separate blocks based on a region and species
          API to get the alignment for the specified region: https://rest.ensembl.org/documentation/info/genomic_alignment_region
          Example: https://rest.ensembl.org/alignment/region/homo_sapiens/1:0-100000?content-type=application/json;species_set_group=primates;display_species_set=homo_sapiens
          Note: this looks like the region API that we use to get the information for the glyphs (or Syms), but the response does not resemble any known file type, and it contains the sequence with dashes in it (presumably alignment gaps). The parameters are also somewhat complicated: it is not clear how to decide which species_set_group a species belongs to. There are other endpoints that return the species_set_groups and the species that belong to each group:
          GET info/compara/species_sets/:method
          List all collections of species analysed with the specified compara method: https://rest.ensembl.org/documentation/info/compara_species_sets
          Example: https://rest.ensembl.org/info/compara/species_sets/EPO?content-type=application/json
          GET info/compara/methods
          List all compara analyses available (an analysis defines the type of comparative data): https://rest.ensembl.org/documentation/info/compara_methods
          Example: https://rest.ensembl.org/info/compara/methods/?content-type=application/json
          Even with these endpoints, choosing the parameters is still a bit complicated.

          GET info/variation/:species
          List the variation sources used in Ensembl for a species: https://rest.ensembl.org/documentation/info/variation
          Example: https://rest.ensembl.org/info/variation/homo_sapiens?content-type=application/json
          Note: Have to check whether this information is useful or not.

          GET lookup/symbol/:species/:symbol
          Find the species and database for a symbol in a linked external database: https://rest.ensembl.org/documentation/info/symbol_lookup
          Example: https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRCA2?content-type=application/json;expand=1
          Note: this API response contains exons and other features, and it looks more like the get region API than the alignment API mentioned above does.

          jsirigin Jaya Sravani Sirigineedi added a comment -

          After discussing with Dr. Freese, we finalised the following API calls:

          To get the list of species (genomes) available in Ensembl:
          GET info/divisions
          Get the list of all Ensembl divisions for which information is available: https://rest.ensembl.org/documentation/info/info_divisions
          Example: https://rest.ensembl.org/info/divisions?content-type=application/json

          GET info/species
          Lists all available species, their aliases, available adaptor groups, and data release.
          To get the species for a particular Ensembl division, specify the optional division parameter.
          API to get the list of all species: https://rest.ensembl.org/documentation/info/species
          Example: https://rest.ensembl.org/info/species?content-type=application/json
          Use name, display_name, and accession from the API response as the species name, tooltip description, and version respectively.
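The field mapping above could be sketched as follows. This assumes the parsed response is a dict with a top-level `species` list (an assumption about the response shape); the output keys `name`, `tooltip`, and `version` and the function name are hypothetical, and the sample values are illustrative only.

```python
def to_igb_species(response):
    """Map info/species entries to species name, tooltip, and version.

    Assumes response["species"] is a list of dicts. display_name and
    accession are read with .get() in case an entry lacks them.
    """
    return [
        {
            "name": entry["name"],
            "tooltip": entry.get("display_name"),
            "version": entry.get("accession"),
        }
        for entry in response["species"]
    ]


# Illustrative sample only; the values are not copied from a live response.
sample_response = {
    "species": [
        {
            "name": "homo_sapiens",
            "display_name": "Human",
            "accession": "GCA_000001405.29",
        }
    ]
}

print(to_igb_species(sample_response))
```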

          To get the chromosome list:
          GET info/assembly/:species
          API to get the list of chromosomes for a selected species: https://rest.ensembl.org/documentation/info/assembly_info
          Example: https://rest.ensembl.org/info/assembly/homo_sapiens?content-type=application/json
          top_level_region contains all the chromosomes; name and length are enough to build the assembly info.

          To get the sequence for a particular region of a chromosome:
          GET sequence/region/:species/:region
          Returns the genomic sequence of the specified region of the given species. Supports feature masking and expand options.
          API to get the sequence for a specified region: https://rest.ensembl.org/documentation/info/sequence_region
          Example: https://rest.ensembl.org/sequence/region/human/X:1000..26000:-1?content-type=application/json
          The maximum sequence length per request is 10000000, so a longer region has to be split into smaller pieces and fetched with multiple API calls (multithreading can be used to speed this up).
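The splitting step can be sketched as below, assuming 1-based inclusive coordinates and forward-strand requests only (a reverse-strand request would need the pieces joined in the opposite order). All function names are hypothetical. The `fetch` argument is a caller-supplied function that takes a URL and returns the sequence string, so the sketch itself does no network access.

```python
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://rest.ensembl.org"
MAX_REGION_LENGTH = 10_000_000  # per-request limit noted in the ticket


def split_region(start, end, max_length=MAX_REGION_LENGTH):
    """Split the inclusive range [start, end] into chunks of at most max_length bases."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + max_length - 1, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + 1
    return chunks


def sequence_url(species, chromosome, start, end):
    """Build a sequence/region URL in the start..end form used in the ticket's example."""
    return (
        f"{BASE_URL}/sequence/region/{species}/{chromosome}:{start}..{end}"
        "?content-type=application/json"
    )


def fetch_sequence(species, chromosome, start, end, fetch, workers=4):
    """Fetch all chunks concurrently and join them in genomic order."""
    urls = [sequence_url(species, chromosome, s, e) for s, e in split_region(start, end)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the pieces line up.
        return "".join(pool.map(fetch, urls))


print(split_region(1, 25_000_000))
```

The worker count is a placeholder; in practice it would need to be tuned against whatever request limits the server enforces.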

          To get the annotations for a particular region of a chromosome, we think we have to use the API below, but this still needs to be investigated. Nowlan Freese will be looking into this.
          GET lookup/symbol/:species/:symbol
          Find the species and database for a symbol in a linked external database: https://rest.ensembl.org/documentation/info/symbol_lookup
          Example: https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRCA2?content-type=application/json;expand=1

          Jaya Sravani Sirigineedi - Need to look into the existing logic of showing the species list and understand it.

          nfreese Nowlan Freese added a comment - - edited

          Jaya Sravani Sirigineedi - this endpoint looks like it should be able to give us the annotation (gene) info: https://rest.ensembl.org/documentation/info/overlap_region

          Example: https://rest.ensembl.org/overlap/region/human/1:26169859-26170921?content-type=application/json;feature=gene;feature=transcript;feature=cds;feature=exon

          I'm not sure which features we will need. We may need several features to get all of the info we need.

          Example above can be viewed in IGB for the hg38 (H_sapiens_Dec_2013) genome at chr1:26,169,859-26,170,921
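A rough sketch of building the example request with several feature parameters and sorting the results by type. It assumes the response is a JSON list of objects that each carry a `feature_type` field (an assumption about the response shape); the function names are hypothetical and the sample records are placeholders rather than real annotations.

```python
from collections import defaultdict

BASE_URL = "https://rest.ensembl.org"


def overlap_url(species, region, features):
    """Build an overlap/region URL, repeating the feature parameter once per type."""
    params = ["content-type=application/json"]
    params += [f"feature={feature}" for feature in features]
    return f"{BASE_URL}/overlap/region/{species}/{region}?" + ";".join(params)


def group_by_feature_type(records):
    """Group parsed overlap records by their feature_type value."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["feature_type"]].append(record)
    return dict(grouped)


# Placeholder records for illustration; ids and coordinates are made up.
sample_records = [
    {"feature_type": "gene", "id": "GENE_PLACEHOLDER", "start": 100, "end": 900},
    {"feature_type": "exon", "id": "EXON_PLACEHOLDER_1", "start": 100, "end": 200},
    {"feature_type": "exon", "id": "EXON_PLACEHOLDER_2", "start": 700, "end": 900},
]

print(overlap_url("human", "1:26169859-26170921", ["gene", "transcript", "cds", "exon"]))
print(sorted(group_by_feature_type(sample_records)))
```

Grouping by type first makes it easier to see which feature types are actually needed to draw a gene model.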

          jsirigin Jaya Sravani Sirigineedi added a comment -

          Looked into that API. As discussed with Nowlan Freese, it is the only one that matches our requirements and gives the expected response. The response differs somewhat from UCSC's, so it has to be parsed differently; we need a new parser and Symmetry class for it. Closing this ticket, as we now have all the APIs that we need for development.


            People

            • Assignee:
              jsirigin Jaya Sravani Sirigineedi
              Reporter:
              nfreese Nowlan Freese
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: