extract_eupath_orthologs: Given 2 species names from the eupathdb, make orthology...

View source: R/extract_eupath_orthologs.R

extract_eupath_orthologs    R Documentation

Given two species names from the eupathdb, make orthology tables between them.

Description

The eupathdb provides a tremendous wealth of information. For me, though, it is sometimes difficult to boil it down into just the comparisons I want for one species or between two species. A question I am asked especially often is: "What are the most similar genes between two arbitrary parasites, species x and y?" There are many ways to approach this question: run BLAST/fasta36, use biomart, query the ortholog tables from the eupathdb, etc. In all of these cases, however, it is not trivial to ask the next question: what about a:b and b:a? This function attempts to address that for the case of two eupath species from the same domain (tritrypdb/fungidb/etc.). It does, however, assume that the sqlite package has been installed locally; if it has not, it suggests that you run the make_organismdbi function to install it.
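The a:b versus b:a question can be illustrated with a small, self-contained sketch. It is not this function's implementation; it assumes two toy tables of ortholog calls, one per direction, with invented gene IDs, and keeps only the pairs that appear in both directions.

```r
# Toy ortholog calls in each direction; all gene IDs are invented for illustration.
a_to_b <- data.frame(
  GID = c("A1", "A2", "A3"),
  ORTHOLOG_ID = c("B1", "B2", "B9"),
  stringsAsFactors = FALSE
)
b_to_a <- data.frame(
  GID = c("B1", "B2", "B3"),
  ORTHOLOG_ID = c("A1", "A7", "A3"),
  stringsAsFactors = FALSE
)

# Flip b:a so that its columns line up with a:b.
b_to_a_flipped <- data.frame(
  GID = b_to_a$ORTHOLOG_ID,
  ORTHOLOG_ID = b_to_a$GID,
  stringsAsFactors = FALSE
)

# Keep only the pairs that are reported in both directions.
reciprocal <- merge(a_to_b, b_to_a_flipped, by = c("GID", "ORTHOLOG_ID"))
print(reciprocal)
```

In this toy data only the A1/B1 pair survives, because A2:B2 and A3:B9 are not confirmed by the reverse table.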

Usage

extract_eupath_orthologs(
  db,
  master = "GID",
  query_species = NULL,
  id_column = "ORTHOLOGS_GID",
  org_column = "ORTHOLOGS_ORGANISM",
  group_column = "ANNOT_GENE_ORTHOMCL_NAME",
  name_column = "ORTHOLOGS_PRODUCT",
  count_column = "ORTHOLOGS_COUNT",
  print_speciesnames = FALSE,
  webservice = "eupathdb"
)

Arguments

db

Species name (subset) from one eupath database.

master

Primary keytype to use for indexing the various tables.

query_species

A list of exact species names to search for. If you are uncertain about them, set print_speciesnames=TRUE and be ready for a big blob of text. If left NULL, all species are pulled.

id_column

What column in the database provides the set of ortholog IDs?

org_column

What column provides the species name?

group_column

Ortholog group column name.

name_column

Name of the gene for this group.

count_column

Name of the column with the count of species represented.

print_speciesnames

Dump the species names for diagnostics?

webservice

Which eupathdb project to query?

Details

One other important caveat: this function assumes queries in the format 'table_column' (for example, the default id_column is ORTHOLOGS_GID), where in this particular instance the table is further assumed to be the ortholog table.
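As a rough illustration of the 'table_column' convention, the sketch below splits some of the default column names from the Usage section into a table part and a column part. Splitting at the first underscore is an assumption made for this example, not a documented rule of the package.

```r
# Default column names taken from the Usage section.
columns <- c("ORTHOLOGS_GID", "ORTHOLOGS_ORGANISM",
             "ORTHOLOGS_PRODUCT", "ORTHOLOGS_COUNT")

# Assumption: everything before the first underscore names the table,
# everything after it names the column within that table.
table_part <- sub("_.*$", "", columns)
column_part <- sub("^[^_]*_", "", columns)

print(data.frame(name = columns, table = table_part, column = column_part))
```

Under that assumption, all four names resolve to the ORTHOLOGS table.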

Value

A big table of orthoMCL families. The columns are:

  1. GID: The gene ID.

  2. ORTHOLOG_ID: The gene ID of the associated ortholog.

  3. ORTHOLOG_SPECIES: The species of the associated ortholog.

  4. ORTHOLOG_URL: The OrthoMCL group ID's URL.

  5. ORTHOLOG_COUNT: The number of all genes from all species represented in this group.

  6. ORTHOLOG_GROUP: The family ID.

  7. QUERIES_IN_GROUP: How many of the query species are represented in this group?

  8. GROUP_REPRESENTATION: ORTHOLOG_COUNT / the number of possible species.
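The last two columns are derived from the others. The sketch below shows one way they could be computed on a toy table; it is not the package's code. The gene IDs, species names, group IDs, counts, and the value of n_possible_species are all invented for illustration.

```r
# Toy result table; every value here is invented for illustration.
orthologs <- data.frame(
  GID = c("A1", "A1", "A2", "A3"),
  ORTHOLOG_ID = c("B1", "C1", "B2", "C3"),
  ORTHOLOG_SPECIES = c("species b", "species c", "species b", "species c"),
  ORTHOLOG_GROUP = c("OG_1", "OG_1", "OG_2", "OG_3"),
  ORTHOLOG_COUNT = c(10, 10, 4, 20),
  stringsAsFactors = FALSE
)
query_species <- c("species b", "species c")
n_possible_species <- 40  # hypothetical number of possible species

# QUERIES_IN_GROUP: how many distinct query species appear in each group.
hits <- orthologs[orthologs$ORTHOLOG_SPECIES %in% query_species, ]
species_per_group <- tapply(hits$ORTHOLOG_SPECIES, hits$ORTHOLOG_GROUP,
                            function(x) length(unique(x)))
orthologs$QUERIES_IN_GROUP <- as.integer(species_per_group[orthologs$ORTHOLOG_GROUP])

# GROUP_REPRESENTATION: ORTHOLOG_COUNT / the number of possible species.
orthologs$GROUP_REPRESENTATION <- orthologs$ORTHOLOG_COUNT / n_possible_species

print(orthologs)
```

Here OG_1 contains both query species, while OG_2 and OG_3 each contain only one.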

Author(s)

atb


khughitt/EuPathDB documentation built on Nov. 4, 2023, 4:19 a.m.