get_phenotypes: Retrieve phenotypes by entity, quality, taxon, and study

View source: R/phenotypes.R


Retrieves "semantic phenotypes", i.e., phenotypes encoded as ontological expressions. Filtering is possible by anatomical entity (optionally including entities related by certain properties, see includeRels), phenotypic quality, taxonomic group where the phenotypes have been recorded, and study (a.k.a. publication).


get_phenotypes(
  entity = NA,
  quality = NA,
  taxon = NA,
  study = NA,
  includeRels = c("part of"),
  .withTaxon = FALSE,
  verbose = FALSE
)



entity: character, the anatomical entity by which to filter, if any.


quality: character, the phenotypic quality by which to filter, if any.


taxon: character, the taxon by which to filter, if any.


study: character, the identifier of the study by which to filter, if any.


includeRels: character or vector of characters. The names of relationships for anatomical entities to include in addition to subtype (rdfs:subClassOf). Defaults to "part of". Set to FALSE to not include any additional relationships. Otherwise one or more of "part of", "historical homologous to", and "serially homologous to", or set to TRUE to include all possible ones. It is acceptable to use unambiguous prefixes, for example "historical".


.withTaxon: logical, whether to include taxa in the result if taxon is provided. If TRUE, only the combination of phenotype and taxon will be unique in the returned data frame. Default is FALSE, meaning that by default a value for taxon acts only as another filter and does not change the format or redundancy of the result. Ignored if taxon is not provided as a character value.


verbose: logical, whether to print messages informing about potentially time-consuming operations. Default is FALSE.
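
The prefix matching described for includeRels can be illustrated with base R's pmatch(). This is a minimal sketch of the idea, not the package's actual implementation; the helper name resolve_rels is hypothetical.

```r
# Hypothetical helper (not part of rphenoscape): expand user-supplied
# prefixes to the full relationship names documented for includeRels.
resolve_rels <- function(includeRels = c("part of")) {
  choices <- c("part of", "historical homologous to", "serially homologous to")
  # TRUE selects every relationship, FALSE selects none
  if (is.logical(includeRels)) {
    return(if (isTRUE(includeRels)) choices else character(0))
  }
  # pmatch() yields NA for prefixes that are ambiguous or match nothing
  idx <- pmatch(includeRels, choices)
  if (anyNA(idx)) {
    stop("unrecognized or ambiguous relationship: ",
         paste(includeRels[is.na(idx)], collapse = ", "))
  }
  choices[idx]
}

resolve_rels(c("part", "hist"))  # the two prefixes used in the examples
resolve_rels(TRUE)               # all three relationships
```

Under this sketch, a value that matches none of the three names raises an error instead of being silently dropped; whether get_phenotypes itself behaves this way is not stated here.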


Entity, quality, and taxon can be given as IRI or as name (i.e., term label). In the latter case, names will be resolved to IRIs against anatomy ontologies, PATO, and taxonomy ontologies, respectively. Warnings will be issued if only a partial match can be found. The study must be given as IRI.
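
To make the IRI-versus-name distinction concrete, the following sketch tells the two kinds of input apart with a simple pattern test. The helper looks_like_iri is hypothetical and only approximates the distinction; how the package itself decides is not described here. The IRI shown is an example in OBO style and is assumed, not confirmed, to denote the same term as the label.

```r
# Hypothetical helper: TRUE where a string looks like an HTTP(S) IRI,
# FALSE where it would be treated as a name (term label).
looks_like_iri <- function(x) {
  grepl("^https?://", x)
}

terms <- c("pelvic fin",
           "http://purl.obolibrary.org/obo/UBERON_0000152")  # example IRI (assumed)
looks_like_iri(terms)
```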


A data frame with columns "id" and "label".

If a character value for taxon was provided, and .withTaxon is TRUE, columns "taxon.id" and "taxon.label" will be returned as well. While (phenotype, taxon) tuples will be unique, both phenotypes and taxa individually will then be redundant in the returned data frame (the association is n:n).
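
The redundancy that comes with an n:n association can be seen on a small toy data frame. All values below are invented for illustration, and the column name taxon.id is an assumption made for this sketch.

```r
# Toy data (invented): two phenotypes, each recorded in the same two taxa
res <- data.frame(
  id          = c("P1", "P1", "P2", "P2"),
  label       = c("phenotype 1", "phenotype 1", "phenotype 2", "phenotype 2"),
  taxon.id    = c("T1", "T2", "T1", "T2"),   # assumed column name
  taxon.label = c("taxon 1", "taxon 2", "taxon 1", "taxon 2"),
  stringsAsFactors = FALSE
)

# Every (phenotype, taxon) row is unique ...
tuples_unique <- anyDuplicated(res) == 0

# ... but phenotypes and taxa each appear more than once
n_phenotypes <- nrow(unique(res[, c("id", "label")]))
n_taxa       <- nrow(unique(res[, c("taxon.id", "taxon.label")]))
c(rows = nrow(res), phenotypes = n_phenotypes, taxa = n_taxa)
```

Dropping the taxon columns and applying unique() recovers one row per phenotype, which is the comparison made at the end of the examples.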


phens1 <- get_phenotypes(entity = "pelvic fin")

# by default, parts are already included
phens2 <- get_phenotypes(entity = "pelvic fin", includeRels = c("part"))
nrow(phens1) == nrow(phens2)
table(phens2$id %in% phens1$id)

# but historical homologues are not
phens2 <- get_phenotypes(entity = "pelvic fin", includeRels = c("part", "hist"))
table(phens2$id %in% phens1$id)

# neither are serial homologues (includeRels = TRUE includes all relationships)
phens2 <- get_phenotypes(entity = "pelvic fin", includeRels = TRUE)
table(phens2$id %in% phens1$id)

# filter also by quality
phens2 <- get_phenotypes(entity = "pelvic fin", quality = "shape")
table(phens1$id %in% phens2$id)

# filter also by quality and taxon
phens2 <- get_phenotypes(entity = "pelvic fin", quality = "shape", taxon = "Siluriformes")
table(phens1$id %in% phens2$id)

# filter by entity, quality and taxon, and return taxa as well (resulting in
# (phenotype, taxon) "tuples")
phens2a <- get_phenotypes(entity = "pelvic fin", quality = "shape", taxon = "Siluriformes",
                          .withTaxon = TRUE)
nrow(phens2a) - nrow(phens2) # lots of redundancy due to n:n relationship
nrow(unique(phens2a[,c("id", "label")])) == nrow(phens2) # but same #phenotypes

# can compute and visualize similarity
sm <- jaccard_similarity(terms = phens2$id, .labels = phens2$label, .colnames = "label")
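
The example above computes the similarity matrix but does not plot it. One possible way to visualize such a matrix is hierarchical clustering with base R, sketched below on a toy matrix. This assumes the result is a square, symmetric matrix with values between 0 and 1; the matrix sm_toy and its labels are invented for illustration.

```r
# Toy symmetric similarity matrix standing in for a real result (invented values)
sm_toy <- matrix(c(1.0, 0.6, 0.1,
                   0.6, 1.0, 0.2,
                   0.1, 0.2, 1.0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Turn similarity into distance, then cluster and draw a dendrogram
hc <- hclust(as.dist(1 - sm_toy))
plot(hc, main = "Toy phenotype similarity", xlab = "", sub = "")
```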

xu-hong/rphenoscape documentation built on Oct. 30, 2024, 8:43 a.m.