subsumer_matrix: Obtains a subsumer matrix

View source: R/semsim.R

subsumer_matrix    R Documentation

Obtains a subsumer matrix

Description

A subsumer matrix M for terms j \in \{1, \dots, n\} has value M_{i,j}=1 iff class i (which can be an anonymous class expression) subsumes term j, and zero otherwise. Therefore, it will have n columns, one for each term.
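As a toy illustration of this definition, the base R sketch below builds a subsumer matrix by hand. The term names, class names, and subsumption relationships are invented for the example and are not taken from any ontology. It does not call subsumer_matrix() and is not meant to reflect how the package computes the result.

```r
# Hypothetical terms (list names) and the hypothetical classes subsuming each.
subsumers <- list(
  femur   = c("femur", "hindlimb bone", "bone"),
  humerus = c("humerus", "forelimb bone", "bone"),
  tibia   = c("tibia", "hindlimb bone", "bone")
)

# Rows: every class that subsumes at least one of the terms.
classes <- unique(unlist(subsumers))

# M[i, j] is 1 if class i subsumes term j, and 0 otherwise.
M <- sapply(subsumers, function(s) as.integer(classes %in% s))
rownames(M) <- classes
M
```

The result has one column per term and one row per subsuming class, matching the shape described above.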

Usage

subsumer_matrix(
  terms,
  .colnames = c("ID", "IRI", "label"),
  .labels = NULL,
  preserveOrder = FALSE,
  verbose = FALSE
)

Arguments

terms

character, the list of terms for which to compute the subsumer matrix. Can be given as term IRIs or term labels, and the list can contain both. Terms given as labels will first be resolved to IRIs, assuming they are from an anatomy ontology.

.colnames

character, how to name the columns of the resulting matrix.

  • "ID" (the default): use the term IDs (the last component of the term IRIs).

  • "IRI": use the term IRIs.

  • "label": use the terms' labels (see .labels parameter).
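To make the relationship between IRIs, IDs, and prefixes concrete, here is a minimal base R sketch. It assumes that the ID is whatever follows the last "/" or "#" in the IRI; that rule is an assumption made for illustration and may differ from what the package does internally.

```r
iris <- c("http://purl.obolibrary.org/obo/UBERON_0000981",
          "http://purl.obolibrary.org/obo/UBERON_0002103")

# Assumption: the ID is the part after the last "/" or "#".
ids <- sub("^.*[/#]", "", iris)

# The prefix is what remains once the ID is removed from the end of the IRI.
prefixes <- substr(iris, 1, nchar(iris) - nchar(ids))

ids
prefixes

# Pasting prefix and ID back together recovers the original IRI.
identical(paste0(prefixes, ids), iris)
```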

.labels

character, the labels for terms where known. Only used if .colnames = "label". If NULL (the default), labels will be looked up if terms are provided as IRIs; elements of the terms list that are not in IRI form are assumed to be the label. If a list, must have the same length and ordering as terms; any NA elements will be looked up (from the corresponding term IRI).

preserveOrder

logical, whether to return columns in the same order as terms. The default is not to preserve the order.

verbose

logical, whether to print informative messages about certain potentially time-consuming operations.

Details

In this implementation, \sum_{j=1}^{n} M_{i,j} > 0 for each row i. That is, each row has at least one non-zero value. As a consequence, the set of classes not subsuming a given term is highly incomplete, because the (usually very many) classes that subsume none of the terms are not included. This subsumer matrix is thus only useful for similarity metrics for which non-subsuming classes can be ignored.
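Jaccard similarity is one example of a metric for which classes subsuming neither term can be ignored, because it only counts shared and total subsumers. The sketch below computes it from a small hand-made matrix. The matrix values and the jaccard() helper are hypothetical and written for this illustration; they are not part of the package API shown here.

```r
# Hypothetical subsumer matrix: rows are classes, columns are terms.
M <- rbind(
  "bone"          = c(femur = 1, humerus = 1, tibia = 1),
  "hindlimb bone" = c(femur = 1, humerus = 0, tibia = 1),
  "femur"         = c(femur = 1, humerus = 0, tibia = 0),
  "humerus"       = c(femur = 0, humerus = 1, tibia = 0),
  "tibia"         = c(femur = 0, humerus = 0, tibia = 1)
)

# Hypothetical helper: pairwise Jaccard similarity between columns.
jaccard <- function(m) {
  shared <- crossprod(m)   # shared[j, k]: classes subsuming both j and k
  n_subs <- colSums(m)     # number of subsumers of each term
  shared / (outer(n_subs, n_subs, "+") - shared)
}

sim <- jaccard(M)
sim

# Appending a class that subsumes none of the terms leaves the result unchanged.
M_padded <- rbind(M, "unrelated class" = 0)
all.equal(jaccard(M_padded), sim)
```

A metric that needs the count of classes subsuming neither term (simple matching, for instance) could not be computed correctly from this matrix, since those all-zero rows are absent.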

Value

A data.frame representing the subsumer matrix.

The matrix will have additional attributes depending on how the columns are named. If .colnames = "ID" (the default), the matrix will have an attribute prefixes giving the URL prefixes removed from the term IRIs to yield the IDs, in the order of the columns. If .colnames = "label", it will have an attribute term.iris giving the term IRIs, in the order of the columns. Note that these extra attributes will be lost upon subsetting the returned matrix.
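Because the extra attributes do not survive subsetting, one option is to save them first and re-attach the matching entries afterwards. The sketch below uses a hand-made data.frame as a stand-in for the returned matrix, and assumes the prefixes attribute holds one entry per term column; both are assumptions for illustration.

```r
# Hypothetical stand-in for a returned subsumer matrix.
m <- data.frame(UBERON_0000981 = c(1, 1),
                UBERON_0002103 = c(1, 0),
                UBERON_0000976 = c(1, 1))
# Assumption: one prefix per term column.
attr(m, "prefixes") <- rep("http://purl.obolibrary.org/obo/", ncol(m))

keep <- c("UBERON_0000981", "UBERON_0000976")

# Save the attribute before subsetting, since it may be dropped.
prefixes <- attr(m, "prefixes")
sub_m <- m[, keep]

# Re-attach only the entries that belong to the retained columns.
attr(sub_m, "prefixes") <- prefixes[match(keep, colnames(m))]
attr(sub_m, "prefixes")
```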

Examples

tl <- c("http://purl.obolibrary.org/obo/UBERON_0000981",
        "http://purl.obolibrary.org/obo/UBERON_0002103",
        "http://purl.obolibrary.org/obo/UBERON_0000976",
        "http://purl.obolibrary.org/obo/UBERON_0002102")
m <- subsumer_matrix(tl)
m # term IDs as column names
id_prefixes <- attr(m, "prefixes")
id_prefixes # 4x "http://purl.obolibrary.org/obo/"

m <- subsumer_matrix(tl, .colnames = "label")
m # term labels as column names
mat_terms <- attr(m, "term.iris")
mat_terms # term IRIs in the same order as columns

xu-hong/rphenoscape documentation built on Jan. 28, 2024, 12:22 p.m.