term_freqs: Obtains term frequencies for the Phenoscape KB

View source: R/term-weights.R

term_freqs    R Documentation

Obtains term frequencies for the Phenoscape KB

Description

Determines the frequencies for the given input list of terms, based on the selected corpus and the type (category) of the terms.

Usage

term_freqs(
  x,
  as = c("phenotype", "entity", "anatomical_entity", "quality"),
  corpus = c("taxon-variation", "annotated-taxa", "taxon-annotations", "states",
    "gene-annotations", "genes"),
  decodeIRI = FALSE,
  ...
)

Arguments

x

a vector or list of one or more terms, either as IRIs or as term objects.

as

the category or categories (a.k.a. type) of the input terms (see term_category()). Possible values are "anatomical_entity" (synonymous with "entity"), "quality", and "phenotype". Unambiguous abbreviations are acceptable. The value must either be a single category (applying to all terms), or a vector of categories (of same length as x). The default is "phenotype".

Note that at present, support by the KB API for "quality" remains pending and has thus been disabled as of v0.3.0. Also, mixing different categories of terms is not yet supported, and doing so will thus raise an error.
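The rules for as can be checked up front in base R. The following is a minimal sketch, not the package's implementation: the helper name normalize_as is hypothetical, and it assumes that abbreviations are resolved by partial matching in the style of match.arg().

```r
# Hypothetical helper, not part of rphenoscape: expands abbreviations and
# applies the length and no-mixing rules described for the `as` argument.
normalize_as <- function(as, n) {
  choices <- c("phenotype", "entity", "anatomical_entity", "quality")
  # several.ok = TRUE permits a vector of categories; each element must
  # match exactly one choice, either exactly or as an unambiguous prefix
  as <- match.arg(as, choices, several.ok = TRUE)
  # "anatomical_entity" is documented as synonymous with "entity"
  as[as == "anatomical_entity"] <- "entity"
  if (length(as) != 1 && length(as) != n)
    stop("'as' must have length 1 or the same length as 'x'")
  if (length(unique(as)) > 1)
    stop("mixing different categories of terms is not yet supported")
  # recycle a single category so there is one entry per term
  rep_len(as, n)
}

normalize_as("phen", 3)
normalize_as(c("entity", "anat"), 2)
```

The first call recycles one abbreviated category across three terms; the second shows that "entity" and "anatomical_entity" are not treated as a mix of categories.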

corpus

the name of the corpus for determining how to count, currently one of the following:

  • "states" (counts character states),

  • "taxon-variation" (counts taxa with variation profiles, and thus does not include terminal and other taxa that do not have child taxa with phenotype annotations),

  • "annotated-taxa" (counts taxa with phenotype annotations, and thus primarily those terminal taxa that have annotations),

  • "taxon-annotations" (counts phenotype annotations to character states and thus taxa),

  • "gene-annotations" (counts phenotype annotations to genes or alleles), and

  • "genes" (counts genes)

Unambiguous abbreviations of corpus names are acceptable. The default is "taxon-variation". Note that at present "taxon-annotations" and "gene-annotations" are not yet supported by the KB API and will thus result in an error.

Note that previously "taxa" was allowed as a corpus, but is no longer supported. The "taxon-variation" corpus is the equivalent of the deprecated "taxa" corpus.
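What counts as an unambiguous abbreviation of a corpus name can be illustrated with base R's match.arg(). This is a sketch under the assumption that term_freqs() resolves abbreviations the same way; the corpora vector simply repeats the choices from the Usage section.

```r
corpora <- c("taxon-variation", "annotated-taxa", "taxon-annotations",
             "states", "gene-annotations", "genes")

# A prefix that matches exactly one corpus name is expanded
match.arg("st", corpora)
match.arg("taxon-v", corpora)

# A full name is an exact match
match.arg("genes", corpora)

# "taxon" is a prefix of two corpus names, so it is ambiguous and
# match.arg() signals an error, captured here as a logical flag
ambiguous <- tryCatch({
  match.arg("taxon", corpora)
  FALSE
}, error = function(e) TRUE)
ambiguous
```

By the same logic "gene" would be ambiguous (it is a prefix of both "gene-annotations" and "genes"), while "genes" is not.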

decodeIRI

boolean. This parameter is deprecated (as of v0.3.x) and must be set to FALSE (the default); passing TRUE raises an error. In v0.2.x, setting it to TRUE made the function attempt to decode post-composed entity IRIs. Because of changes in the IRIs returned by the Phenoscape KB v2.x API, decoding post-composed entity IRIs is no longer possible. Prior to v0.3.x, the default value for this parameter was TRUE.

...

additional query parameters to be passed to the function querying for counts, see pkb_args_to_query(). This is currently (as of v0.3.0) not used.

Details

Depending on the corpus selected, the frequencies are either queried directly from pre-computed counts through the KB API, or calculated from matching row counts obtained from query results. Currently, the Phenoscape KB has pre-computed counts for the corpora "annotated-taxa", "taxon-variation", "states", and "genes".
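A script that accepts a corpus name from its user may want to fail fast before any query is issued. The sketch below encodes only the corpus names listed on this page; check_corpus is a hypothetical helper and not part of rphenoscape.

```r
# Corpus names as listed in this documentation
precomputed <- c("annotated-taxa", "taxon-variation", "states", "genes")
unsupported <- c("taxon-annotations", "gene-annotations")

# Hypothetical helper, not part of rphenoscape: stops early for corpora
# documented as unsupported, and for names that are not listed at all
check_corpus <- function(corpus) {
  if (corpus %in% unsupported)
    stop("corpus '", corpus, "' is not yet supported by the KB API")
  if (!corpus %in% precomputed)
    stop("unknown corpus '", corpus, "'")
  invisible(corpus)
}

check_corpus("states")
```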

Value

a vector of frequencies as floating point numbers (between zero and 1.0), of the same length (and ordering) as the input list of terms.
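Because the result has the same length and ordering as the input, it can be paired with the input terms directly. The sketch below uses made-up term IRIs, frequencies, and corpus size in place of real calls to term_freqs() and corpus_size(), so it runs without access to the KB.

```r
# Hypothetical inputs; in practice these would be the terms passed as `x`
terms <- c("http://example.org/term/A",
           "http://example.org/term/B",
           "http://example.org/term/C")
# Hypothetical frequencies standing in for the return value of term_freqs()
freqs <- c(0.25, 0.001, 0)

# The documented contract: one value per term, each between 0 and 1
stopifnot(length(freqs) == length(terms), all(freqs >= 0 & freqs <= 1))

result <- data.frame(term = terms, freq = freqs, stringsAsFactors = FALSE)

# Hypothetical corpus size; corpus_size() would supply the real value
n <- 1000
result$count <- round(result$freq * n)

# Most frequent terms first
result[order(-result$freq), ]
```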

Note

Accurate term categories are vital for obtaining correct counts and thus frequencies. In earlier (<=0.2.x) releases, auto-determining the term category was an option, but this is no longer supported, in part because it was potentially time-consuming and often inaccurate, in particular for the many post-composed subsumer terms returned by subsumer_matrix(). In the KB v2.0 API, auto-determining the category of a post-composed term is no longer supported. If the terms in the list are legitimately of different categories, determine (and possibly correct) the categories beforehand using term_category().
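Since mixing categories in one call raises an error, a list with terms of different categories has to be processed one category at a time. The following is a minimal sketch of that pattern; freqs_by_category and mock_freqs are hypothetical names, and the frequency function is passed in as an argument so the sketch runs without network access.

```r
# Hypothetical wrapper, not part of rphenoscape. `freq_fun` stands in for
# term_freqs(): it must accept (x, as = ...) and return one number per term.
freqs_by_category <- function(terms, categories, freq_fun, ...) {
  stopifnot(length(terms) == length(categories))
  # terms without a known category keep NA as their frequency
  out <- rep(NA_real_, length(terms))
  for (category in unique(categories[!is.na(categories)])) {
    idx <- which(categories == category)
    # one call per category, written back to the original positions
    out[idx] <- freq_fun(terms[idx], as = category, ...)
  }
  out
}

# Stand-in frequency function with made-up values
mock_freqs <- function(x, as, ...) {
  if (as == "entity") rep(0.5, length(x)) else rep(0.1, length(x))
}

freqs_by_category(c("t1", "t2", "t3"),
                  c("entity", "phenotype", "entity"),
                  mock_freqs)
```

With the package loaded, the same wrapper could presumably be called as freqs_by_category(terms, term_category(terms), term_freqs, corpus = "states"), assuming that term_category() returns one category label per term in a form that as accepts.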

In earlier (<=0.2.x) releases, one supported corpus was "taxon_annotations". Its implementation was very slow and potentially inaccurate, because it relied on potentially multiple individual KB API queries for each term. These in turn relied on the ability to break down post-composed expressions into their component terms and expressions, which is (at least currently) no longer possible.

Examples

phens <- get_phenotypes(entity = "basihyal bone")
# see which phenotypes we have:
phens$label
# frequencies by counting taxa:
freqs.t <- term_freqs(phens$id, as = "phenotype", corpus = "taxon-variation")
freqs.t
# we can convert this to absolute counts:
freqs.t * corpus_size("taxon-variation")
# frequencies by counting character states:
freqs.s <- term_freqs(phens$id, as = "phenotype", corpus = "states")
freqs.s
# and as absolute counts:
freqs.s * corpus_size("states")
# we can compare the absolute counts by computing a ratio:
freqs.s * corpus_size("states") / (freqs.t * corpus_size("taxon-variation"))

xu-hong/rphenoscape documentation built on Jan. 28, 2024, 12:22 p.m.