term_freqs                                                R Documentation

Description:

Determines the frequencies for the given input list of terms, based on the selected corpus and the type (category) of the terms.
Usage:

term_freqs(
  x,
  as = c("phenotype", "entity", "anatomical_entity", "quality"),
  corpus = c("taxon-variation", "annotated-taxa", "taxon-annotations", "states",
             "gene-annotations", "genes"),
  decodeIRI = FALSE,
  ...
)
Arguments:

x: a vector or list of one or more terms, either as IRIs or as term objects.

as: the category or categories (a.k.a. type) of the input terms (see term_category()). Note that at present, support by the KB API for "quality" remains pending and has thus been disabled as of v0.3.0. Also, mixing different categories of terms is not yet supported; doing so will raise an error.

corpus: the name of the corpus for determining how to count, currently one of "taxon-variation", "annotated-taxa", "taxon-annotations", "states", "gene-annotations", and "genes". Unambiguous abbreviations of corpus names are acceptable. The default is "taxon-variation". At present "taxon-annotations" and "gene-annotations" are not yet supported by the KB API and will result in an error. The previously allowed corpus "taxa" is no longer supported; "taxon-variation" is the equivalent of the deprecated "taxa" corpus.

decodeIRI: boolean. This parameter is deprecated (as of v0.3.x) and must be set to FALSE (the default); passing TRUE raises an error. In v0.2.x, setting it to TRUE would attempt to decode post-composed entity IRIs, which is no longer possible due to changes in the IRIs returned by the Phenoscape KB v2.x API. Prior to v0.3.x, the default value for this parameter was TRUE.

...: additional query parameters to be passed to the function querying for counts.
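As a small illustration of the abbreviation matching described for corpus, the following sketch (assuming the rphenoscape package is attached and a live Phenoscape KB API connection is available) requests the same corpus twice, once by full name and once abbreviated:

```r
# sketch only: requires the rphenoscape package and a live KB API connection
phens <- get_phenotypes(entity = "basihyal bone")

# "taxon-var" is an unambiguous abbreviation of "taxon-variation",
# so both calls should query the same corpus
f1 <- term_freqs(phens$id, as = "phenotype", corpus = "taxon-variation")
f2 <- term_freqs(phens$id, as = "phenotype", corpus = "taxon-var")
identical(f1, f2)
```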
Details:

Depending on the corpus selected, the frequencies are queried directly from pre-computed counts through the KB API, or are calculated based on matching row counts obtained from query results. Currently, the Phenoscape KB has precomputed counts for the corpora "annotated-taxa", "taxon-variation", "states", and "genes".

Value:

A vector of frequencies as floating point numbers (between zero and 1.0), of the same length (and ordering) as the input list of terms.
Note:

Accurate term categories are vital for obtaining correct counts and thus frequencies. In earlier (<=0.2.x) releases, auto-determining the term category was an option, but this is no longer supported, in part because it was potentially time consuming and often inaccurate, in particular for the many post-composed subsumer terms returned by subsumer_matrix(). In the KB v2.0 API, auto-determining the category of a post-composed term is no longer supported. If the list of terms is legitimately of different categories, determine (and possibly correct) categories beforehand using term_category().

In earlier (<=0.2.x) releases one supported corpus was "taxon_annotations", but its implementation was very slow and potentially inaccurate: it relied on potentially multiple individual KB API queries for each term, which in turn relied on the ability to break down post-composed expressions into their component terms and expressions, something that is (at least currently) no longer possible.
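The recommendation above, determining term categories before requesting frequencies, might look like the following sketch (assumes the rphenoscape package, a live KB API connection, and that term_category() accepts a vector of term IRIs):

```r
# sketch only: requires the rphenoscape package and a live KB API connection
phens <- get_phenotypes(entity = "basihyal bone")

# determine the category of each term up front, since mixing
# categories in a single term_freqs() call raises an error
cats <- term_category(phens$id)
table(cats)

# request frequencies only for terms of a single, verified category
is_phen <- cats == "phenotype"
freqs <- term_freqs(phens$id[is_phen], as = "phenotype",
                    corpus = "taxon-variation")
```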
Examples:

phens <- get_phenotypes(entity = "basihyal bone")
# see which phenotypes we have:
phens$label
# frequencies by counting taxa:
freqs.t <- term_freqs(phens$id, as = "phenotype", corpus = "taxon-variation")
freqs.t
# we can convert this to absolute counts:
freqs.t * corpus_size("taxon-variation")
# frequencies by counting character states:
freqs.s <- term_freqs(phens$id, as = "phenotype", corpus = "states")
freqs.s
# and as absolute counts:
freqs.s * corpus_size("states")
# we can compare the absolute counts by computing a ratio
freqs.s * corpus_size("states") / (freqs.t * corpus_size("taxon-variation"))