Return the frequency count of k-grams in a k-gram frequency table, or whether words are contained in a dictionary.
a character vector. A list of k-grams if object is a kgram_freqs class object, a list of words if object is a dictionary.
This generic has slightly different behaviors when querying
for the presence of words in a dictionary and for k-gram counts
in a frequency table respectively.
For words, query() looks for exact matches between the input and the
dictionary entries. Queries of the Begin-Of-Sentence (BOS()) and
End-Of-Sentence (EOS()) tokens always return
TRUE, and queries
of the Unknown-Word token (UNK()) return FALSE.
On the other hand, queries of k-gram counts first perform a word level
tokenization, so that anything separated by one or more space characters
in the input is considered as a single word (thus, for instance, queries of
strings such as
"i love you",
" i love you", or
"i love you " all produce the same outcome). Moreover,
querying for any word outside the underlying dictionary returns the counts
corresponding to the Unknown-Word token (
UNK()) (e.g., if
"prcsrn" is outside the dictionary, querying
"i love prcsrn" is the same as querying
paste("i love", UNK())). Queries from k-grams of order
k > N
A subsetting equivalent of query(object, x), with syntax
object[x], is available
(see the examples). The query of the empty string
"" returns the
total count of words, including the
EOS and UNK tokens, but not
the BOS token.
See also the examples below.
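The whitespace and Unknown-Word rules described above can be checked directly. The following is a minimal sketch, assuming the kgrams package is installed; the toy corpus and the out-of-dictionary word "zzz" are made up for illustration, and the result of the higher-order query is printed without stating an expected value.

```r
library(kgrams)

# Toy frequency table of order N = 2 (illustrative corpus)
f <- kgram_freqs("a a b a b b a b", N = 2)

# Leading, trailing and repeated spaces should not change the query
same_spacing <- identical(query(f, "a b"), query(f, "  a   b "))

# "zzz" is not in the dictionary, so it should be counted as the UNK token
same_unk <- identical(query(f, "a zzz"), query(f, paste("a", UNK())))

print(c(same_spacing, same_unk))

# A 3-gram query against a table of order N = 2 (the k > N case)
higher_order <- query(f, "a b a")
print(higher_order)
```

Comparing results with identical() rather than reading off raw counts keeps the check independent of the particular corpus.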
an integer vector, containing k-gram counts of
x, if
object is a
kgram_freqs class object, a logical vector if
object is a
dictionary. Vectorized over x.
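As a rough illustration of the return values, the sketch below queries both kinds of object with a vector input and inspects the results with str(). It assumes the kgrams package is installed; the variable names (counts, present) are arbitrary.

```r
library(kgrams)

f <- kgram_freqs("a a b a b b a b", N = 2)
d <- as_dictionary(c("a", "b"))

# One count per element of the input vector
x <- c("a", "b", "a b")
counts <- query(f, x)

# One logical value per queried word
present <- query(d, c("a", "c"))

str(counts)
str(present)
```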
# Querying a k-gram frequency table
f <- kgram_freqs("a a b a b b a b", N = 2)
query(f, c("a", "b")) # query single words
query(f, c("a b")) # query a 2-gram
identical(query(f, "c"), query(f, "d")) # TRUE, both "c" and "d" are <UNK>
identical(query(f, UNK()), query(f, "c")) # TRUE
query(f, EOS()) # 1, since text is a single sentence
f[c("b b", "b")] # query with subsetting syntax
f[""] # 9 (includes the EOS token)

# Querying a dictionary
d <- as_dictionary(c("a", "b"))
query(d, c("a", "b", "c")) # query some words
query(d, c(BOS(), EOS(), UNK())) # c(TRUE, TRUE, FALSE)
d["a"] # query with subsetting syntax