query | R Documentation
Return the frequency count of k-grams in a k-gram frequency table, or whether words are contained in a dictionary.
query(object, x)
## S3 method for class 'kgram_freqs'
query(object, x)
## S3 method for class 'kgrams_dictionary'
query(object, x)
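object | a kgram_freqs or kgrams_dictionary class object.
x | a character vector: a list of k-grams if object is of class kgram_freqs, or a list of words if object is a dictionary.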
This generic behaves slightly differently when querying a dictionary for the presence of words and when querying a frequency table for k-gram counts.
For words, query() looks for exact matches between the input and the dictionary entries. Queries of the Begin-Of-Sentence (BOS()) and End-Of-Sentence (EOS()) tokens always return TRUE, and queries of the Unknown-Word token (UNK()) return FALSE (see special_tokens).
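A minimal base-R sketch of these dictionary semantics follows. It is not the kgrams implementation: toy_dict_query() is a hypothetical helper, and "<BOS>", "<EOS>" and "<UNK>" are placeholder strings standing in for whatever BOS(), EOS() and UNK() actually return.

```r
# Placeholder strings for the special tokens (assumption: the real values may differ)
bos <- "<BOS>"
eos <- "<EOS>"
unk <- "<UNK>"

# Hypothetical dictionary query: exact (hence case-sensitive) matching,
# BOS/EOS always TRUE, UNK always FALSE
toy_dict_query <- function(dict, x) {
    found <- x %in% dict
    found[x %in% c(bos, eos)] <- TRUE
    found[x == unk] <- FALSE
    found
}

dict <- c("a", "b")
toy_dict_query(dict, c("a", "b", "c", "A", bos, eos, unk))
# TRUE TRUE FALSE FALSE TRUE TRUE FALSE
```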
On the other hand, queries of k-gram counts first perform a word-level tokenization: the input is split into words at runs of one or more space characters, so that leading, trailing and repeated spaces are ignored (thus, for instance, queries of the strings "i love you", " i love you" or "i love you " all produce the same outcome). Moreover, querying for any word outside the underlying dictionary returns the counts corresponding to the Unknown-Word token (UNK()): e.g., if the word "prcsrn" is outside the dictionary, querying "i love prcsrn" is the same as querying paste("i love", UNK()).
Queries of k-grams of order k > N, where N is the order of the frequency table (the N argument of kgram_freqs()), return NA.
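To make the k > N rule concrete, here is a toy counter in base R. It is a simplified stand-in for kgram_freqs(), not its implementation: toy_counts() and toy_query() are hypothetical helpers that store counts only for orders 1 to N, so longer k-grams yield NA. For brevity the sketch appends a single "<EOS>", does no BOS padding, and returns 0 for unseen k-grams rather than mapping unknown words to UNK.

```r
# Hypothetical toy counter for a single sentence, storing orders 1..N
toy_counts <- function(text, N) {
    words <- c(strsplit(trimws(text), "[[:space:]]+")[[1]], "<EOS>")
    lapply(seq_len(N), function(k) {
        if (length(words) < k) return(integer(0))
        grams <- vapply(seq_len(length(words) - k + 1),
                        function(i) paste(words[i:(i + k - 1)], collapse = " "),
                        character(1))
        tab <- table(grams)
        setNames(as.integer(tab), names(tab))  # named integer vector of counts
    })
}

# Lookup: order k > N gives NA, an unseen k-gram gives 0
toy_query <- function(counts, x) {
    vapply(x, function(g) {
        k <- length(strsplit(g, " ", fixed = TRUE)[[1]])
        if (k > length(counts)) return(NA_integer_)
        n <- counts[[k]][g]  # NA if g was never observed
        if (is.na(n)) 0L else unname(n)
    }, integer(1), USE.NAMES = FALSE)
}

counts <- toy_counts("a a b a b b a b", N = 2)
toy_query(counts, c("a", "b b", "a b a"))
# 4 1 NA
```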
A subsetting equivalent of query(object, x), with syntax object[x], is also available (see the examples).
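One way such an object[x] shorthand can be wired up with R's S3 system is to define a "[" method that forwards to the query generic. The sketch below uses a hypothetical toy_freqs class and a hypothetical lookup() generic, neither of which belongs to kgrams; it only illustrates the dispatch pattern, not the package's actual code.

```r
# Hypothetical generic and method: counts stored as a named integer vector
lookup <- function(object, x) UseMethod("lookup")

lookup.toy_freqs <- function(object, x) {
    counts <- unclass(object)  # drop the class so "[" below is the default method
    n <- counts[x]             # NA for names not present
    n[is.na(n)] <- 0L
    unname(n)
}

# Subsetting method: object[x] simply calls lookup(object, x)
`[.toy_freqs` <- function(x, i) lookup(x, i)

f <- structure(c(a = 4L, b = 4L), class = "toy_freqs")
f[c("a", "z")]
# 4 0
```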
The query of the empty string "" returns the total count of words, including the EOS and UNK tokens but not the BOS token.
See also the examples below.
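As a worked check of this rule on the example text used below ("a a b a b b a b", a single sentence), the total is the number of word tokens plus one EOS token, with no BOS token counted. Words outside the dictionary would be counted as UNK, so they are already part of the word tokens.

```r
# Worked arithmetic for the empty-string query on the example text
text <- "a a b a b b a b"
n_words <- length(strsplit(text, " ", fixed = TRUE)[[1]])  # 8 word tokens
n_eos <- 1L                # one EOS token per sentence; a single sentence here
total <- n_words + n_eos   # BOS tokens are not counted
total
# 9, consistent with f[""] in the examples
```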
query() returns an integer vector containing the k-gram counts of x if object is a kgram_freqs class object, or a logical vector if object is a dictionary. It is vectorized over x.
Author: Valerio Gherardi
library(kgrams)
# Querying a k-gram frequency table
f <- kgram_freqs("a a b a b b a b", N = 2)
query(f, c("a", "b")) # query single words
query(f, c("a b")) # query a 2-gram
identical(query(f, "c"), query(f, "d")) # TRUE, both "c" and "d" are <UNK>
identical(query(f, UNK()), query(f, "c")) # TRUE
query(f, EOS()) # 1, since text is a single sentence
f[c("b b", "b")] # query with subsetting syntax
f[""] # 9 (includes the EOS token)
# Querying a dictionary
d <- as_dictionary(c("a", "b"))
query(d, c("a", "b", "c")) # query some words
query(d, c(BOS(), EOS(), UNK())) # c(TRUE, TRUE, FALSE)
d["a"] # query with subsetting syntax