| pattern | R Documentation |
Pattern(s) for use in matching features, tokens, and keywords through a valuetype pattern.
| pattern | a character vector, list of character vectors, dictionary, or collocations object. See pattern for details. |
The pattern argument is a vector of patterns, including
sequences, to match in a target object, whose match type is specified by
valuetype. Note that an empty pattern ("") will match
"padding" in a tokens object.
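
The padding behaviour can be sketched as follows. This is a minimal illustration, assuming quanteda is installed; the sentence and the object names (toks, toks_pad, toks_nopad) are invented for the example.

```r
library(quanteda)

toks <- tokens("The president visited the White House")

# Remove some tokens but keep their positions as empty-string pads
toks_pad <- tokens_remove(toks, pattern = c("the", "visited"), padding = TRUE)
as.list(toks_pad)[[1]]

# The empty pattern "" matches those pads, so this drops them again
toks_nopad <- tokens_remove(toks_pad, pattern = "")
as.list(toks_nopad)[[1]]
```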
character: A character vector of token patterns to be selected
or removed. Whitespace is not privileged, so in a character vector
it is interpreted literally. To treat whitespace-separated elements
as sequences of tokens, wrap the argument in
phrase().
list of character objects: If the list elements are character
vectors of length 1, then this is equivalent to a vector of characters. If
a list element contains a character vector of length greater than 1, then
matching treats its elements as a sequence of tokens, equivalent to
wrapping the argument in phrase(), except when matching
dfm features, where this does not apply.
dictionary: Values in a dictionary are used as patterns,
for literal matches. Multi-word values are automatically converted into
phrases, so performing selection or compounding using a dictionary is the
same as wrapping the dictionary in phrase().
collocations: Collocations objects created from
quanteda.textstats::textstat_collocations(), which are treated as phrases
automatically.
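
To make the difference between the first three pattern types concrete, here is a small sketch using tokens_select(). It assumes quanteda is installed; the sentence, the dictionary key place, and the object names are made up for illustration. Collocations objects are left out because they require quanteda.textstats.

```r
library(quanteda)

toks <- tokens("The president spoke at the White House")

# Character vector: "white house" is a single literal pattern containing a
# space, so it cannot match any individual token
lit <- tokens_select(toks, pattern = "white house")

# phrase(): the same string is treated as the two-token sequence white, house
phr <- tokens_select(toks, pattern = phrase("white house"))

# List element of length 2: treated as a sequence, like phrase()
lst <- tokens_select(toks, pattern = list(c("white", "house")))

# Dictionary: the multi-word value is converted to a phrase automatically
dict <- dictionary(list(place = "white house"))
dct <- tokens_select(toks, pattern = dict)

# Number of tokens kept by each pattern type
c(literal = unname(ntoken(lit)), phrase = unname(ntoken(phr)),
  list = unname(ntoken(lst)), dictionary = unname(ntoken(dct)))
```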
See also: valuetype, case_insensitive
# these are interpreted literally
(patt1 <- c("president", "white house", "house of representatives"))
# as multi-word sequences
phrase(patt1)
# three single-word patterns
(patt2 <- c("president", "white_house", "house_of_representatives"))
phrase(patt2)
# this is equivalent to phrase(patt1)
(patt3 <- list(c("president"), c("white", "house"),
               c("house", "of", "representatives")))
# glob expressions can be used
phrase(patt4 <- c("president?", "white house", "house * representatives"))
# this is equivalent to phrase(patt4)
(patt5 <- list(c("president?"), c("white", "house"), c("house", "*", "representatives")))
# dictionary with multi-word matches
(dict1 <- dictionary(list(us = c("president", "white house", "house of representatives"))))
phrase(dict1)
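
The examples above only construct patterns. As a hedged sketch of how they might be applied, the following compounds multi-word sequences with tokens_compound() and then selects the resulting single tokens. It assumes quanteda is installed and relies on the default "_" concatenator and default case-insensitive matching; the sentence is invented for the example.

```r
library(quanteda)

toks <- tokens("The House of Representatives and the White House")

patt1 <- c("president", "white house", "house of representatives")
patt2 <- c("president", "white_house", "house_of_representatives")

# phrase(patt1) matches token sequences, which are joined with "_"
comp <- tokens_compound(toks, pattern = phrase(patt1))
as.list(comp)[[1]]

# After compounding, the single-word patterns in patt2 match the joined tokens
sel <- tokens_select(comp, pattern = patt2)
as.list(sel)[[1]]

# A glob wildcard can stand in for a whole token inside a sequence
glob <- tokens_compound(toks, pattern = phrase("house * representatives"))
as.list(glob)[[1]]
```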