Adds new feature columns, either user-supplied or based on keyword(s)/regex pattern search, to a provided sento_corpus or a quanteda corpus object.
add_features(
  corpus,
  featuresdf = NULL,
  keywords = NULL,
  do.binary = TRUE,
  do.regex = FALSE
)
corpus | a ...
featuresdf | a named ...
keywords | a named ...
do.binary | a ...
do.regex | a ...
If a provided feature name is already part of the corpus, it will be replaced. The featuresdf and keywords arguments can be provided at the same time, or only one of them, leaving the other at NULL. We use the stringi package for searching the keywords. The do.regex argument points to the corresponding elements in keywords. For FALSE, we transform the keywords into a simple regular expression, involving "\b" for exact word boundary matching and (if multiple keywords) | as OR operator. The elements associated with TRUE do not undergo this transformation, and are evaluated as given, if the corresponding keywords vector consists of only one expression. For a large corpus and/or complex regex patterns, this function may require some patience. Scaling between 0 and 1 is performed via min-max normalization, per column.
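The keyword-to-regex transformation and the per-column min-max scaling described above can be illustrated with a minimal base R sketch. This is not the package's implementation: it uses gregexpr() instead of stringi, and the helper names keywords_to_regex, count_matches and min_max, the sample texts, and the handling of a constant column are all assumptions made for illustration.

```r
# Hypothetical helper: wrap each keyword in "\\b" word boundaries and
# join multiple keywords with "|" as OR operator.
keywords_to_regex <- function(kws) {
  paste0("\\b", kws, "\\b", collapse = "|")
}

# Hypothetical helper: number of pattern matches per text
# (base R stand-in for a stringi-based search).
count_matches <- function(pattern, x) {
  hits <- gregexpr(pattern, x, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Hypothetical helper: min-max normalization of one column to [0, 1].
min_max <- function(x) {
  rng <- range(x)
  # Assumption: a constant column is mapped to 0 to avoid division by zero
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up example texts
texts <- c(
  "Obama met the US president of the board.",
  "The warden spoke about the war, the war, and the war.",
  "Nothing relevant here."
)

pattern <- keywords_to_regex(c("Obama", "US president"))
pres_counts <- count_matches(pattern, texts)                # 2 0 0
war_counts <- count_matches(keywords_to_regex("war"), texts) # 0 3 0

# Each feature column is scaled separately
features <- data.frame(pres = min_max(pres_counts),
                       war = min_max(war_counts))
print(features)
```

Note that "warden" is not counted as a match for "war" because of the word boundaries, which is the point of the "\b" wrapping when the corresponding do.regex element is FALSE.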
An updated corpus object.
Samuel Borms
set.seed(505)

# construct a corpus and add (a) feature(s) to it
corpus <- quanteda::corpus_sample(
  sento_corpus(corpusdf = sentometrics::usnews), 500
)
corpus1 <- add_features(corpus,
                        featuresdf = data.frame(random = runif(quanteda::ndoc(corpus))))
corpus2 <- add_features(corpus,
                        keywords = list(pres = "president", war = "war"),
                        do.binary = FALSE)
corpus3 <- add_features(corpus,
                        keywords = list(pres = c("Obama", "US president")))
corpus4 <- add_features(corpus,
                        featuresdf = data.frame(all = 1),
                        keywords = list(pres1 = "Obama|US [p|P]resident",
                                        pres2 = "\\bObama\\b|\\bUS president\\b",
                                        war = "war"),
                        do.regex = c(TRUE, TRUE, FALSE))

sum(quanteda::docvars(corpus3, "pres")) ==
  sum(quanteda::docvars(corpus4, "pres2")) # TRUE

# adding a complementary feature
nonpres <- data.frame(nonpres = as.numeric(!quanteda::docvars(corpus3, "pres")))
corpus3 <- add_features(corpus3, featuresdf = nonpres)