dfm_replace: Replace features in dfm

Description

Substitute features based on vectorized one-to-one matching for lemmatization or user-defined stemming.

Usage

dfm_replace(x, pattern, replacement, case_insensitive = TRUE,
  verbose = quanteda_options("verbose"))

Arguments

x

dfm whose features will be replaced

pattern

a character vector. See pattern for more details.

replacement

if pattern is a character vector, then replacement must be a character vector of equal length, for a 1:1 match.

case_insensitive

ignore case when matching, if TRUE

verbose

print status messages if TRUE
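
The interplay of pattern, replacement and case_insensitive can be illustrated with a small base-R sketch. It is not quanteda's implementation: map_features() is a hypothetical helper written for this illustration, and it assumes that each feature is looked up in pattern and, if found, swapped for the replacement at the same position.

```r
# Hypothetical helper (not part of quanteda): one-to-one feature mapping.
map_features <- function(feat, pattern, replacement, case_insensitive = TRUE) {
  # pattern and replacement must pair up element by element
  stopifnot(length(pattern) == length(replacement))
  if (case_insensitive) {
    idx <- match(tolower(feat), tolower(pattern))
  } else {
    idx <- match(feat, pattern)
  }
  # unmatched features keep their original name
  ifelse(is.na(idx), feat, replacement[idx])
}

feat <- c("Focus", "focused", "Budget")
ci <- map_features(feat, pattern = "focus", replacement = "lemma_focus")
cs <- map_features(feat, pattern = "focus", replacement = "lemma_focus",
                   case_insensitive = FALSE)
print(ci)  # "Focus" is matched despite the capital F
print(cs)  # nothing is matched, so the names are unchanged
```

Under these assumptions, only whole feature names are compared; "focused" is left alone because it is not equal to "focus".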

Examples

dfmat1 <- dfm(data_corpus_irishbudget2010)

# lemmatization: map every form in lis to the single lemma "focus"
lis <- c("foci", "focus", "focused", "focuses", "focusing", "focussed", "focusses")
lemma <- rep("focus", length(lis))
dfmat2 <- dfm_replace(dfmat1, pattern = lis, replacement = lemma)
featnames(dfm_select(dfmat2, pattern = lis))

# stemming: replace each feature with its Porter stem, then compare with dfm_wordstem()
feat <- featnames(dfmat1)
featstem <- char_wordstem(feat, "porter")
dfmat3 <- dfm_replace(dfmat1, pattern = feat, replacement = featstem, case_insensitive = FALSE)
identical(dfmat3, dfm_wordstem(dfmat1, "porter"))
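
The comparison with dfm_wordstem() above suggests that features ending up with the same name after replacement are combined into one column. The following base-R sketch shows that idea on a toy count matrix. It is an illustration under that assumption, not quanteda's code; the matrix m and all variable names are made up for the example.

```r
# Toy document-feature counts (hypothetical data)
m <- matrix(c(1, 0, 2,
              0, 3, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"),
                            c("focus", "focused", "budget")))

pattern <- c("focused", "focuses")
replacement <- c("focus", "focus")

# Rename matched features; unmatched ones keep their name
idx <- match(colnames(m), pattern)
new_names <- ifelse(is.na(idx), colnames(m), replacement[idx])

# Sum the columns that now share a name (rowsum() works on rows, hence t())
merged <- t(rowsum(t(m), group = new_names))
print(merged)
```

Here "focus" and "focused" collapse into a single "focus" column whose counts are the sums of the two originals, while "budget" is untouched.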

quanteda/quanteda documentation built on Feb. 16, 2019, 5:45 a.m.