build_features: Builds the feature matrix from a text vector


View source: R/build_features.R

Description

Builds the feature matrix from a text vector.

Usage

build_features(x, term_count_min = 1, mdl = NULL, parallel = TRUE,
  quiet = FALSE)

Arguments

x

a character vector of texts

term_count_min

a number passed to prune_vocabulary; defaults to 1. When the function is used for training, it can and should be set to a higher value, e.g., 3.

mdl

a list of existing model data (containing the vectorizer, the TF-IDF, and the LSA objects), such as the mdl element returned by an earlier call; defaults to NULL, in which case the models are rebuilt from x

parallel

TRUE/FALSE: whether the task should be executed in parallel; defaults to TRUE

quiet

TRUE/FALSE: whether the function should run silently; defaults to FALSE
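The term_count_min argument controls how aggressively rare terms are dropped from the vocabulary. The base-R sketch below only illustrates that idea under stated assumptions: prune_terms() is a hypothetical helper, not part of trollR, and it uses a crude whitespace tokenizer rather than whatever tokenization build_features applies internally.

```r
# Hypothetical helper, for illustration only (not part of trollR):
# keep the terms whose total count across all texts is at least
# term_count_min, the same idea prune_vocabulary() is used for here.
prune_terms <- function(texts, term_count_min = 1) {
  # crude tokenizer: lower-case, then split on whitespace
  tokens <- unlist(strsplit(tolower(texts), "\\s+"))
  counts <- table(tokens)
  sort(names(counts)[counts >= term_count_min])
}

train <- c("Banking is finance", "flowers are not houses",
           "finance is power", "houses are build")

prune_terms(train, term_count_min = 1)  # all 9 distinct terms
prune_terms(train, term_count_min = 2)  # "are" "finance" "houses" "is"
```

Even a threshold of 2 removes more than half of the terms in this tiny corpus.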

Value

a list of two elements: model_matrix, a dgCMatrix that contains the features (columns) for each text (row), and mdl, a list of the fitted models that can be passed as the mdl argument of a later call
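Passing mdl to a later call keeps the feature columns of new texts aligned with those of the texts the models were built on. The base-R sketch below illustrates only the vocabulary part of that idea: fit_vocab() and count_matrix() are hypothetical helpers, not part of trollR, and the real mdl additionally carries the TF-IDF and LSA objects.

```r
# Hypothetical helpers, for illustration only (not part of trollR):
# "fitting" learns a vocabulary once; "applying" maps any texts
# onto exactly those columns.
tokenize <- function(texts) strsplit(tolower(texts), "\\s+")

fit_vocab <- function(texts) sort(unique(unlist(tokenize(texts))))

count_matrix <- function(texts, vocab) {
  # one row per text, one column per vocabulary term;
  # tokens that are not in vocab are dropped
  rows <- lapply(tokenize(texts), function(tk) {
    as.numeric(table(factor(tk, levels = vocab)))
  })
  m <- do.call(rbind, rows)
  colnames(m) <- vocab
  m
}

train <- c("Banking is finance", "flowers are not houses",
           "finance is power", "houses are build")
test <- c("finance is greed", "flowers belong in the garbage",
          "houses are build")

vocab <- fit_vocab(train)              # "fit" on train only
m_train <- count_matrix(train, vocab)
m_test <- count_matrix(test, vocab)    # same columns as m_train
```

Words that were never seen when the vocabulary was built (here greed or garbage) contribute nothing, so both matrices share the same columns.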

Examples

text <- c(
  "This is a first text that describes something",
  "A second Text That USES A LOT of CAPITALS",
  "Lastly MANY!!!! (like, really a lot!) punctuations!!!"
)

build_features(text)

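# a second example: fit the models on one set of texts and reuse them on another
train <- c("Banking is finance", "flowers are not houses", "finance is power", "houses are build")
test <- c("finance is greed", "flowers belong in the garbage", "houses are build")

# fit the vectorizer, TF-IDF, and LSA models on test
a1 <- build_features(test)
# reuse the fitted models on the same texts
a12 <- build_features(test, mdl = a1$mdl)

# apply the models fitted on test to train
a2 <- build_features(train, mdl = a1$mdl)
as.matrix(a2$model_matrix)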

schliebs/trollR documentation built on May 23, 2019, 2:52 p.m.