View source: R/textmodel_mlp.R

textmodel_mlp    R Documentation

Multilayer perceptron network (MLP) model for text classification

Description

This function is a wrapper for a two-layer multilayer perceptron (MLP) network model, consisting of a single hidden layer followed by an output layer, implemented in the keras package.
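To make the shape of such a network concrete, here is a minimal base R sketch of a forward pass through one hidden layer and one output layer. It is not the package's implementation (which delegates to keras). The helper names (relu, softmax, mlp_forward), the ReLU and softmax activations, and the placement of dropout after the hidden layer are all assumptions made for illustration.

```r
# Illustrative only: hypothetical helpers, not part of quanteda.classifiers.
relu <- function(z) pmax(z, 0)

softmax <- function(z) {
  z <- z - apply(z, 1, max)   # subtract row maxima for numerical stability
  e <- exp(z)
  e / rowSums(e)              # each row becomes a probability distribution
}

mlp_forward <- function(x, W1, b1, W2, b2, dropout = 0, training = FALSE) {
  # hidden layer: ncol(W1) plays the role of `units`
  h <- relu(sweep(x %*% W1, 2, b1, "+"))
  if (training && dropout > 0) {
    # assumed placement: randomly zero hidden units, rescale the survivors
    keep <- matrix(rbinom(length(h), 1, 1 - dropout), nrow(h), ncol(h))
    h <- h * keep / (1 - dropout)
  }
  # output layer: one column per class
  softmax(sweep(h %*% W2, 2, b2, "+"))
}

set.seed(1)
n_docs <- 4; n_feat <- 6; units <- 3; n_class <- 2
x  <- matrix(runif(n_docs * n_feat), n_docs, n_feat)   # stand-in for a dfm
W1 <- matrix(rnorm(n_feat * units), n_feat, units)
b1 <- rep(0, units)
W2 <- matrix(rnorm(units * n_class), units, n_class)
b2 <- rep(0, n_class)

probs       <- mlp_forward(x, W1, b1, W2, b2)
probs_train <- mlp_forward(x, W1, b1, W2, b2, dropout = 0.2, training = TRUE)
```

Each row of probs holds the predicted class probabilities for one document. In the real model the weights are learned by keras rather than drawn at random.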

Usage

textmodel_mlp(
  x,
  y,
  units = 512,
  dropout = 0.2,
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "categorical_accuracy",
  ...
)

Arguments

x

the dfm on which the model will be fit. Does not need to contain only the training documents.

y

vector of training labels associated with each document in x. (These will be converted to factors if not already factors.)

units

The number of network nodes in the first (hidden) layer of the sequential model.

dropout

A floating-point value between 0 and 1. It sets the fraction of units dropped for the linear transformation of the inputs.

optimizer

optimizer used to fit model to training data, see keras::compile.keras.engine.training.Model()

loss

objective loss function, see keras::compile.keras.engine.training.Model()

metrics

metric used to evaluate the model during training, see keras::compile.keras.engine.training.Model()

...

additional options passed to keras::fit.keras.engine.training.Model()
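The default loss and metrics arguments both operate on one-hot encoded labels, so it may help to see how a label vector like y relates to them. The base R sketch below is an illustration under stated assumptions, not the package's code: the toy labels, the helper names, the eps clipping constant, and the idea that documents with NA labels are set aside before training are all assumptions.

```r
# Toy labels; NA marks a document with no training label (assumed handling).
y <- c("Immigration", "Other", NA, "Immigration")

y_fac    <- factor(y)          # labels are converted to a factor
classes  <- levels(y_fac)
labelled <- !is.na(y_fac)

# One-hot encode the labelled documents: one row per document, one column per class.
onehot <- diag(length(classes))[as.integer(y_fac[labelled]), , drop = FALSE]
colnames(onehot) <- classes

# Hypothetical helpers mirroring the usual textbook definitions.
categorical_crossentropy <- function(y_true, y_prob, eps = 1e-7) {
  y_prob <- pmin(pmax(y_prob, eps), 1 - eps)   # avoid log(0)
  -mean(rowSums(y_true * log(y_prob)))
}

categorical_accuracy <- function(y_true, y_prob) {
  mean(max.col(y_true, ties.method = "first") ==
       max.col(y_prob, ties.method = "first"))
}

# Made-up predicted probabilities for the three labelled documents.
probs <- rbind(c(0.9, 0.1),
               c(0.2, 0.8),
               c(0.4, 0.6))

loss <- categorical_crossentropy(onehot, probs)
acc  <- categorical_accuracy(onehot, probs)
```

Here the third labelled document is misclassified, so acc is 2/3, while loss averages the negative log probability assigned to each true class.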

See Also

save.textmodel_mlp(), load.textmodel_mlp()

Examples

## Not run: 
library("quanteda")
library("quanteda.classifiers")

# create a dataset with evenly balanced coded and uncoded immigration sentences
corpcoded <- corpus_subset(data_corpus_manifestosentsUK, !is.na(crowd_immigration_label))
corpuncoded <- data_corpus_manifestosentsUK %>%
    corpus_subset(is.na(crowd_immigration_label) & year > 1980) %>%
    corpus_sample(size = ndoc(corpcoded))
corp <- corpcoded + corpuncoded

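# tokenize, then form a tf-idf-weighted dfm
# (recent quanteda versions no longer accept a corpus directly in dfm())
dfmat <- tokens(corp) %>%
    dfm() %>%
    dfm_tfidf()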

set.seed(1000)
tmod <- textmodel_mlp(dfmat, y = docvars(dfmat, "crowd_immigration_label"),
                        epochs = 5, verbose = 1)
pred <- predict(tmod, newdata = dfm_subset(dfmat, is.na(crowd_immigration_label)))
table(pred)
# as.character() replaces the deprecated texts() accessor
tail(as.character(corpuncoded)[pred == "Immigration"], 10)

## End(Not run)

quanteda/quanteda.classifiers documentation built on Oct. 20, 2023, 6:53 a.m.