In quanteda/quanteda.classifiers: Models for supervised text classification

textmodel_mlp

R Documentation

Multilayer perceptron network (MLP) model for text classification

Description

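This function is a wrapper for a multilayer perceptron (MLP) network model with a single hidden layer, implemented with the keras package. The network has two dense layers: a hidden layer of `units` nodes, followed by an output layer with one node per class.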
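To make the architecture concrete, here is a minimal base-R sketch of the forward pass of such a network. It is not the package's implementation, which delegates everything to keras. It assumes a ReLU hidden activation, a softmax output layer, and inverted dropout on the hidden layer during training. The helper names `relu()`, `softmax_rows()` and `mlp_forward()` and the toy dimensions are hypothetical.

```r
# Minimal base-R sketch of a one-hidden-layer MLP forward pass.
# ASSUMPTIONS (illustrative, not read from the package source): ReLU hidden
# activation, softmax output, and inverted dropout applied only when training.
relu <- function(z) pmax(z, 0)  # pmax keeps the matrix dimensions of z

softmax_rows <- function(z) {
  z <- z - apply(z, 1, max)  # subtract each row's maximum for numerical stability
  e <- exp(z)
  e / rowSums(e)             # each row now sums to 1
}

mlp_forward <- function(x, W1, b1, W2, b2, dropout = 0.2, training = FALSE) {
  # hidden layer with `units` = ncol(W1) nodes
  h <- relu(x %*% W1 + matrix(b1, nrow(x), length(b1), byrow = TRUE))
  if (training && dropout > 0) {
    # zero each hidden unit with probability `dropout`, rescale the survivors
    keep <- matrix(runif(length(h)) >= dropout, nrow(h), ncol(h))
    h <- h * keep / (1 - dropout)
  }
  # output layer: one node per class, turned into class probabilities
  softmax_rows(h %*% W2 + matrix(b2, nrow(h), length(b2), byrow = TRUE))
}

# Toy dimensions: 4 documents, 10 features, 8 hidden units, 3 classes.
set.seed(1)
x  <- matrix(runif(4 * 10), nrow = 4)  # stand-in for a dense, tf-idf-weighted dfm
W1 <- matrix(rnorm(10 * 8, sd = 0.1), nrow = 10)
b1 <- rep(0, 8)
W2 <- matrix(rnorm(8 * 3, sd = 0.1), nrow = 8)
b2 <- rep(0, 3)
probs <- mlp_forward(x, W1, b1, W2, b2)
```

In `textmodel_mlp()` itself, keras learns the weights by minimising `loss` with `optimizer`; the sketch only shows where `units` and `dropout` enter the computation.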

Usage

textmodel_mlp(
  x,
  y,
  units = 512,
  dropout = 0.2,
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "categorical_accuracy",
  ...
)

Arguments

`x`	the dfm on which the model will be fit. It does not need to contain only the training documents: documents whose label in `y` is `NA` are not used for fitting.
`y`	vector of training labels associated with each document in `x`, with `NA` for documents that have no label. (These will be converted to factors if not already factors.)
`units`	the number of nodes in the hidden layer, which is the first layer of the sequential model
`dropout`	a numeric value between 0 and 1 giving the dropout rate, i.e. the fraction of hidden-layer units randomly set to zero at each training step
`optimizer`	optimizer used to fit the model to the training data; see `keras::compile.keras.engine.training.Model()`
`loss`	objective loss function; see `keras::compile.keras.engine.training.Model()`
`metrics`	metric(s) used to evaluate the model during training; see `keras::compile.keras.engine.training.Model()`
`...`	additional options passed to `keras::fit.keras.engine.training.Model()`, such as `epochs` or `verbose`
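The default `loss` ("categorical_crossentropy") and `metrics` ("categorical_accuracy") work on labels encoded as one column per class. As a rough illustration in plain base R, not the keras code path, the sketch below drops unlabelled (`NA`) documents, one-hot encodes toy labels, and computes both quantities for made-up predicted probabilities. The labels, the probabilities, the helper names and the clipping constant `eps` are all hypothetical.

```r
# Toy labels for illustration only; NA marks an uncoded document.
y <- factor(c("Immigration", "Other", NA, "Immigration"))

# One-hot encode the labelled documents (rows = documents, columns = classes).
one_hot <- function(f) {
  m <- matrix(0, nrow = length(f), ncol = nlevels(f),
              dimnames = list(NULL, levels(f)))
  m[cbind(seq_along(f), as.integer(f))] <- 1
  m
}
Y <- one_hot(y[!is.na(y)])

# Mean categorical cross-entropy; probabilities are clipped here to avoid log(0).
categorical_crossentropy <- function(Y, P, eps = 1e-7) {
  P <- pmin(pmax(P, eps), 1 - eps)
  -mean(rowSums(Y * log(P)))
}

# Share of documents whose most probable class is the true class.
categorical_accuracy <- function(Y, P) {
  mean(max.col(Y, ties.method = "first") == max.col(P, ties.method = "first"))
}

# Hypothetical predicted probabilities for the three labelled documents.
P <- matrix(c(0.9, 0.1,
              0.3, 0.7,
              0.6, 0.4), ncol = 2, byrow = TRUE)
loss <- categorical_crossentropy(Y, P)
acc  <- categorical_accuracy(Y, P)
```

Here every document's most probable class matches its label, so `acc` is 1, while `loss` is still positive because the correct classes receive probabilities below 1.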

Examples

## Not run: 
library("quanteda")
library("quanteda.classifiers")

# create a dataset with evenly balanced coded and uncoded immigration sentences
corpcoded <- corpus_subset(data_corpus_manifestosentsUK, !is.na(crowd_immigration_label))
corpuncoded <- data_corpus_manifestosentsUK %>%
    corpus_subset(is.na(crowd_immigration_label) & year > 1980) %>%
    corpus_sample(size = ndoc(corpcoded))
corp <- corpcoded + corpuncoded

# form a tf-idf-weighted dfm (tokenize first; dfm() on a corpus is deprecated)
dfmat <- dfm(tokens(corp)) %>%
    dfm_tfidf()

set.seed(1000)
# uncoded documents have NA labels and are not used for fitting
tmod <- textmodel_mlp(dfmat, y = docvars(dfmat, "crowd_immigration_label"),
                      epochs = 5, verbose = 1)
pred <- predict(tmod, newdata = dfm_subset(dfmat, is.na(crowd_immigration_label)))
table(pred)
# last 10 uncoded sentences predicted to be about immigration
tail(as.character(corpuncoded)[pred == "Immigration"], 10)

## End(Not run)

quanteda/quanteda.classifiers documentation built on Oct. 20, 2023, 6:53 a.m.