textmodel_cnnlstmemb: [Experimental] Convolutional NN + LSTM model fitted to word embeddings

View source: R/textmodel_cnnlstmemb.R

textmodel_cnnlstmemb    R Documentation

[Experimental] Convolutional NN + LSTM model fitted to word embeddings

Description

A function that combines a convolutional neural network layer with a long short-term memory layer. It is designed to incorporate word sequences, represented as sequentially ordered word embeddings, into text classification. The model takes as input a quanteda tokens object.
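
The layer sequence below is a minimal sketch of this architecture using the keras R API. It is an assumption assembled from this description and the arguments documented under Usage, not the package's exact internals; vocab_size and nclass are hypothetical placeholders for the vocabulary size and the number of outcome classes.

library(keras)

vocab_size <- 5000   # hypothetical vocabulary size (cf. the words argument)
nclass <- 2          # hypothetical number of outcome classes

model <- keras_model_sequential() %>%
  # word embeddings learned during fitting (wordembeddim, maxsenlen)
  layer_embedding(input_dim = vocab_size + 1, output_dim = 30,
                  input_length = 100) %>%
  layer_dropout(rate = 0.2) %>%            # dropout1
  # optional convolutional layer (cnnlayer = TRUE): filter, kernel_size
  layer_conv_1d(filters = 48, kernel_size = 5, activation = "relu") %>%
  layer_dropout(rate = 0.2) %>%            # dropout2
  layer_max_pooling_1d(pool_size = 4) %>%  # pool_size
  # LSTM over the pooled embedding sequence (units_lstm, dropout3, dropout4)
  layer_lstm(units = 128, dropout = 0.2, recurrent_dropout = 0.2) %>%
  layer_dense(units = nclass, activation = "softmax")

model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "categorical_accuracy")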

Usage

textmodel_cnnlstmemb(
  x,
  y,
  dropout1 = 0.2,
  dropout2 = 0.2,
  dropout3 = 0.2,
  dropout4 = 0.2,
  wordembeddim = 30,
  cnnlayer = TRUE,
  filter = 48,
  kernel_size = 5,
  pool_size = 4,
  units_lstm = 128,
  words = NULL,
  maxsenlen = 100,
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "categorical_accuracy",
  ...
)

Arguments

x

tokens object

y

vector of training labels associated with each document in x. (These will be coerced to factors if not already factors.)

dropout1

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the inputs to the embedding layer.

dropout2

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the inputs to the CNN layer.

dropout3

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the inputs to the recurrent (LSTM) layer.

dropout4

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the recurrent state of the LSTM layer.

wordembeddim

The number of word embedding dimensions to be fit

cnnlayer

A logical value indicating whether to include a convolutional layer in the neural network model

filter

The number of output filters in the convolution

kernel_size

An integer or list of a single integer, specifying the length of the 1D convolution window

pool_size

Size of the max pooling windows; see keras::layer_max_pooling_1d()

units_lstm

The number of units in the LSTM layer

words

The maximum number of words used to train the model; defaults to the number of features in x

maxsenlen

The maximum sentence length of the training data

optimizer

optimizer used to fit the model to the training data; see keras::compile.keras.engine.training.Model()

loss

objective loss function; see keras::compile.keras.engine.training.Model()

metrics

metric used to evaluate the model during training; see keras::compile.keras.engine.training.Model()

...

additional options passed to keras::fit.keras.engine.training.Model(), for example epochs or batch_size (see the sketch below)
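
As an illustration of passing extra fitting options through ..., the sketch below forwards epochs, batch_size, and validation_split, all standard arguments to keras's fit method; whether the wrapper honours any particular option is an assumption here. It reuses the tok object built in the Examples section.

tmod <- textmodel_cnnlstmemb(tok,
                             y = docvars(tok, "crowd_immigration_label"),
                             epochs = 10,
                             batch_size = 32,
                             validation_split = 0.1)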

See Also

save.textmodel_cnnlstmemb(), load.textmodel_cnnlstmemb()

Examples

## Not run: 
# create dataset with evenly balanced coded & uncoded immigration sentences
corpcoded <- corpus_subset(data_corpus_manifestosentsUK,
                           !is.na(crowd_immigration_label))
corpuncoded <- data_corpus_manifestosentsUK %>%
    corpus_subset(is.na(crowd_immigration_label) & year > 1980) %>%
    corpus_sample(size = ndoc(corpcoded))
corp <- corpcoded + corpuncoded

tok <- tokens(corp)

tmod <- textmodel_cnnlstmemb(tok,
                             y = docvars(tok, "crowd_immigration_label"),
                             epochs = 5, verbose = 1)

newdata <- tokens_subset(tok, subset = is.na(crowd_immigration_label))
pred <- predict(tmod, newdata = newdata)
table(pred)
tail(as.character(corpuncoded)[pred == "Immigration"], 10)


## End(Not run)
