textmodel_cnnlstmemb: [Experimental] Convolutional NN + LSTM model fitted to word embeddings

View source: R/textmodel_cnnlstmemb.R

textmodel_cnnlstmemb    R Documentation

[Experimental] Convolutional NN + LSTM model fitted to word embeddings

Description

A function that combines a convolutional neural network layer with a long short-term memory layer. It is designed to incorporate word sequences, represented as sequentially ordered word embeddings, into text classification. The model takes as input a quanteda tokens object.
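
The layer sequence below is a minimal sketch of this architecture using the keras R API. It is an assumption assembled from this description and the arguments documented under Usage, not the package's exact internals; vocab_size and nclass are hypothetical placeholders for the vocabulary size and the number of outcome classes.

library(keras)

vocab_size <- 5000   # hypothetical vocabulary size (cf. the words argument)
nclass <- 2          # hypothetical number of outcome classes

model <- keras_model_sequential() %>%
  # word embeddings learned during fitting (wordembeddim, maxsenlen)
  layer_embedding(input_dim = vocab_size + 1, output_dim = 30,
                  input_length = 100) %>%
  layer_dropout(rate = 0.2) %>%            # dropout1
  # optional convolutional layer (cnnlayer = TRUE): filter, kernel_size
  layer_conv_1d(filters = 48, kernel_size = 5, activation = "relu") %>%
  layer_dropout(rate = 0.2) %>%            # dropout2
  layer_max_pooling_1d(pool_size = 4) %>%  # pool_size
  # LSTM over the pooled embedding sequence (units_lstm, dropout3, dropout4)
  layer_lstm(units = 128, dropout = 0.2, recurrent_dropout = 0.2) %>%
  layer_dense(units = nclass, activation = "softmax")

model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "categorical_accuracy")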

Usage

textmodel_cnnlstmemb(
  x,
  y,
  dropout1 = 0.2,
  dropout2 = 0.2,
  dropout3 = 0.2,
  dropout4 = 0.2,
  wordembeddim = 30,
  cnnlayer = TRUE,
  filter = 48,
  kernel_size = 5,
  pool_size = 4,
  units_lstm = 128,
  words = NULL,
  maxsenlen = 100,
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "categorical_accuracy",
  ...
)

Arguments

x

tokens object

y

vector of training labels associated with each document in x. (These will be coerced to factors if not already factors.)

dropout1

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the inputs to the embedding layer.

dropout2

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the inputs to the CNN layer.

dropout3

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the inputs to the recurrent (LSTM) layer.

dropout4

A floating-point value between 0 and 1, setting the rate at which units are dropped for the linear transformation of the recurrent state of the LSTM layer.

wordembeddim

The number of word embedding dimensions to be fit

cnnlayer

A logical value indicating whether to include a convolutional layer in the neural network model

filter

The number of output filters in the convolution

kernel_size

An integer or list of a single integer, specifying the length of the 1D convolution window

pool_size

Size of the max pooling windows; see keras::layer_max_pooling_1d()

units_lstm

The number of units in the LSTM layer

words

The maximum number of words used to train the model; defaults to the number of features in x

maxsenlen

The maximum sentence length of the training data

optimizer

optimizer used to fit the model to the training data; see keras::compile.keras.engine.training.Model()

loss

objective loss function; see keras::compile.keras.engine.training.Model()

metrics

metric used to evaluate the model during training; see keras::compile.keras.engine.training.Model()

...

additional options passed to keras::fit.keras.engine.training.Model(), for example epochs or batch_size (see the sketch below)
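
As an illustration of passing extra fitting options through ..., the sketch below forwards epochs, batch_size, and validation_split, all standard arguments to keras's fit method; whether the wrapper honours any particular option is an assumption here. It reuses the tok object built in the Examples section.

tmod <- textmodel_cnnlstmemb(tok,
                             y = docvars(tok, "crowd_immigration_label"),
                             epochs = 10,
                             batch_size = 32,
                             validation_split = 0.1)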

See Also

save.textmodel_cnnlstmemb(), load.textmodel_cnnlstmemb()

Examples

## Not run: 
# create dataset with evenly balanced coded & uncoded immigration sentences
corpcoded <- corpus_subset(data_corpus_manifestosentsUK,
                           !is.na(crowd_immigration_label))
corpuncoded <- data_corpus_manifestosentsUK %>%
    corpus_subset(is.na(crowd_immigration_label) & year > 1980) %>%
    corpus_sample(size = ndoc(corpcoded))
corp <- corpcoded + corpuncoded

tok <- tokens(corp)

tmod <- textmodel_cnnlstmemb(tok,
                             y = docvars(tok, "crowd_immigration_label"),
                             epochs = 5, verbose = 1)

newdata <- tokens_subset(tok, subset = is.na(crowd_immigration_label))
pred <- predict(tmod, newdata = newdata)
table(pred)
tail(as.character(corpuncoded)[pred == "Immigration"], 10)


## End(Not run)
