View source: R/textmodel_cnnlstmemb.R
| textmodel_cnnlstmemb | R Documentation | 
A function that combines a convolutional neural network layer with a long short-term memory (LSTM) layer. It is designed to incorporate word sequences, represented as sequentially ordered word embeddings, into text classification. The model takes a quanteda tokens object as input.
textmodel_cnnlstmemb(
  x,
  y,
  dropout1 = 0.2,
  dropout2 = 0.2,
  dropout3 = 0.2,
  dropout4 = 0.2,
  wordembeddim = 30,
  cnnlayer = TRUE,
  filter = 48,
  kernel_size = 5,
  pool_size = 4,
  units_lstm = 128,
  words = NULL,
  maxsenlen = 100,
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "categorical_accuracy",
  ...
)
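The arguments map onto a stacked keras network. The following is a minimal sketch, written with the keras R package, of the kind of layer stack these defaults describe; the exact architecture built by textmodel_cnnlstmemb() may differ in details such as activations and layer ordering, so treat it as illustrative only.

# Illustrative sketch only: a plausible layer stack implied by the
# arguments above, not necessarily the exact network the function builds.
library(keras)
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 5000,        # vocabulary size (`words`)
                  output_dim = 30,         # `wordembeddim`
                  input_length = 100) %>%  # `maxsenlen`
  layer_dropout(rate = 0.2) %>%            # `dropout1`
  layer_conv_1d(filters = 48,              # `filter`
                kernel_size = 5,           # `kernel_size`
                activation = "relu") %>%   # layer included when `cnnlayer = TRUE`
  layer_dropout(rate = 0.2) %>%            # `dropout2`
  layer_max_pooling_1d(pool_size = 4) %>%  # `pool_size`
  layer_lstm(units = 128,                  # `units_lstm`
             dropout = 0.2,                # `dropout3` (input dropout)
             recurrent_dropout = 0.2) %>%  # `dropout4` (recurrent-state dropout)
  layer_dense(units = 2, activation = "softmax")  # one unit per class
model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "categorical_accuracy")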
| x | tokens object | 
| y | vector of training labels associated with each document identified in x | 
| dropout1 | A floating-point value bound between 0 and 1. It determines the rate at which units are dropped for the linear transformation of the inputs for the embedding layer. | 
| dropout2 | A floating-point value bound between 0 and 1. It determines the rate at which units are dropped for the linear transformation of the inputs for the CNN layer. | 
| dropout3 | A floating-point value bound between 0 and 1. It determines the rate at which units are dropped for the linear transformation of the inputs for the recurrent (LSTM) layer. | 
| dropout4 | A floating-point value bound between 0 and 1. It determines the rate at which units are dropped for the linear transformation of the recurrent state of the LSTM layer. | 
| wordembeddim | The number of word embedding dimensions to be fit | 
| cnnlayer | A logical parameter that allows the user to include or exclude a convolutional layer in the neural network model | 
| filter | The number of output filters in the convolution | 
| kernel_size | An integer or list of a single integer, specifying the length of the 1D convolution window | 
| pool_size | Size of the max pooling windows | 
| units_lstm | The number of nodes in the LSTM layer | 
| words | The maximum number of words used to train the model. Defaults to the number of features in x | 
| maxsenlen | The maximum sentence length of training data | 
| optimizer | optimizer used to fit the model to the training data; see the keras documentation for the available optimizers | 
| loss | objective loss function; see the keras documentation for the available loss functions | 
| metrics | metric used to train the algorithm; see the keras documentation for the available metrics | 
| ... | additional options passed to the underlying keras fitting function, e.g. epochs and verbose | 
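Before fitting, the tokens must be converted into fixed-length integer sequences. quanteda.classifiers exports tokens2sequences() for this step; the sketch below assumes its maxsenlen and keepn arguments (verify against ?tokens2sequences) and shows how maxsenlen and words correspond to padding/truncation and vocabulary pruning.

# Hedged sketch of the input-preparation step, assuming the
# tokens2sequences() helper from quanteda.classifiers.
library(quanteda)
library(quanteda.classifiers)
toks <- tokens(c(d1 = "immigration policy must change",
                 d2 = "the economy grew last year"))
# Rows are documents, columns are token positions up to maxsenlen;
# shorter documents are padded, longer ones truncated, and keepn
# restricts the vocabulary to the most frequent words.
seqs <- tokens2sequences(toks, maxsenlen = 100, keepn = 5000)
str(seqs)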
See also: save.textmodel_cnnlstmemb(), load.textmodel_cnnlstmemb()
## Not run: 
library(quanteda)
library(quanteda.classifiers)

# create dataset with evenly balanced coded & uncoded immigration sentences
corpcoded <- corpus_subset(data_corpus_manifestosentsUK,
                           !is.na(crowd_immigration_label))
corpuncoded <- data_corpus_manifestosentsUK %>%
    corpus_subset(is.na(crowd_immigration_label) & year > 1980) %>%
    corpus_sample(size = ndoc(corpcoded))
corp <- corpcoded + corpuncoded
tok <- tokens(corp)
tmod <- textmodel_cnnlstmemb(tok,
                             y = docvars(tok, "crowd_immigration_label"),
                             epochs = 5, verbose = 1)
newdata <- tokens_subset(tok, subset = is.na(crowd_immigration_label))
pred <- predict(tmod, newdata = newdata)
table(pred)
tail(as.character(corpuncoded)[pred == "Immigration"], 10)
## End(Not run)
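Because the uncoded sentences have no ground truth, a quick sanity check is to hold out part of the coded data instead. A minimal sketch using the objects created above (the 80/20 split is arbitrary):

## Not run: 
# hold out ~20% of the coded sentences to check out-of-sample accuracy
set.seed(42)
tokcoded <- tokens_subset(tok, !is.na(crowd_immigration_label))
istrain <- sample(c(TRUE, FALSE), ndoc(tokcoded), replace = TRUE,
                  prob = c(0.8, 0.2))
tmod2 <- textmodel_cnnlstmemb(tokens_subset(tokcoded, istrain),
                              y = docvars(tokcoded, "crowd_immigration_label")[istrain],
                              epochs = 5, verbose = 1)
predheld <- predict(tmod2, newdata = tokens_subset(tokcoded, !istrain))
table(predicted = predheld,
      actual = docvars(tokcoded, "crowd_immigration_label")[!istrain])
## End(Not run)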