View source: R/textmodel_cnnlstmemb.R
textmodel_cnnlstmemb (R Documentation)
Description:

A function that combines a convolutional neural network (CNN) layer with a long short-term memory (LSTM) layer. It is designed to incorporate word sequences, represented as sequentially ordered word embeddings, into text classification. The model takes a quanteda tokens object as input.
Usage:

textmodel_cnnlstmemb(
x,
y,
dropout1 = 0.2,
dropout2 = 0.2,
dropout3 = 0.2,
dropout4 = 0.2,
wordembeddim = 30,
cnnlayer = TRUE,
filter = 48,
kernel_size = 5,
pool_size = 4,
units_lstm = 128,
words = NULL,
maxsenlen = 100,
optimizer = "adam",
loss = "categorical_crossentropy",
metrics = "categorical_accuracy",
...
)
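The defaults above describe a small embed-convolve-pool-LSTM stack. As a rough orientation only, here is a minimal keras sketch of the pipeline these arguments imply; it is not the package source, and the vocabulary size (5000), the relu activation, and the two-class output layer are assumptions:

library(keras)

# Sketch of the network implied by the default arguments; layer order,
# activation, vocabulary size, and class count are assumptions here.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 5000 + 1,           # 'words' + 1 for the padding index
                  output_dim = 30,                # wordembeddim
                  input_length = 100) %>%         # maxsenlen
  layer_dropout(rate = 0.2) %>%                   # dropout1
  layer_conv_1d(filters = 48, kernel_size = 5,    # filter, kernel_size (cnnlayer = TRUE)
                activation = "relu") %>%
  layer_dropout(rate = 0.2) %>%                   # dropout2
  layer_max_pooling_1d(pool_size = 4) %>%         # pool_size
  layer_lstm(units = 128,                         # units_lstm
             dropout = 0.2,                       # dropout3 (input dropout)
             recurrent_dropout = 0.2) %>%         # dropout4 (recurrent dropout)
  layer_dense(units = 2, activation = "softmax")  # one unit per class (assumed two classes)

model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "categorical_accuracy")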
Arguments:

x: a quanteda tokens object.

y: vector of training labels associated with each document identified in x.

dropout1: a floating-point value between 0 and 1; the rate at which units are dropped for the linear transformation of the inputs to the embedding layer.

dropout2: a floating-point value between 0 and 1; the rate at which units are dropped for the linear transformation of the inputs to the CNN layer.

dropout3: a floating-point value between 0 and 1; the rate at which units are dropped for the linear transformation of the inputs to the recurrent (LSTM) layer.

dropout4: a floating-point value between 0 and 1; the rate at which units are dropped for the linear transformation of the recurrent state (recurrent dropout).

wordembeddim: the number of word embedding dimensions to be fit.

cnnlayer: a logical parameter that allows the user to include or exclude a convolutional layer in the neural network model.

filter: the number of output filters in the convolution.

kernel_size: an integer or list of a single integer, specifying the length of the 1D convolution window.

pool_size: size of the max pooling window.

units_lstm: the number of units in the LSTM layer.

words: the maximum number of words used to train the model. Defaults to the number of features in x.

maxsenlen: the maximum sentence length of the training data.

optimizer: optimizer used to fit the model to the training data; see keras::compile.keras.engine.training.Model().

loss: objective loss function; see keras::compile.keras.engine.training.Model().

metrics: metric used to train the algorithm; see keras::compile.keras.engine.training.Model().

...: additional options passed to keras::fit.keras.engine.training.Model().
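Because the embedding layer needs fixed-width input, words and maxsenlen together determine how each tokenised document becomes a row of integer word indices. A minimal sketch of this shape handling using keras::pad_sequences(); the package's internal preprocessing may differ, and the index values below are made up:

library(keras)

# two documents already mapped to integer word indices (hypothetical values);
# indices beyond the 'words' most frequent features would have been dropped
seqs <- list(c(12, 4, 873, 9),
             c(55, 2))

# pad or truncate every document to maxsenlen so the embedding layer sees a
# fixed-width matrix; pad_sequences() pads with 0 on the left by default
x_mat <- pad_sequences(seqs, maxlen = 100)
dim(x_mat)  # 2 x 100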
See also: save.textmodel_cnnlstmemb(), load.textmodel_cnnlstmemb()
Examples:

## Not run:
# create dataset with evenly balanced coded & uncoded immigration sentences
corpcoded <- corpus_subset(data_corpus_manifestosentsUK,
                           !is.na(crowd_immigration_label))
corpuncoded <- data_corpus_manifestosentsUK %>%
    corpus_subset(is.na(crowd_immigration_label) & year > 1980) %>%
    corpus_sample(size = ndoc(corpcoded))
corp <- corpcoded + corpuncoded

tok <- tokens(corp)

tmod <- textmodel_cnnlstmemb(tok,
                             y = docvars(tok, "crowd_immigration_label"),
                             epochs = 5, verbose = 1)

newdata <- tokens_subset(tok, subset = is.na(crowd_immigration_label))
pred <- predict(tmod, newdata = newdata)
table(pred)
tail(as.character(corpuncoded)[pred == "Immigration"], 10)

## End(Not run)
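A natural follow-on, sketched below under the same ## Not run: convention, is to estimate accuracy on held-out coded sentences before trusting predictions on uncoded text; the 20% split and the seed are illustrative assumptions, not part of the package example.

## Not run:
# hold out a fifth of the coded sentences to estimate accuracy (illustrative split)
tok_coded <- tokens_subset(tok, !is.na(crowd_immigration_label))
lab <- docvars(tok_coded, "crowd_immigration_label")
set.seed(42)
test <- sample(ndoc(tok_coded), size = round(ndoc(tok_coded) / 5))

tmod2 <- textmodel_cnnlstmemb(tok_coded[-test], y = lab[-test],
                              epochs = 5, verbose = 1)
pred2 <- predict(tmod2, newdata = tok_coded[test])
mean(pred2 == lab[test])  # held-out accuracy
## End(Not run)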