| textmodel_doc2vec | R Documentation |
Description

Train a doc2vec model (Le & Mikolov, 2014) using a quanteda::tokens object.
Usage

textmodel_doc2vec(
  x,
  dim = 50,
  type = c("dm", "dbow"),
  min_count = 5,
  window = 5,
  iter = 10,
  alpha = 0.05,
  model = NULL,
  use_ns = TRUE,
  ns_size = 5,
  sample = 0.001,
  tolower = TRUE,
  include_data = FALSE,
  verbose = FALSE,
  ...
)
Arguments

x
a quanteda::tokens or quanteda::tokens_xptr object.

dim
the size of the word vectors.

type
the architecture of the model; either "dm" (distributed memory) or "dbow" (distributed bag-of-words).

min_count
the minimum frequency of the words. Words less frequent than this in x are removed before training.

window
the size of the window for context words.

iter
the number of iterations in model training.

alpha
the initial learning rate.

model
a trained Word2vec model; if provided, its word vectors are updated for the new data.

use_ns
if TRUE, negative sampling is used in model training.

ns_size
the size of negative samples. Only used when use_ns = TRUE.

sample
the rate of sampling of words based on their frequency. Sampling is disabled when sample = 1.0.

tolower
if TRUE, lower-case all the tokens before fitting the model.

include_data
if TRUE, the resulting object includes the data supplied as x.

verbose
if TRUE, print the progress of model training.

...
additional arguments.
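For illustration, a minimal training call might look like the sketch below. It assumes the function is provided by the wordvector package (an assumption, as the page does not name its package) and uses quanteda's built-in inaugural-address corpus purely as example data; the parameter values simply echo the defaults documented above.

```r
# Minimal sketch (assumes the quanteda and wordvector packages are installed)
library(quanteda)
library(wordvector)

# tokenize a built-in quanteda corpus
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# train a distributed bag-of-words (dbow) doc2vec model
mod <- textmodel_doc2vec(toks, dim = 50, type = "dbow",
                         min_count = 5, iter = 10)

# inspect the word and document vector matrices stored in `values`
# (the internal element names are not specified on this page)
str(mod$values)
```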
Value

Returns a textmodel_doc2vec object; the fitted word and document vectors are stored as matrices in the values element. Other elements are the same as those of textmodel_word2vec.
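Because the model argument accepts a trained Word2vec model, word vectors from an earlier textmodel_word2vec fit can seed the doc2vec training. A hedged sketch, assuming toks is a quanteda::tokens object and that dim must match between the two models (the page does not state this requirement explicitly):

```r
# Sketch: reuse word vectors from a trained word2vec model
# (assumption: `dim` must be the same in both calls)
wdv <- textmodel_word2vec(toks, dim = 50)
dcv <- textmodel_doc2vec(toks, dim = 50, model = wdv)
```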
References

Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. arXiv:1405.4053. https://doi.org/10.48550/arXiv.1405.4053