- `Matrix==1.6-2` release
- `Matrix>=1.5-2`, fixes #338
- `rsparse` package for SVD and GloVe factorizations
- `postag_lemma_tokenizer()` (wrapper around `udpipe::udpipe_annotate`). It can be used as a drop-in replacement for simpler tokenizers in text2vec.
- `combine_vocabularies()` part of the public API; see #260 for details.
- `coherence()` function for comprehensive coherence metrics. Thanks to Manuel Bickel (@manuelbickel) for the contribution.
- `fit_transform` and `transform` methods in the LDA model produce the same results. Thanks to @jiunsiew for reporting. LDA also now has an `n_iter_inference` parameter, which controls the number of samples drawn from the converged distribution for document-topic inference. This yields more robust document-topic probabilities (reduced variance). The default value is 10.
- `iter` to `collocation_stat`. `iter` shows the iteration number at which the collocation stats (and counters) were calculated.
- `collocation_stat` - were never used internally. Users can easily calculate ranks themselves.
- `magrittr`, `uuid`, `tokenizers`
- `text2vec` side - we just put abstract `scikit-learn`-like classes into a separate package to make them more reusable.
- `prune_vocabulary` - filter by document counts
- `irlba`.
- `dist2` performance for RWMD - incorporate ideas from the gensim PR discussion.
- `data.frame` with meta-information in attributes (stopwords, ngram, number of docs, etc.).
- `lda_c` from formats in DTM construction
- `ifiles_parallel`, `itoken_parallel` high-level functions for parallel computing
- `chunks_numer` parameter renamed to `n_chunks`
- `create_corpus` from the public API, moved co-occurrence related options to `create_tcm` from vectorizers
- `create_dtm`, `create_tcm`. The package now relies on the sparsepp library for the underlying hash maps.
- `as.lda_c()` function

2016-10-03. See 0.4 milestone tags.
- `R6` package
- `doc_proportions`. See #52.
- `stop_words` argument to `prune_vocabulary`. The signature was also changed.
- `attr(corpus, 'ids')`
- `lda_c` format
- `itoken`.
- `itoken`.
- `transform_*` - more intuitive + simpler usage with autocompletion
- `vocabulary` to `create_vocabulary`.
- `create_dtm`, `create_tcm`.
- `ids` argument to `itoken`. Simplifies assignment of ids to rows of the DTM
- `create_vocabulary` can now handle stopwords
- `split_into()` util.

First CRAN release of text2vec.