Matrix==1.6-2 release
Matrix>=1.5-2, fixes #338
rsparse package for SVD and GloVe factorizations
postag_lemma_tokenizer() (wrapper around udpipe::udpipe_annotate). Can be used as a drop-in replacement for simpler tokenizers in text2vec.
combine_vocabularies() part of public API - see #260 for details.
coherence() function for comprehensive coherence metrics. Thanks to Manuel Bickel (@manuelbickel) for the contribution.
fit_transform and transform methods in LDA model produce same results. Thanks to @jiunsiew for reporting. LDA also now has an n_iter_inference parameter. It controls the number of samples drawn from the converged distribution for document-topic inference. This leads to more robust document-topic probabilities (reduced variance). Default value is 10.
iter to collocation_stat. iter shows the iteration number at which collocation stats (and counters) were calculated.
collocation_stat - were never used internally. Users can easily calculate ranks themselves.
magrittr, uuid, tokenizers
text2vec side - we just put abstract scikit-learn-like classes to a separate package in order to make them more reusable.
prune_vocabulary - filter by document counts
irlba.
dist2 performance for RWMD - incorporate ideas from gensim PR discussion.
data.frame with meta-information in attributes (stopwords, ngram, number of docs, etc).
lda_c from formats in DTM construction
ifiles_parallel, itoken_parallel high-level functions for parallel computing
chunks_numer parameter renamed to n_chunks
create_corpus from public API, moved co-occurrence related options to create_tcm from vectorizers
create_dtm, create_tcm. Now package relies on sparsepp library for underlying hash maps.
as.lda_c() function
2016-10-03. See 0.4 milestone tags.
R6 package
doc_proportions. See #52.
stop_words argument to prune_vocabulary. Signature was also changed.
attr(corpus, 'ids')
lda_c format
itoken.
itoken.
transform_* - more intuitive + simpler usage with autocompletion
vocabulary to create_vocabulary.
create_dtm, create_tcm.
ids argument to itoken. Simplifies assignment of ids to rows of DTM
create_vocabulary now can handle stopwords
split_into() util.
First CRAN release of text2vec.