* Added alt text to figures in vignettes and README (#233)
* Update vignette for `quanteda::dfm()` v4 (#242)
* Added `stm()` tidiers for high FREX and lift words (#223)
* Update `dfm` tidiers because of the upcoming release of Matrix (#218)
* `scale_x/y_reordered()` now uses a function `labels` as its main input (#200)
* `to_lower` is now passed to the underlying tokenization function for character shingles (#208)
* Fix for `content`, thanks to @jonathanvoelkle (#209)
* The `collapse` argument to `unnest_functions()` now takes either `NULL` (do not collapse text across rows for tokenizing) or a character vector of variables (use said variables to collapse text across rows for tokenizing). This fixes a long-standing bug and provides more consistent behavior, but does change results for many situations (such as n-gram tokenization).
* `reorder_within()`
now handles multiple variables, thanks to @tmastny (#170)
* Added `to_lower` argument to other tokenizing functions, for more consistent behavior (#175)
* Added `glance()` method for stm's estimated regressions, thanks to @vincentarelbundock (#176)
* Fixed `augment()` function for stm topic model.
* Use `tibble()` where appropriate, thanks to @luisdza (#136).
* Improvements to `unnest_tokens()`.
* `unnest_tokens` can now unnest a data frame with a list column (which formerly threw the error `unnest_tokens expects all columns of input to be atomic vectors (not lists)`). The unnested result repeats the objects within each list. (It's still not possible when `collapse = TRUE`, in which case tokens can span multiple lines.)
* Added `get_tidy_stopwords()`
to obtain stopword lexicons in multiple languages in a tidy format.
* Added a dataset `nma_words` of negators, modals, and adverbs that affect sentiment analysis (#55).
* `NA` values are now handled in `unnest_tokens` so they no longer cause other columns to become `NA` (#82).
* Handle inputs such as `data.table` consistently (#88).
* Updates to `unnest_tokens`, `bind_tf_idf`, and all sparse casters (#67, #74).
* Added tidiers for the `stm` package (#51).
* `get_sentiments`
now works regardless of whether `tidytext` has been loaded or not (#50).
* `unnest_tokens` now supports `data.table` objects (#37).
* Fixed the `to_lower` parameter in `unnest_tokens` to work properly for all tokenizing options.
* Updated `tidy.corpus`, `glance.corpus`, tests, and vignette for changes to the quanteda API.
* Removed the `pair_count` function, which is now in the in-development widyr package.
* Added tidiers for the `mallet` package.
* `unnest_tokens` preserves custom attributes of data frames and data.tables.
* Fixed `cast_sparse`
, `cast_dtm`, and other sparse casters to ignore groups in the input (#19)
* Changed `unnest_tokens` so that it no longer uses tidyr's `unnest`, but rather a custom version that removes some overhead. In some experiments, this sped up `unnest_tokens` on large inputs by about 40%. This also moves tidyr from Imports to Suggests for now.
* `unnest_tokens` now checks that there are no list columns in the input, and raises an error if present (since those cannot be unnested).
* Added a `format` argument to `unnest_tokens` so that it can process html, xml, latex or man pages using the hunspell package, though only when `token = "words"`.
* Added a `get_sentiments` function that takes the name of a lexicon ("nrc", "bing", or "sentiment") and returns just that sentiment data frame (#25).
* Updated `cast_sparse` to work with dplyr 0.5.0.
* Deprecated the `pair_count` function, which has been moved to `pairwise_count` in the widyr package. This will be removed entirely in a future version.
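The `collapse` semantics noted above can be sketched with a small example (hypothetical data; assumes a tidytext version with the `NULL`/character-vector behavior, plus dplyr and the tokenizers backend installed):

```r
library(dplyr)
library(tidytext)

lines <- tibble(
  doc  = c("a", "a", "b"),
  text = c("tidy text mining", "with r", "a tidy approach")
)

# collapse = NULL: each row is tokenized on its own, so no bigram
# crosses the original row boundaries
lines %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2, collapse = NULL)

# collapse = "doc": rows sharing the same `doc` value are combined
# before tokenizing, so a bigram spanning two rows (such as
# "mining with") can appear
lines %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2, collapse = "doc")
```

With n-gram tokenization the two calls return different sets of rows, which is exactly the behavior change called out in the entry above.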
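A sketch of the `reorder_within()` and `scale_y_reordered()` pairing mentioned above (hypothetical data; assumes ggplot2 is installed):

```r
library(ggplot2)
library(tidytext)

df <- data.frame(
  word  = c("good", "bad", "good", "ugly"),
  n     = c(10, 7, 4, 12),
  group = c("A", "A", "B", "B")
)

# reorder_within() reorders `word` by `n` separately inside each
# `group`; scale_y_reordered() then strips the separator suffix that
# reorder_within() appends to the axis labels
ggplot(df, aes(n, reorder_within(word, n, group))) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(~ group, scales = "free_y")
```

Per the entry above, `reorder_within()` also accepts multiple grouping variables.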
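The `get_sentiments()` entries above can be exercised directly; a minimal sketch (the "bing" lexicon ships with tidytext):

```r
library(tidytext)

# Returns a tibble of words and their sentiment labels for the
# requested lexicon; per the note above this also works without
# attaching the package, e.g. tidytext::get_sentiments("bing")
bing <- get_sentiments("bing")
head(bing)
```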
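A sketch of the `format` argument described above (hypothetical input; assumes the hunspell package is installed, and note that only the default `token = "words"` supports this):

```r
library(tidytext)

page <- data.frame(text = "<p>Hello <b>tidy</b> world</p>")

# format = "html" strips the markup via hunspell's parsers before
# tokenizing into words
unnest_tokens(page, word, text, format = "html")
```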
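The sparse casters above can be sketched as follows (hypothetical counts; assumes dplyr and Matrix are installed):

```r
library(dplyr)
library(tidytext)

counts <- tibble(
  doc  = c("d1", "d1", "d2"),
  word = c("apple", "pear", "apple"),
  n    = c(2, 1, 5)
)

# cast_sparse() builds a sparse matrix with `doc` as rows and `word`
# as columns; per the entry above, any dplyr groups on the input are
# ignored rather than affecting the result
m <- counts %>%
  group_by(doc) %>%
  cast_sparse(doc, word, n)

dim(m)
```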