- Removed the tokenize_tweets() function, which is no longer supported.
- Added the tokenize_ptb() function for Penn Treebank tokenizations (@jrnold) (#12); see the example below.
- Added chunk_text() to split long documents into pieces (#30); see the example below.
- New function tokenize_tweets() preserves usernames, hashtags, and URLs (@kbenoit) (#44).
- The stopwords() function has been removed in favor of using the stopwords package (#46).
- The package now follows the basic recommendations of the Text Interchange Format (tif package). (#49)
- tokenize_skip_ngrams has been improved to generate unigrams and bigrams, according to the skip definition (#24); see the example below.
- Documented the languages that tokenizers supports (@ironholds) (#26).
- tokenize_skip_ngrams now supports stopwords (#31).
- Tokenizers now handle NA consistently (#33).
- tokenize_words() gains arguments to preserve or strip punctuation and numbers (#48); see the example below.
- Fixed tokenize_skip_ngrams() and tokenize_ngrams() to return properly marked UTF8 strings on Windows (@patperry) (#58).
- tokenize_tweets() now removes stopwords prior to stripping punctuation, making its behavior more consistent with tokenize_words() (#76).
- Added the tokenize_character_shingles() tokenizer; see the example below.
- … tokenize_words() and tokenize_word_stems().
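For the tokenize_ptb() item, a minimal sketch of Penn Treebank-style tokenization. The exact token splits depend on the installed version, so the output shown in the comment is indicative rather than authoritative.

```r
library(tokenizers)

# Penn Treebank tokenization keeps punctuation as tokens and splits
# contractions, unlike the default tokenize_words().
tokenize_ptb("They aren't sure it works.")
# Indicative output: "They" "are" "n't" "sure" "it" "works" "."
```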
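For chunk_text(), a short usage sketch. It assumes the chunk_size argument counts words and that chunk names are derived from the input document names, as in the package documentation; check these against your installed version.

```r
library(tokenizers)

# A document long enough to be worth splitting.
doc <- paste(rep("The quick brown fox jumps over the lazy dog.", 25),
             collapse = " ")

# Split into pieces of roughly 50 words each; the result is a list of
# character chunks whose names are built from the document name.
chunks <- chunk_text(c(essay = doc), chunk_size = 50)
length(chunks)  # number of pieces
names(chunks)   # chunk identifiers
```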
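The two skip n-gram items (#24, #31) combine as below. The n, n_min, k, and stopwords argument names are taken from the documented signature of tokenize_skip_ngrams(); the stopwords package is the replacement named in the stopwords() removal note above.

```r
library(tokenizers)

x <- "the quick brown fox jumps over the lazy dog"

# With n_min = 1 the skip n-gram tokenizer also emits unigrams and
# bigrams, per the improved skip definition; k is the maximum number
# of words that may be skipped between n-gram members.
tokenize_skip_ngrams(x, n = 3, n_min = 1, k = 1,
                     stopwords = stopwords::stopwords("en"))
```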
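The tokenize_words() item refers to the strip_punct and strip_numbers arguments (names taken from the function's documentation):

```r
library(tokenizers)

x <- "In 2016, version 0.2.0 was released!"

# Defaults: punctuation is stripped, numbers are kept.
tokenize_words(x)

# Keep punctuation as tokens and strip the numbers instead.
tokenize_words(x, strip_punct = FALSE, strip_numbers = TRUE)
```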
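Finally, character shingles are overlapping character sequences, i.e. n-grams over characters rather than words. A small sketch of tokenize_character_shingles(); the output comment is indicative.

```r
library(tokenizers)

# Overlapping character trigrams; non-alphanumeric characters are
# stripped by default.
tokenize_character_shingles("tokenizers", n = 3)
# Indicative output: "tok" "oke" "ken" "eni" "niz" "ize" "zer" "ers"
```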