Description
These functions transform a text source into a dataframe of individual terms and tokens, each with an occurrence date. The terms/tokens can be extracted as ngrams of a specified length. terms_by_date is a wrapper around the underlying tokenizing function for specific types of ngrams.
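To make "ngrams of a specified length" concrete, here is a minimal base-R sketch. make_ngrams() is a hypothetical helper, not part of this package: it joins every run of n consecutive tokens with a single space. The package's own ngram extraction may differ, for example in the separator it uses or in whether stopwords are removed before the ngrams are formed.

```r
# Hypothetical helper (not part of this package): build ngrams by joining
# each run of n consecutive tokens with a single space.
make_ngrams <- function(tokens, n = 1L) {
  stopifnot(n >= 1)
  # Fewer tokens than n means no complete ngram can be formed.
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1L)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1L)], collapse = " "),
         character(1))
}

tokens <- c("stock", "prices", "rose", "sharply")
make_ngrams(tokens, 1L)  # returns "stock" "prices" "rose" "sharply"
make_ngrams(tokens, 2L)  # returns "stock prices" "prices rose" "rose sharply"
```

With n = 1 the helper returns the tokens unchanged, which corresponds to the unigram case.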
Usage
terms_by_date(textData, textColumn, dateColumn, removeNumbers = TRUE,
  wordStemming = TRUE, customStopwords = NULL, tokenType = "unigram")
unigrams_by_date(textData, textColumn, dateColumn, removeNumbers = TRUE,
  wordStemming = TRUE, customStopwords = NULL)
bigrams_by_date(textData, textColumn, dateColumn, removeNumbers = TRUE,
  wordStemming = TRUE, customStopwords = NULL)
Arguments
textData: a dataframe containing the text to be processed.
textColumn: a character string specifying the column name in textData that contains the text.
dateColumn: a character string specifying the column name in textData that contains the date of each text.
removeNumbers: a Boolean indicating whether numbers should be removed from the result; default is TRUE.
wordStemming: a Boolean indicating whether words in the text should be reduced to their word stem; default is TRUE.
customStopwords: a character vector specifying additional stopwords that should be removed from the result.
tokenType: the length of the consecutive token sequence (ngram) extracted; currently only unigrams (tokenType = "unigram", the default) and bigrams are supported.
Details
Text input (textColumn) is split with a word tokenizer, default stopwords (see tidytext) are removed, and the remaining tokens are further processed and filtered according to the function's options. A term is the character sequence obtained after all NLP processing options offered by this function have been applied. The most important of these is stemming, for which the Porter stemmer from the SnowballC package is used.
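The processing described above can be approximated in base R. This sketch is illustrative only and is not the package's implementation. Assumptions, all hypothetical: sketch_terms_by_date() and stop_words_demo are made-up names, a lower-casing regex split stands in for the package's word tokenizer, a tiny hand-written vector stands in for the tidytext default stopwords, removeNumbers is taken to drop purely numeric tokens, and only unigrams are produced. Stemming uses SnowballC::wordStem() with the Porter stemmer, as named in the text.

```r
# Hypothetical stand-in for the tidytext default stopword list.
stop_words_demo <- c("the", "a", "an", "and", "of", "on", "in")

# Hypothetical function: mirrors the documented arguments except tokenType
# (this sketch produces unigrams only). NOT the package's implementation.
sketch_terms_by_date <- function(textData, textColumn, dateColumn,
                                 removeNumbers = TRUE, wordStemming = TRUE,
                                 customStopwords = NULL) {
  if (wordStemming && !requireNamespace("SnowballC", quietly = TRUE)) {
    stop("wordStemming = TRUE needs the SnowballC package")
  }
  stops <- c(stop_words_demo, tolower(customStopwords))
  rows <- lapply(seq_len(nrow(textData)), function(i) {
    text <- as.character(textData[[textColumn]][i])
    # Assumed tokenizer: lower-case, then split on runs of characters that
    # are not letters, digits or apostrophes.
    tokens <- unlist(strsplit(tolower(text), "[^[:alnum:]']+"))
    tokens <- tokens[nzchar(tokens)]                 # drop empty strings
    tokens <- tokens[!tokens %in% stops]             # stopword removal
    if (removeNumbers) {
      tokens <- tokens[!grepl("^[0-9]+$", tokens)]   # assumed: drop numeric tokens
    }
    # One row per token occurrence, tagged with the text's date.
    data.frame(occur = rep(textData[[dateColumn]][i], length(tokens)),
               token = tokens, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  # A term is the processed token; with stemming it is the Porter stem.
  out$term <- if (wordStemming) {
    SnowballC::wordStem(out$token, language = "porter")
  } else {
    out$token
  }
  out
}

news <- data.frame(
  txt = c("The cats sat on the mat", "2 dogs and a cat"),
  day = as.Date(c("2020-01-01", "2020-01-02")),
  stringsAsFactors = FALSE
)
sketch_terms_by_date(news, "txt", "day", wordStemming = FALSE)
```

Calling it with wordStemming = FALSE leaves term equal to token, in line with the note under Value; with the default wordStemming = TRUE, SnowballC must be installed.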
Value
A dataframe with three columns listing all individual term occurrences in the provided text source, where occur is the publication date associated with an original token, which has been processed/reduced to term. If no stemming has been applied, term and token in the result are identical.
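Because the result has one row per term occurrence, per-date term frequencies follow from counting rows. The sketch below builds a small, made-up data frame, assuming the three columns are named occur, token and term as the description above suggests, and counts with base R's aggregate(); it is an illustration, not output from the package.

```r
# Toy data frame shaped like the documented return value; values are made
# up for illustration and the column names occur/token/term are assumed.
occurrences <- data.frame(
  occur = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02")),
  token = c("markets", "market", "markets"),
  term  = c("market", "market", "market"),
  stringsAsFactors = FALSE
)

# One row per occurrence, so counting rows per (occur, term) pair gives the
# frequency of each term on each date. Different tokens that share a stem
# ("markets", "market") are counted together under one term.
counts <- aggregate(token ~ occur + term, data = occurrences, FUN = length)
names(counts)[names(counts) == "token"] <- "n"
counts
```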