Description

Count the occurrence of tokens within a vector of strings. This function
differs from term_count in that term_count is regex based, allowing for
fuzzy matching, whereas token_count searches only for exact, lower-cased
tokens (words, number sequences, or punctuation). This counting function
is faster but less flexible.
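The token vs. regex distinction can be sketched in base R (an illustration of the idea, not the package's implementation):

```r
txt <- "Running daily, I run; runners run."

# Token search: split into lower-cased word tokens and count exact matches.
tokens <- strsplit(tolower(txt), "[^a-z']+")[[1]]
sum(tokens == "run")                        # 2: only the exact token "run"

# Regex search (term_count style): "run" also matches inside
# "running" and "runners".
length(gregexpr("run", tolower(txt))[[1]]) # 4
```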
Arguments

text.var
    The text string variable.

grouping.var
    The grouping variable(s). Default NULL generates one output for all
    text. Also takes a single grouping variable or a list of one or more
    grouping variables.

token.list
    A list of named character vectors of tokens. The search will combine
    the counts for tokens supplied that are in the same vector. Tokens
    are defined as lower-cased words, number sequences, or punctuation
    marks.

stem
    logical. If TRUE tokens are stemmed before counting.

keep.punctuation
    logical. If TRUE punctuation tokens are retained and counted.

pretty
    logical. If TRUE pretty printing of the counts is used.

group.names
    A vector of names that corresponds to group. Generally for internal
    use.

meta.sep
    A character separator (or character vector of separators) used to
    break the term-list names (tags) into meta and sub tags. The
    separator breaks generate a merge-table attribute on the output
    containing the supplied tags along with the derived meta and sub
    tags.

meta.names
    A vector of names corresponding to the meta tags generated by the
    meta.sep breaks.

...
    Other arguments passed on to other methods.
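How meta.sep derives meta tags can be sketched in base R (the tag names here are hypothetical, and the merge-table columns are an assumption about the attribute's shape):

```r
# Hypothetical term-list names of the form meta__sub
tags  <- c("emotion__anger", "emotion__joy", "polarity__negative")
parts <- strsplit(tags, "__", fixed = TRUE)

# Break each tag at the separator into a meta tag and a sub tag.
merge_table <- data.frame(
  tag  = tags,
  meta = vapply(parts, `[`, character(1), 1),
  sub  = vapply(parts, `[`, character(1), 2)
)
merge_table
```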
Value

Returns a tibble object of term counts by grouping variable. It has all
of the same features as a term_count object, meaning functions that work
on a term_count object will operate on a token_count object as well.
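A rough base-R approximation of the returned shape (one row per group, one column per token category; the real object is a tibble with additional attributes):

```r
# Toy data: three texts in two groups, counted against two token categories.
txt <- c("i like sam", "sam sat here", "not like it")
grp <- c("a", "a", "b")
token_list <- list(person = c("sam", "i"), place = c("here"))

toks   <- strsplit(tolower(txt), "[^a-z']+")
counts <- sapply(token_list, function(v) sapply(toks, function(t) sum(t %in% v)))

# Aggregate token totals and category counts by group.
cbind(n.tokens = as.vector(rowsum(lengths(toks), grp)), rowsum(counts, grp))
```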
Examples

token_list <- list(
person = c('sam', 'i'),
place = c('here', 'house'),
thing = c('boat', 'fox', 'rain', 'mouse', 'box', 'eggs', 'ham'),
no_like = c('not like')
)
token_count(sam_i_am, grouping.var = TRUE, token.list = token_list)
token_count(sam_i_am, grouping.var = NULL, token.list = token_list)
## Not run:
x <- presidential_debates_2012[["dialogue"]]
bigrams <- frequent_ngrams(x, gram.length = 2)$collocation
bigram_model <- token_count(x, TRUE, token.list = as_term_list(bigrams))
as_dtm(bigram_model)
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, lexicon, textshape)
token_list <- lexicon::nrc_emotions %>%
textshape::column_to_rownames() %>%
t() %>%
textshape::as_list()
presidential_debates_2012 %>%
with(token_count(dialogue, TRUE, token_list))
presidential_debates_2012 %>%
with(token_count(dialogue, list(person, time), token_list))
presidential_debates_2012 %>%
with(token_count(dialogue, list(person, time), token_list)) %>%
plot()
## End(Not run)
## hierarchical tokens
token_list <- list(
list(
person = c('sam', 'i')
),
list(
place = c('here', 'house'),
thing = c('boat', 'fox', 'rain', 'mouse', 'box', 'eggs', 'ham')
),
list(
no_like = c('not like'),
thing = c('train', 'goat')
)
)
(x <- token_count(sam_i_am, grouping.var = TRUE, token.list = token_list))
attributes(x)[['pre_collapse_coverage']]
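A base-R sketch of how same-named vectors across hierarchy levels might be merged (an assumption about the collapse step, not the package's internals; the list above is repeated so the sketch is self-contained):

```r
token_list <- list(
  list(person = c('sam', 'i')),
  list(place = c('here', 'house'),
       thing = c('boat', 'fox', 'rain', 'mouse', 'box', 'eggs', 'ham')),
  list(no_like = c('not like'),
       thing = c('train', 'goat'))
)

flat      <- do.call(c, token_list)   # one flat list; names may repeat
collapsed <- lapply(split(flat, names(flat)),
                    function(v) unique(unlist(v, use.names = FALSE)))
collapsed$thing                        # both 'thing' vectors merged
```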
## External Dictionaries
## Not run:
## dictionary from quanteda
require(quanteda); require(stringi); require(textreadr)
## Nadra Pencle and Irina Malaescu (2016) What's in the Words? Development and Validation of a
## Multidimensional Dictionary for CSR and Application Using Prospectuses. Journal of Emerging
## Technologies in Accounting: Fall 2016, Vol. 13, No. 2, pp. 109-127.
dict_corporate_social_responsibility <- "https://provalisresearch.com/Download/CSR.zip" %>%
textreadr::download() %>%
unzip(exdir = tempdir()) %>%
`[`(1) %>%
dictionary(file = .)
csr <- dict_corporate_social_responsibility %>%
as_term_list() %>%
lapply(function(x){
x %>%
stringi::stri_replace_all_fixed('_', ' ') %>%
stringi::stri_replace_all_regex('\\s*\\(.+?\\)', '') %>%
stringi::stri_replace_all_regex('[^ -~]', "'")
})
presidential_debates_2012 %>%
with(token_count(dialogue, list(time, person), csr)) %>%
plot()
## End(Not run)