n_grams: n_grams


View source: R/n_grams.R

Description

Determine the most common n-grams used in a column of text responses, optionally broken down by one or more demographic columns.

Usage

n_grams(data, column, ..., words = 2, filter_word = "",
  remove = c(""), n = 5, min = 3, stop_thresh = 0.7,
  proportion = FALSE, pretty = "no")

Arguments

data

dataframe or tibble with a row per survey response

column

name of a character column in the data frame to be tabulated

...

optional column(s) to split into groups

words

number indicating what kind of n-grams to return (bigram, trigram...), defaults to 2 (bigrams)
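To make the words argument concrete, here is a minimal base-R sketch of how a response might be split into n-grams of a given length. The helper make_ngrams() is hypothetical and is not part of surveyr; the tokenisation rule (lower-casing and splitting on anything that is not a letter or apostrophe) is an assumption, and the package may tokenise differently.

```r
# Hypothetical helper (not from surveyr): split a response into lower-case
# words, then join every run of `words` consecutive words into one n-gram.
make_ngrams <- function(text, words = 2) {
  # Assumed tokenisation: split on anything that is not a letter or apostrophe
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # A response shorter than `words` tokens yields no n-grams
  if (length(tokens) < words) return(character(0))
  starts <- seq_len(length(tokens) - words + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + words - 1)], collapse = " "),
    character(1)
  )
}

make_ngrams("The staff were very friendly", words = 2)  # four bigrams
make_ngrams("The staff were very friendly", words = 3)  # three trigrams
```

A response of k words gives k - words + 1 n-grams, so short responses contribute nothing once words exceeds their length.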

filter_word

optional word to filter results by (i.e. only show n-grams containing this word)

remove

optional vector of words to exclude (i.e. remove all n-grams containing at least one of these words)

n

number of n-grams to show for each group, defaults to 5

min

number indicating the minimum number of times an n-gram needs to appear for it to be included in the output, defaults to 3
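The interaction between n and min can be sketched in base R as follows. This is an illustration only, assuming that the min filter is applied to the n-gram counts before the top n are taken; top_ngrams() is a hypothetical helper, not a surveyr function, and the package's own ordering of these steps is not confirmed here.

```r
# Hypothetical helper (not from surveyr): count n-grams, drop those seen
# fewer than `min` times, then keep the `n` most frequent.
top_ngrams <- function(ngrams, n = 5, min = 3) {
  tab <- table(ngrams)
  counts <- setNames(as.integer(tab), names(tab))
  counts <- sort(counts, decreasing = TRUE)
  counts <- counts[counts >= min]  # assumed: filter by min first
  head(counts, n)                  # then keep at most n entries
}

# Made-up bigrams pooled from several responses
bigrams <- c(
  rep("very friendly", 4),
  rep("staff were", 3),
  "long wait"
)

top_ngrams(bigrams, n = 5, min = 3)  # "long wait" appears once, so it is dropped
```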

stop_thresh

numeric threshold for removing n-grams based on stopwords (i.e. the maximum allowed proportion of stopwords among the words of an n-gram). 1 keeps all n-grams regardless of stopwords, 0 excludes all n-grams containing one or more stopwords. Defaults to 0.7.
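The stop_thresh rule can be read as a simple proportion test. The base-R sketch below is one interpretation, assuming an n-gram is kept when its share of stopwords is less than or equal to the threshold, which matches the documented behaviour at 0 and 1. The stopword list and both helper functions are hypothetical and are not taken from surveyr.

```r
# Hypothetical stopword list for illustration only
stopwords <- c("the", "a", "of", "and", "was", "were", "to")

# Share of the words in an n-gram that are stopwords
stopword_share <- function(ngram, stopwords) {
  tokens <- strsplit(ngram, " ", fixed = TRUE)[[1]]
  mean(tokens %in% stopwords)
}

# Assumed rule: keep the n-gram when its stopword share is <= stop_thresh
keep_ngram <- function(ngram, stopwords, stop_thresh = 0.7) {
  stopword_share(ngram, stopwords) <= stop_thresh
}

keep_ngram("friendly staff", stopwords)                # share 0,   kept
keep_ngram("the staff", stopwords)                     # share 0.5, kept at 0.7
keep_ngram("of the", stopwords)                        # share 1,   dropped at 0.7
keep_ngram("the staff", stopwords, stop_thresh = 0)    # any stopword drops it
keep_ngram("of the", stopwords, stop_thresh = 1)       # everything is kept
```

Under this reading, the default of 0.7 removes bigrams made up entirely of stopwords but keeps bigrams with one stopword and trigrams with up to two.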

proportion

logical indicating whether to include the proportion of responses containing this n-gram, defaults to FALSE

pretty

one of 'no', 'plot' or 'return'. Defaults to 'no'. 'plot' ends the function call by applying the prettify() function to the output with plot = TRUE. 'return' applies the prettify() function with plot = FALSE.

Value

Table of n-grams with the number of times they appear


chrisbrownlie/surveyr documentation built on Dec. 1, 2019, 12:34 a.m.