frequent_terms: N Most Frequent Terms

View source: R/frequent_terms.R

Description

frequent_terms - Find a list of the n most frequent terms.

all_words - Find a list of all terms used.

Usage

frequent_terms(
  text.var,
  n = 20,
  stopwords = stopwords::stopwords("english"),
  min.freq = NULL,
  min.char = 4,
  max.char = Inf,
  stem = FALSE,
  language = "porter",
  strip = TRUE,
  strip.regex = "[^a-z' ]",
  alphabetical = FALSE,
  ...
)

all_words(text.var, stopwords = NULL, min.char = 0, ...)

Arguments

text.var

A vector of character strings.

n

The number of rows to print. If an integer, the frequency at the nth row is selected and all rows with a frequency >= that value are printed. If a proportion (less than 1), the frequency at the nth% row is selected and all rows with a frequency >= that value are printed.

stopwords

A vector of stopwords to exclude.

min.freq

The minimum frequency to print. Note that this argument overrides the n argument.

min.char

The minimum number of characters a word must have (including apostrophes) for inclusion.

max.char

The maximum number of characters a word may have (including apostrophes) for inclusion.

stem

logical. If TRUE, wordStem is used to stem the terms, with language = "porter" as the default. Note that stopwords will be stemmed as well.

language

The stem language to use (see wordStem).

strip

logical. If TRUE, all characters that are not letters, apostrophes, or spaces are stripped. The regex used can be changed via the strip.regex argument.

strip.regex

A regular expression used for stripping undesired characters.

alphabetical

logical. If TRUE, rows are arranged alphabetically by term; otherwise they are arranged by frequency.

...

ignored.
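
To make the interaction of strip, strip.regex, min.char, max.char and stopwords concrete, here is a minimal base-R sketch. It is not termco's implementation: the function name sketch_term_counts, the output column names term and frequency, the lowercasing step, and the choice to replace stripped characters with a space are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the termco source.
# Assumptions: text is lowercased first, stripped characters become spaces,
# and the output columns are called `term` and `frequency`.
sketch_term_counts <- function(text.var, stopwords = NULL, min.char = 4,
                               max.char = Inf, strip = TRUE,
                               strip.regex = "[^a-z' ]") {
  txt <- tolower(text.var)
  # Drop everything that is not a lowercase letter, apostrophe, or space
  if (strip) txt <- gsub(strip.regex, " ", txt)
  # Split on runs of whitespace and discard empty tokens
  words <- unlist(strsplit(txt, "\\s+"))
  words <- words[nzchar(words)]
  # Keep words whose length (apostrophes included) is within bounds
  n_char <- nchar(words)
  words <- words[n_char >= min.char & n_char <= max.char]
  # Exclude stopwords
  words <- words[!words %in% stopwords]
  # Count and sort from most to least frequent
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(term = as.character(names(counts)),
             frequency = as.integer(counts),
             stringsAsFactors = FALSE)
}

x <- c("The economy is the economy!", "Taxes and the economy, taxes.")
out <- sketch_term_counts(x, stopwords = c("the", "and"), min.char = 4)
out
```

With min.char = 4, short words such as "is" are dropped by length alone, before the stopword list is even consulted.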

Value

Returns a data.frame of terms and frequencies.
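
The row selection described for n and min.freq can be illustrated on a small data frame of terms and frequencies. This is a hedged sketch, not the package's code: sketch_select_rows and the column names term and frequency are hypothetical, and rounding a proportional n up with ceiling() is an assumption.

```r
# Illustrative sketch only; NOT the termco source.
# Assumptions: columns are `term` and `frequency`; a proportional n is
# converted to a row index with ceiling().
sketch_select_rows <- function(freq_df, n = 20, min.freq = NULL) {
  if (nrow(freq_df) == 0) return(freq_df)
  # Sort from most to least frequent
  freq_df <- freq_df[order(-freq_df$frequency), , drop = FALSE]
  # min.freq overrides n
  if (!is.null(min.freq)) {
    return(freq_df[freq_df$frequency >= min.freq, , drop = FALSE])
  }
  # A proportion (n < 1) is turned into a row index
  if (n < 1) n <- ceiling(n * nrow(freq_df))
  n <- max(1, min(n, nrow(freq_df)))
  # Keep every row at or above the frequency found at the nth row,
  # so ties with the nth row are retained
  cutoff <- freq_df$frequency[n]
  freq_df[freq_df$frequency >= cutoff, , drop = FALSE]
}

df <- data.frame(term = c("a", "b", "c", "d", "e"),
                 frequency = c(5L, 3L, 3L, 2L, 1L),
                 stringsAsFactors = FALSE)
top2 <- sketch_select_rows(df, n = 2)
half <- sketch_select_rows(df, n = .5)
atleast2 <- sketch_select_rows(df, n = 1, min.freq = 2)
```

Because the cutoff is a frequency rather than a row count, n = 2 here keeps three rows: "b" and "c" are tied at the second row's frequency.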

Examples

## Not run: 
x <- presidential_debates_2012[["dialogue"]]

frequent_terms(x)
frequent_terms(x, min.char = 1)
frequent_terms(x, n = 50)
frequent_terms(x, n = .02)
frequent_terms(x, stem = TRUE)
frequent_terms(x, n = 50, stopwords = c(stopwords::stopwords("english"), "said", "well"))

plot(frequent_terms(x))
plot(frequent_terms(x, n = .02))
plot(frequent_terms(x, n = 40))
plot(frequent_terms(x, n = 40), as.cloud = TRUE)

## Note `n` can be used in print to change how many rows are returned.
## This output can be reassigned when wrapped in print.  This is useful
## to reduce computational time on larger data sets.
y <- frequent_terms(x, n=10)
nrow(y)
z <- print(frequent_terms(x, n=100))
nrow(z)

## Cumulative Percent Plot
plot_cum_percent(frequent_terms(presidential_debates_2012[["dialogue"]]))

## End(Not run)

trinker/termco documentation built on Jan. 7, 2022, 3:32 a.m.