frequent_ngrams: Ngram Collocations

View source: R/frequent_ngrams.R

Description

Find important ngram (2-3 word) collocations. Wraps collocations to add stopword removal, minimum/maximum word-length filtering, and stemming, and provides a generic plot method for the result.
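To make the idea of ngram frequencies concrete, here is a minimal base R sketch that counts word ngrams in a small toy vector. It is an illustration only and not how termco or collocations is implemented: the helper name count_ngrams, its tokenizing regular expression, and the toy sentences are all assumptions, and the sketch computes raw frequencies only, not the "lambda" or "z" measures.

```r
# Hypothetical helper for illustration; not part of termco.
count_ngrams <- function(text, n = 2) {
  lower <- tolower(text)
  # Split each string into word tokens made of letters and apostrophes.
  tokens <- regmatches(lower, gregexpr("[a-z']+", lower))
  grams <- unlist(lapply(tokens, function(w) {
    if (length(w) < n) return(character(0))
    # embed() builds sliding windows with columns reversed, so flip them back.
    idx <- embed(seq_along(w), n)[, n:1, drop = FALSE]
    apply(idx, 1, function(i) paste(w[i], collapse = " "))
  }))
  # Tabulate the ngrams and sort from most to least frequent.
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(
    ngram = names(counts),
    frequency = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy input (made up for this sketch).
toy <- c("the health care plan", "a health care bill", "health care costs")
count_ngrams(toy, n = 2)
```

On this toy input the bigram "health care" occurs three times and ranks first. frequent_ngrams adds the stopword, word-length, and stemming options on top of the collocation measures.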

Usage

frequent_ngrams(
  text.var,
  n = 20,
  gram.length = 2:3,
  stopwords = stopwords::stopwords("english"),
  min.char = 4,
  max.char = Inf,
  order.by = "frequency",
  stem = FALSE,
  language = "porter",
  ...
)

Arguments

text.var

A vector of character strings.

n

The number of rows to include.

gram.length

The length(s) of ngram to generate: 2, 3, or both (2:3).

stopwords

A vector of stopwords to exclude.

min.char

The minimum number of characters (including apostrophes) a word must have to be included.

max.char

The maximum number of characters (including apostrophes) a word may have to be included.

order.by

The name of the measure column to order by: "frequency", "lambda", or "z".

stem

Logical. If TRUE, words are stemmed with wordStem, using language = "porter" by default. Note that the stopwords are stemmed as well.

language

The stem language to use (see wordStem).

...

Other arguments passed to collocations.
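The stopwords, min.char, and max.char arguments all describe word-level filters. The base R sketch below shows one plausible reading of those rules on a toy word vector. The helper keep_words is hypothetical and is not a termco function, and this page does not say whether the package filters words before or after the ngrams are formed, so treat the sketch as an aid to intuition only.

```r
# Hypothetical helper for illustration; termco's internal logic may differ.
keep_words <- function(words, stopwords = character(0),
                       min.char = 4, max.char = Inf) {
  # nchar() counts every character, so apostrophes add to the length.
  len <- nchar(words)
  is_stopword <- tolower(words) %in% stopwords
  words[!is_stopword & len >= min.char & len <= max.char]
}

# Toy input (made up for this sketch).
words <- c("the", "governor's", "plan", "is", "about", "jobs", "and", "taxes")
toy_stopwords <- c("the", "is", "and", "about")

keep_words(words, stopwords = toy_stopwords)
keep_words(words, stopwords = toy_stopwords, max.char = 5)
```

With the default min.char = 4, short words such as "is" are dropped even when they are not in the stopword list, and setting max.char = 5 additionally drops "governor's".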

Value

Returns a data.frame of terms and frequencies.

See Also

collocations

Examples

## Not run: 
x <- presidential_debates_2012[["dialogue"]]

frequent_ngrams(x)
frequent_ngrams(x, n = 50)
frequent_ngrams(x, stopwords = c(stopwords::stopwords("english"), "american", "governor"))
frequent_ngrams(x, gram.length = 3)
frequent_ngrams(x, gram.length = 3, stem = TRUE)
frequent_ngrams(x, order.by = "lambda")

plot(frequent_ngrams(x))
plot(frequent_ngrams(x, n = 40))
plot(frequent_ngrams(x, order.by = "lambda"))
plot(frequent_ngrams(x, gram.length = 3))

## End(Not run)
## Not run: 
## ngram feature extraction
if (!require("pacman")) install.packages("pacman")
pacman::p_load(termco, dplyr, textshape, magrittr)

ngrams <- presidential_debates_2012 %$%
    frequent_ngrams(dialogue, n = 10) %>%
    pull(collocation) %>%
    as_term_list()

ngram_features <- presidential_debates_2012 %>%
    with(term_count(dialogue, person, ngrams)) %>%
    as_dtm()

ngram_features

## tidied features
ngram_features %>%
    textshape::tidy_dtm()

## End(Not run)

trinker/termco documentation built on Jan. 7, 2022, 3:32 a.m.