frequent_ngrams: Ngram Collocations
In trinker/termco: Counts of Terms and Substrings


Find important ngram (2-3 word) collocations. Wraps `collocations` to add stopword removal, minimum/maximum word-length filtering, and stemming, and provides a generic plot method.

frequent_ngrams(
  text.var,
  n = 20,
  gram.length = 2:3,
  stopwords = stopwords::stopwords("english"),
  min.char = 4,
  max.char = Inf,
  order.by = "frequency",
  stem = FALSE,
  language = "porter",
  ...
)

`text.var`	A vector of character strings.
`n`	The number of rows to include.
`gram.length`	The length of ngram to generate (2-3).
`stopwords`	A vector of stopwords to exclude.
`min.char`	The minimum number of characters (including apostrophes) a word must have to be included.
`max.char`	The maximum number of characters (including apostrophes) a word may have to be included.
`order.by`	The name of the measure column to order by: `"frequency"`, `"lambda"`, `"z"`.
`stem`	logical. If `TRUE`, `wordStem` is used to stem words, with `language = "porter"` as the default. Note that stopwords will be stemmed as well.
`language`	The stem language to use (see `wordStem`).
`...`	Other arguments passed to `collocations`.
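
To make the filtering arguments concrete, here is a minimal base R sketch of counting bigrams after applying `stopwords`, `min.char`, and `max.char`. It is an illustration only, not termco's implementation: `count_bigrams` is a hypothetical helper, the tokenizer is a simple regular-expression split, and it assumes words are filtered before adjacent pairs are formed, which may differ from what `frequent_ngrams` does internally. It returns only a frequency column and does not compute `lambda` or `z`.

```r
# Hypothetical helper for illustration; NOT part of termco.
count_bigrams <- function(text.var, stopwords = character(0),
                          min.char = 4, max.char = Inf, n = 20) {
  bigrams <- unlist(lapply(text.var, function(s) {
    # Lowercase, then split on anything that is not a letter or apostrophe
    words <- strsplit(tolower(s), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    # Drop stopwords and words outside the character limits
    # (assumption: filtering happens before pairing)
    keep <- !(words %in% stopwords) &
      nchar(words) >= min.char & nchar(words) <= max.char
    words <- words[keep]
    # Pair each remaining word with its successor within the same string
    paste(head(words, -1), tail(words, -1))
  }))
  if (length(bigrams) == 0) {
    return(data.frame(collocation = character(0), frequency = integer(0)))
  }
  counts <- sort(table(bigrams), decreasing = TRUE)
  out <- data.frame(
    collocation = names(counts),
    frequency = as.integer(counts),
    stringsAsFactors = FALSE
  )
  head(out, n)
}

txt <- c(
  "The health care plan is a good health care plan.",
  "We need a health care plan."
)
res <- count_bigrams(txt, stopwords = c("the", "is", "a", "we"))
res
```

With the default `min.char = 4`, any word shorter than four characters is dropped even if it is not in the stopword list, so lowering `min.char` changes which pairs can form.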

Returns a data.frame of terms and frequencies.

collocations

## Not run: 
x <- presidential_debates_2012[["dialogue"]]

frequent_ngrams(x)
frequent_ngrams(x, n = 50)
frequent_ngrams(x, stopwords = c(stopwords::stopwords("english"), "american", "governor"))
frequent_ngrams(x, gram.length = 3)
frequent_ngrams(x, gram.length = 3, stem = TRUE)
frequent_ngrams(x, order.by = "lambda")

plot(frequent_ngrams(x))
plot(frequent_ngrams(x, n = 40))
plot(frequent_ngrams(x, order.by = "lambda"))
plot(frequent_ngrams(x, gram.length = 3))

## End(Not run)
## Not run: 
## ngram feature extraction
if (!require("pacman")) install.packages("pacman")
pacman::p_load(termco, dplyr, textshape, magrittr)

ngrams <- presidential_debates_2012 %$%
    frequent_ngrams(dialogue, n = 10) %>%
    pull(collocation) %>%
    as_term_list()

ngram_features <- presidential_debates_2012 %>%
    with(term_count(dialogue, person, ngrams)) %>%
    as_dtm() 

ngram_features

## tidied features
ngram_features %>%
    textshape::tidy_dtm()

## End(Not run)