get_ngrams: Extract n-grams from text

View source: R/litsearchr.functions.R

Description

This function extracts n-grams from text.

Usage

get_ngrams(
  x,
  n = 2,
  min_freq = 1,
  ngram_quantile = NULL,
  stop_words,
  rm_punctuation = FALSE,
  preserve_chars = c("-", "_"),
  language = "English"
)

Arguments

x

A character vector from which to extract n-grams.

n

Numeric: the minimum number of terms in an n-gram.

min_freq

Numeric: the minimum number of times an n-gram must occur to be returned.

ngram_quantile

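Numeric: the quantile of n-gram frequencies to use as the retention threshold; e.g. 0.8 corresponds to the 80th percentile of n-gram frequencies. Note that the signature under Usage sets the default to NULL, not 0.8.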

stop_words

A character vector of stopwords to ignore.

rm_punctuation

Logical: should punctuation be removed before selecting n-grams?

preserve_chars

A character vector of punctuation marks to be retained if rm_punctuation is TRUE.

language

A string indicating the language to use for removing stopwords.
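
To make the roles of n, min_freq and ngram_quantile more concrete, the following is a minimal base R sketch of frequency-filtered n-gram extraction. It is not the package's implementation: simple_ngrams is a hypothetical helper written for illustration, it ignores stopwords and punctuation handling entirely, and it assumes that n-grams are runs of exactly n whitespace-separated words and that the quantile is computed over the per-n-gram counts.

```r
# Hypothetical helper for illustration only; not part of discoverableresearch.
# Assumes n-grams are runs of exactly n whitespace-separated, lowercased words.
simple_ngrams <- function(x, n = 2, min_freq = 1, ngram_quantile = NULL) {
  # Split each string into lowercase words
  tokens <- strsplit(tolower(x), "\\s+")

  # Build every run of n consecutive words within each string
  grams <- unlist(lapply(tokens, function(words) {
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) return(character(0))

  # Count how often each distinct n-gram occurs
  counts <- table(grams)
  freq <- as.vector(counts)

  # Keep n-grams that occur at least min_freq times
  keep <- freq >= min_freq

  # Optionally also require the count to reach the requested quantile
  if (!is.null(ngram_quantile)) {
    threshold <- unname(stats::quantile(freq, ngram_quantile))
    keep <- keep & freq >= threshold
  }

  names(counts)[keep]
}

docs <- c("natural selection acts on variation",
          "natural selection is not random")
simple_ngrams(docs, n = 2, min_freq = 2)
# returns "natural selection"
```

In this sketch, leaving ngram_quantile as NULL skips the quantile filter, so only min_freq applies. Whether get_ngrams treats NULL the same way is not stated on this page.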

Value

A character vector of n-grams.

Examples

get_ngrams("On the Origin of Species By Means of Natural Selection")

Example output

Loading required namespace: stopwords
                  8 
"Natural Selection" 

discoverableresearch documentation built on Oct. 23, 2020, 7:13 p.m.