In manuelbickel/textility: Utility functions for text mining

bind_ngrams

R Documentation

Replace blanks with a replacement pattern in known ngrams in a string

Description

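Usually, ngrams are identified and joined by statistical collocation extraction, for example based on pointwise mutual information (PMI). In certain situations, however, one may want to fix specific word combinations before further processing of the text, independent of such collocation statistics. `bind_ngrams` does this by replacing the blanks within each known ngram with `replacement`.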
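To see why fixing an ngram matters for later processing steps, the following sketch applies a naive tokenizer to the second example string before and after binding. The bound string is copied from the documented output in Examples; the helper `tokenize` is hypothetical and written here only for illustration (lowercase, strip punctuation except `_`, split on whitespace).

```r
# Documented output of bind_ngrams() for the second example string
bound   <- "They are concerned, e.g., with sustainable development and CLIMATE_CHANGE."
unbound <- "They are concerned, e.g., with sustainable development and climate change."

# Hypothetical naive tokenizer: lowercase, drop punctuation but keep "_",
# then split on runs of whitespace
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[^[:alnum:]_[:space:]]", "", x)
  strsplit(x, "[[:space:]]+")[[1]]
}

tail(tokenize(unbound), 2)  # "climate" "change"
tail(tokenize(bound), 1)    # "climate_change"
```

Without binding, the tokenizer splits "climate change" into two unrelated tokens; after binding, it survives as a single token.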

Usage

bind_ngrams(string, ngrams, replacement = "_", case_insensitive = TRUE)

Arguments

`string`	A character vector in which the blanks within known ngrams shall be replaced.
`ngrams`	Character vector of known ngrams. Please note that the ngrams in the returned value will take the case formatting of these `ngrams`.
`replacement`	A fixed pattern that shall replace the blanks in ngrams. By default an underscore, "_".
`case_insensitive`	Logical; whether ngrams are matched regardless of case. By default TRUE. Note that this setting only affects matching; the replaced text always takes the case formatting of `ngrams` (see `ngrams`).
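The interplay of `ngrams`, `replacement`, and `case_insensitive` can be illustrated with a minimal base R sketch. It is a hypothetical re-implementation (the name `bind_ngrams_sketch` is invented here), not the textility source. It assumes that ngrams are matched literally as whole words using PCRE, that longer ngrams should be bound first, and that neither `ngrams` nor `replacement` contains backslashes.

```r
# Hypothetical sketch for illustration only; NOT the textility implementation.
# Assumes ngrams and replacement contain no backslashes.
bind_ngrams_sketch <- function(string, ngrams, replacement = "_",
                               case_insensitive = TRUE) {
  # Assumed ordering: bind longer ngrams first, so that e.g.
  # "climate change policy" is not broken up by an earlier "climate change"
  ngrams <- ngrams[order(nchar(ngrams), decreasing = TRUE)]
  for (ng in ngrams) {
    # \Q...\E matches the ngram literally; the lookarounds require that it
    # is not directly preceded or followed by a word character
    pattern <- paste0("(?<!\\w)\\Q", ng, "\\E(?!\\w)")
    # The replaced text takes the case formatting of the ngram itself
    bound <- gsub(" ", replacement, ng, fixed = TRUE)
    # case_insensitive only controls matching, not the output case
    string <- gsub(pattern, bound, string,
                   ignore.case = case_insensitive, perl = TRUE)
  }
  string
}
```

Applied to the input from Examples, this sketch yields the documented output, including "CLIMATE_CHANGE" in the case of the supplied ngram. Sorting by length is a design choice of the sketch; whether `bind_ngrams` handles overlapping ngrams the same way is not stated in this documentation.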

Value

The input `string`, in which the blanks of all matched ngrams are replaced by `replacement`.

Examples


bind_ngrams(c("The United Nations are an important organization.",
              "They are concerned, e.g., with sustainable development and climate change."),
            ngrams = c("United Nations", "CLIMATE CHANGE"))
# [1] "The United_Nations are an important organization."
# [2] "They are concerned, e.g., with sustainable development and CLIMATE_CHANGE."

manuelbickel/textility documentation built on Nov. 25, 2022, 9:07 p.m.