bind_ngrams: Replace blanks by replacement pattern in known ngrams in a...

View source: R/bind_ngrams.R

bind_ngrams    R Documentation

Replace blanks by replacement pattern in known ngrams in a string

Description

Ngrams are usually identified and bound by probabilistic collocation extraction. In some situations, however, one may want to fix specific word combinations before the text is processed further, independent of collocation statistics such as PMI (pointwise mutual information). This function replaces the blanks within known ngrams in a string with a replacement pattern.

Usage

bind_ngrams(string, ngrams, replacement = "_", case_insensitive = TRUE)

Arguments

string

A character vector in which the blanks within known ngrams are to be replaced.

ngrams

Character vector of known ngrams. Note that matched ngrams in the return value take the case formatting of the ngrams supplied here, not that of the input string.

replacement

A fixed pattern that replaces the blanks in ngrams. Defaults to an underscore, "_".

case_insensitive

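Whether ngrams are matched without regard to case. Defaults to TRUE. Note that case is only used for matching; the case of the output follows the ngrams argument (see the ngrams parameter).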

Value

The input string with the blanks in matched ngrams replaced by replacement.

Examples


bind_ngrams(c("The United Nations are an important organization.",
              "They are concerned, e.g., with sustainable development and climate change."),
            ngrams = c("United Nations", "CLIMATE CHANGE"))
# [1] "The United_Nations are an important organization."
# [2] "They are concerned, e.g., with sustainable development and CLIMATE_CHANGE."
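
To make the roles of replacement and case_insensitive more concrete, the behaviour described above can be approximated in base R. The following is only a minimal sketch and not the package's implementation: bind_ngrams_sketch is a hypothetical helper, and it assumes that ngrams are matched as literal text without word-boundary checks and that replacement contains no backslashes.

```r
# Hypothetical base-R sketch of the documented behaviour;
# this is NOT the textility implementation.
bind_ngrams_sketch <- function(string, ngrams, replacement = "_",
                               case_insensitive = TRUE) {
  for (ng in ngrams) {
    # Bound form: the ngram exactly as supplied, with blanks replaced.
    bound <- gsub(" ", replacement, ng, fixed = TRUE)
    # \Q...\E makes PCRE treat the ngram as literal text.
    pattern <- paste0("\\Q", ng, "\\E")
    string <- gsub(pattern, bound, string,
                   ignore.case = case_insensitive, perl = TRUE)
  }
  string
}

txt <- c("The United Nations are an important organization.",
         "They are concerned, e.g., with sustainable development and climate change.")
known <- c("United Nations", "CLIMATE CHANGE")

# Defaults: case-insensitive matching, blanks replaced by "_".
out_default <- bind_ngrams_sketch(txt, known)
# [1] "The United_Nations are an important organization."
# [2] "They are concerned, e.g., with sustainable development and CLIMATE_CHANGE."

# Case-sensitive matching with "-" as replacement:
# "CLIMATE CHANGE" no longer matches "climate change".
out_strict <- bind_ngrams_sketch(txt, known, replacement = "-",
                                 case_insensitive = FALSE)
# [1] "The United-Nations are an important organization."
# [2] "They are concerned, e.g., with sustainable development and climate change."
```

In this sketch the replacement text is always built from the supplied ngram, which is why a case-insensitive match returns the case formatting of ngrams rather than that of the input.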

manuelbickel/textility documentation built on Nov. 25, 2022, 9:07 p.m.