txt_recode_ngram    R Documentation
Replace tokens in a character vector with compound multi-word expressions, so that c("New", "York") becomes c("New York", NA).
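A minimal base R sketch of this recoding for the two-word case, written without the package and not its actual implementation: each token is pasted to its successor, and wherever the pair equals the compound, the first token becomes the compound and the second becomes NA. The variable names (tokens, pairs, hit, recoded) are illustrative only.

```r
tokens <- c("I", "went", "to", "New", "York", "City", "on", "holiday", ".")
compound <- "New York"

# Pair each token with the one that follows it (the last token has no successor)
pairs <- paste(tokens, c(tokens[-1], NA), sep = " ")

# Positions where a two-word sequence equals the compound
hit <- which(pairs == compound)

recoded <- tokens
recoded[hit] <- compound   # first word of the match becomes the compound
recoded[hit + 1] <- NA     # second word of the match becomes NA
data.frame(tokens, recoded)
```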
txt_recode_ngram(x, compound, ngram, sep = " ")
x | a character vector of words in which tokens are to be replaced by compound multi-word expressions. This is generally the token column of an annotated data frame, such as x$token in the examples. |
compound | a character vector of compound multi-word expressions, i.e. terms which can be considered as one word. For example "New York". |
ngram | an integer vector of the same length as compound, giving the number of words in each compound. |
sep | separator used when the compounds were constructed by combining the words into a compound multi-word expression. Defaults to a space: ' '. |
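The ngram value of a compound is the number of words it contains once split on sep. Assuming each compound was joined with exactly that separator, a matching ngram vector can be derived in base R as sketched below; this is a convenience for building the arguments, not something the function requires.

```r
compound <- c("New-York", "New-York-City")
sep <- "-"

# Split each compound on the literal separator and count its words
ngram <- lengths(strsplit(compound, sep, fixed = TRUE))
ngram
# [1] 2 3
```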
The same character vector x, where each sequence of tokens matching a compound is replaced: the first token of the sequence becomes the compound multi-word expression and the remaining tokens become NA.
When compounds overlap, preference is given to the compounds with the higher ngram. See the examples.
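To illustrate the preference for higher ngrams, here is a hedged base R sketch; it is not the package's own code, and the function name recode_ngram_sketch is hypothetical. Compounds are processed in decreasing order of ngram, so a longer match is recoded first and its trailing tokens become NA, which stops a shorter compound contained in it from matching afterwards.

```r
# Hypothetical helper for illustration; not part of any package.
recode_ngram_sketch <- function(x, compound, ngram, sep = " ") {
  stopifnot(length(compound) == length(ngram))
  # Longest compounds first, so they take precedence over shorter ones
  for (k in order(ngram, decreasing = TRUE)) {
    words <- strsplit(compound[k], sep, fixed = TRUE)[[1]]
    n <- length(words)
    if (n == 0 || n > length(x)) next
    for (i in seq_len(length(x) - n + 1)) {
      idx <- i:(i + n - 1)
      # Tokens already covered by a longer compound are now the compound or NA,
      # so they no longer match the plain words of a shorter compound
      if (identical(x[idx], words)) {
        x[i] <- compound[k]   # first token of the match becomes the compound
        x[idx[-1]] <- NA      # remaining tokens of the match become NA
      }
    }
  }
  x
}

tokens <- c("I", "went", "to", "New", "York", "City", "on", "holiday", ".")
recoded <- recode_ngram_sketch(tokens,
                               compound = c("New-York", "New-York-City"),
                               ngram = c(2, 3), sep = "-")
data.frame(tokens, recoded)
```

Here "New-York-City" wins over "New-York" because the three-word compound is tried first.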
See also: txt_nextgram
library(udpipe)

x <- c("I", "went", "to", "New", "York", "City", "on", "holiday", ".")
y <- txt_recode_ngram(x, compound = "New York", ngram = 2, sep = " ")
data.frame(x, y)

keyw <- data.frame(keyword = c("New-York", "New-York-City"), ngram = c(2, 3))
y <- txt_recode_ngram(x, compound = keyw$keyword, ngram = keyw$ngram, sep = "-")
data.frame(x, y)

## Example replacing adjectives followed by a noun with the full compound word
data(brussels_reviews_anno)
x <- subset(brussels_reviews_anno, language == "nl")
keyw <- keywords_phrases(x$xpos, term = x$token, pattern = "JJNN",
                         is_regex = TRUE, detailed = FALSE)
head(keyw)
x$term <- txt_recode_ngram(x$token, compound = keyw$keyword, ngram = keyw$ngram)
head(x[, c("token", "term", "xpos")], 12)