candidate_phrases: Find candidate phrases in a character vector.


Description

This function produces the input to the package's main phrase-ranking function. It is documented here because it may also be useful on its own as a tokenizer that splits text on arbitrary words and punctuation. By default, tokenization does not cross sentence boundaries, and each line break is treated as a sentence boundary.
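The splitting idea described above can be illustrated with a minimal base-R sketch. It is not rakeR's implementation: the helper split_on_stop_words(), its default punctuation set, and the lowercasing step are assumptions made for illustration. A phrase is a maximal run of tokens that contains no stop word and no punctuation, and a line break always ends the current phrase.

```r
# Hypothetical helper for illustration only; not part of rakeR.
split_on_stop_words <- function(text, stop_words,
                                punct = c(",", ".", ";", ":", "!", "?")) {
  # Line breaks act as sentence boundaries: process each line separately
  lines <- unlist(strsplit(text, "\n", fixed = TRUE))
  phrases <- character(0)
  for (line in lines) {
    # Surround each punctuation mark with spaces so it becomes its own token
    for (p in punct) {
      line <- gsub(p, paste0(" ", p, " "), line, fixed = TRUE)
    }
    # Assumption: matching is case-insensitive, so lowercase before splitting
    tokens <- strsplit(tolower(line), "\\s+")[[1]]
    tokens <- tokens[tokens != ""]
    current <- character(0)
    for (tok in tokens) {
      if (tok %in% stop_words || tok %in% punct) {
        # A stop word or punctuation mark ends the phrase being built
        if (length(current) > 0) {
          phrases <- c(phrases, paste(current, collapse = " "))
        }
        current <- character(0)
      } else {
        current <- c(current, tok)
      }
    }
    # The end of a line always closes the open phrase
    if (length(current) > 0) {
      phrases <- c(phrases, paste(current, collapse = " "))
    }
  }
  phrases
}

split_on_stop_words(
  "Compatibility of systems of linear constraints.\nCriteria of compatibility are given.",
  c("of", "are")
)
# returns c("compatibility", "systems", "linear constraints",
#           "criteria", "compatibility", "given")
```

Because the period and the line break both close the open phrase, no candidate phrase in this sketch spans two sentences.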

Usage

candidate_phrases(x, split_words = smart_stop_words(),
  split_punct = basic_punct(), remove_numbers = F)

Arguments

x

a character vector

split_words

a vector of words to split your texts by. By default, this calls smart_stop_words(), which supplies a set of stop words.

split_punct

a vector of punctuation marks to split your texts by. By default, this calls basic_punct(), which supplies a set of basic punctuation.

Value

Always returns a list with one element per input text; each element is a character vector of candidate phrases. If x is named, its names are used for the list elements; otherwise, the function generates sequential document names.
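The naming rule in the Value section can be sketched in base R. This is an illustration under assumptions, not rakeR's code: name_phrase_list() is a hypothetical helper, phrase_fun stands in for whatever per-text splitting is used, and the "doc_" prefix is an assumed format, since the exact generated names are not documented here.

```r
# Hypothetical helper for illustration only; not part of rakeR.
# Applies phrase_fun to each text and names the resulting list.
name_phrase_list <- function(x, phrase_fun) {
  out <- lapply(unname(x), phrase_fun)
  # Reuse the input names when present; otherwise generate sequential names
  # (the "doc_" prefix is an assumption)
  names(out) <- if (!is.null(names(x))) names(x) else paste0("doc_", seq_along(x))
  out
}

split_on_and <- function(t) strsplit(t, " and ", fixed = TRUE)[[1]]
res <- name_phrase_list(c("red apples and green pears", "blue sky"), split_on_and)
names(res)   # "doc_1" "doc_2"
res[[1]]     # "red apples" "green pears"
```

Keeping the list aligned with the input, one element per text, lets later steps match phrases back to their source documents by name.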

Examples

candidate_phrases(test_text)
candidate_phrases(test_text, c("the","and"), c(","," \\."))
candidate_phrases(test_text, NULL, " ")   

lmkirvan/rakeR documentation built on May 14, 2019, 1:46 p.m.