fortify_stopwords: Improve an existing stopword list by finding stopwords using...


Description

This function takes an existing stopword list and finds words that are used frequently in the texts but appear infrequently in keyphrases. It then adds these non-keywords to your stopword list. A word is selected if it occurs adjacent to keywords more often than it occurs within keywords.
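The selection rule described above can be illustrated with a rough base-R sketch. This is not the rakeR implementation: `sketch_fortify()` and its `keyphrases` argument are hypothetical, the keyphrases are supplied by hand rather than extracted, and treating `n` as a quantile of word frequencies is an assumption made only for this illustration.

```r
# Hypothetical helper, NOT part of rakeR: illustrates the
# "adjacent to keywords more often than inside keywords" rule.
sketch_fortify <- function(texts, keyphrases, stopwords, n = 0.97) {
  tokenize <- function(s) {
    tok <- strsplit(tolower(s), "[^a-z]+")[[1]]
    tok[nzchar(tok)]
  }
  all_tokens <- character(0)
  adjacent <- character(0)
  inside <- character(0)
  for (txt in texts) {
    tokens <- tokenize(txt)
    if (length(tokens) == 0) next
    all_tokens <- c(all_tokens, tokens)
    # Mark every token that is part of a keyphrase occurrence.
    in_kw <- rep(FALSE, length(tokens))
    for (kp in keyphrases) {
      kp_tokens <- tokenize(kp)
      k <- length(kp_tokens)
      if (k == 0 || k > length(tokens)) next
      for (i in seq_len(length(tokens) - k + 1)) {
        idx <- i:(i + k - 1)
        if (all(tokens[idx] == kp_tokens)) in_kw[idx] <- TRUE
      }
    }
    # A token is "adjacent" if it is outside a keyphrase but next to one.
    prev_kw <- c(FALSE, head(in_kw, -1))
    next_kw <- c(tail(in_kw, -1), FALSE)
    adjacent <- c(adjacent, tokens[!in_kw & (prev_kw | next_kw)])
    inside <- c(inside, tokens[in_kw])
  }
  # Assumption: "common" words are those at or above the n-th
  # quantile of word frequency.
  freq <- table(all_tokens)
  cutoff <- unname(quantile(as.numeric(freq), probs = n))
  common <- names(freq)[as.numeric(freq) >= cutoff]
  adj_n <- as.vector(table(factor(adjacent, levels = common)))
  in_n <- as.vector(table(factor(inside, levels = common)))
  union(stopwords, common[adj_n > in_n])
}

texts <- c(
  "analysis of the data and design of the model",
  "review of the data and test of the model"
)
result <- sketch_fortify(texts, keyphrases = c("data", "model"),
                         stopwords = c("of"), n = 0.5)
print(result)
# [1] "of"  "and" "the"
```

In this toy corpus, "and" and "the" are frequent and always sit next to a keyphrase, so they are added. "data" and "model" are frequent but occur only inside keyphrases, so they are left out. A low `n` is used here only because the corpus is tiny.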

Usage

fortify_stopwords(x, stopwords = smart_stop_words(), n = 0.97,
  sample_frac = 1)

Arguments

x

the vector of texts that you want to use to generate additional stopwords

stopwords

the list of stopwords you want to enrich; defaults to smart_stop_words()

n

the fraction (between 0 and 1) of the total number of words to consider when looking for common words. It should always be set to a relatively high value so that only commonly used words are added to the stopword list

sample_frac

the fraction of the documents in x to consider; the default of 1 uses all of them. Provided for big datasets, where a smaller fraction can be sampled.

Value

Returns a vector of fortified stopwords: the supplied stopwords plus the newly identified ones.
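
A minimal usage sketch, following the signature shown under Usage. It assumes that rakeR has been installed from the lmkirvan/rakeR repository and that both `fortify_stopwords()` and `smart_stop_words()` are exported. The two example texts are made up, and a corpus this small may not yield any new stopwords at the default `n`.

```r
# Made-up example texts; a real corpus would be much larger.
texts <- c(
  "Compatibility of systems of linear constraints over the set of natural numbers.",
  "Criteria of compatibility of a system of linear equations are considered."
)

# Only run the call if rakeR is actually installed.
if (requireNamespace("rakeR", quietly = TRUE)) {
  fortified <- rakeR::fortify_stopwords(
    texts,
    stopwords = rakeR::smart_stop_words(),  # list to enrich
    n = 0.97,                               # only consider very common words
    sample_frac = 1                         # use every document in texts
  )
  print(head(fortified))
}
```

For a large corpus, lowering `sample_frac` (for example to 0.1) restricts the search to a subset of the documents.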


lmkirvan/rakeR documentation built on May 14, 2019, 1:46 p.m.