genDefaultSettings: A list of default settings for a TEXT.MINER object


View source: R/textminer.R

Description

A list of default settings for a TEXT.MINER object.

Usage

genDefaultSettings(remove_punctuation = TRUE, remove_numbers = TRUE,
  tolower = TRUE, metric = "spherical", stemming = TRUE,
  remove_special_characters = TRUE, plain_text = TRUE, unique = TRUE,
  weighting = "freq", wc_max_words = 50, wc_rot_per = 0.4,
  stop_words = c(letters, LETTERS, tm::stopwords("english")),
  wc_color = "blue", num_clust = 3, wc_gradient = "weight",
  dictionary = data.frame(), plot_color = "blue", sparsity = 0.999)
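
As a minimal, hypothetical sketch of how the signature above might be used: the call below overrides three of the defaults and leaves the rest untouched. It assumes the package is installed under the name texer (taken from the footer of this page) and that genDefaultSettings is exported; neither is confirmed here, so the call is guarded.

```r
# Hypothetical usage sketch: assumes the package is installed as "texer"
# and that genDefaultSettings() is exported from it.
overrides <- list(stemming = FALSE, weighting = "tfidf", num_clust = 5)

if (requireNamespace("texer", quietly = TRUE)) {
  # Arguments not listed in `overrides` keep the defaults from the usage signature.
  settings <- do.call(texer::genDefaultSettings, overrides)
  str(settings)
} else {
  message("Package 'texer' is not installed; skipping the example.")
}
```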

Fields

remove_punctuation

a single logical: should punctuation be removed from all text documents? (default is TRUE)

remove_numbers

a single logical: should numbers be removed from all text documents? (default is TRUE)

tolower

a single logical: should all letters be converted to lower case? (default is TRUE)

stemming

a single logical: should all words be reduced to their stem? (default is TRUE)

remove_special_characters

a single logical: should all special characters be removed? (default is TRUE)

plain_text

a single logical: should all the documents be treated as plain text? (default is TRUE)

unique

a single logical: should duplicated documents be removed? (default is TRUE)

weighting

a single character: specifies the default weighting. Must be within c('freq', 'tfidf'). (default is 'freq')

metric

a single character: specifies the default metric for computing distances between the documents. Must be within c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", "spherical"). (default is 'spherical')

wc_max_words

a single integer: specifies the maximum number of words shown in the word cloud. (default is 50)

wc_rot_per

a single numeric: must be between 0 and 1. Specifies the proportion of words shown as rotated in the word cloud. (default is 0.4)

wc_color

a single character: specifies the color of the words shown in the word cloud. (default is 'blue')

wc_gradient

a single character: which weighting should be reflected by the color gradient in the word cloud. Must be within c('freq', 'tfidf')

plot_color

a single character: specifies the color of the points in the 2D and 3D point plots.

num_clust

a single integer: specifies the default number of clusters. (default is 3)

sparsity

a single numeric: must be between 0 and 1; specifies the sparsity threshold. For example, if sparsity is 0.98, all words appearing in fewer than 2% of the documents will be removed. (default is 0.999)
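
The sparsity rule described above can be illustrated with a small base R sketch. This is not the package's implementation; it only mirrors the stated rule (drop words that appear in less than 1 - sparsity of the documents) on a made-up document-term matrix. The names dtm, doc_share and kept are illustrative.

```r
# Toy document-term matrix: rows are documents, columns are terms.
dtm <- matrix(
  c(1, 0, 2,
    1, 0, 1,
    3, 1, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("common", "rare", "medium"))
)

sparsity <- 0.7  # drop terms present in less than 30% of the documents

# Fraction of documents in which each term appears at least once.
doc_share <- colMeans(dtm > 0)

# Keep only terms whose document share is at least 1 - sparsity.
kept <- dtm[, doc_share >= 1 - sparsity, drop = FALSE]
colnames(kept)
```

Here "rare" appears in one of four documents (25%), which is below the 30% cutoff, so it is dropped, while "common" and "medium" are kept.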


genpack/texer documentation built on Feb. 29, 2020, 9:21 a.m.