crazyfy: Data preparation before detection of strangers

View source: R/crazyfy.R

Description

crazyfy preprocesses data for the anomaly detection routines computed with strange: missing-value treatment, variable standardisation, optional log recoding, and treatment of character/factor variables.

Usage

crazyfy(data, do = c("factor", "log", "impute", "range"), id = NULL,
  skewness.cutpoint = 2, NA.method = "mean", NA.value = 0,
  verbose = FALSE)

Arguments

data

Source data (data.frame or data.table).

do

character vector - list of processing steps to apply; see Details.

id

(optional) character - name of a preexisting variable to be used as ID.

skewness.cutpoint

numeric - skewness threshold: numeric variables whose skewness is greater than this value are log-recoded (see Details).

NA.method

character - method used to impute missing values; one of "mean" or "value" (the latter uses the value supplied in NA.value).

NA.value

numeric - value used to impute missing values when NA.method is "value".

verbose

logical - should the function print details about the processing?

Details

The possible pre-treatment operations, selected through do, are:

* factor: factor/character variables are transformed into numeric ones using a term frequency–inverse document frequency (tf-idf) approach. Note that we use the smooth IDF weighting, i.e. we take log(1 + N/n_t), where N is the number of observations and n_t the frequency of the specific term t.
* log: compute log(x - min(x)). This is done for every numeric variable whose distribution has a skewness greater than skewness.cutpoint.
* impute: impute missing values. The method, chosen with NA.method, uses either the variable mean or a specific value supplied in NA.value.
* range: standardise the variable: (x - min(x))/max(x).
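The four operations can be illustrated with a small base-R sketch. This is a hypothetical re-implementation, not the stranger source: the helper names crazyfy_sketch() and skewness_sketch() are invented here, the moment-based skewness estimator is an assumption, each character/factor value is treated as a one-term document (so tf = 1 and the score reduces to the smooth IDF weight), and the steps run in the order factor, log, impute, range. Because log(x - min(x)) taken literally maps the minimum to -Inf, the sketch assumes a +1 shift (log1p). The range step follows the formula above as written.

```r
# Hypothetical base-R sketch of the crazyfy() pre-treatment steps.
# Not the stranger implementation; estimator and log shift are assumptions.

# Moment-based sample skewness (assumed estimator), ignoring NAs
skewness_sketch <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

crazyfy_sketch <- function(data, do = c("factor", "log", "impute", "range"),
                           skewness.cutpoint = 2, NA.method = "mean",
                           NA.value = 0) {
  for (v in names(data)) {
    x <- data[[v]]
    was_numeric <- is.numeric(x)

    # factor: one term per row, so tf = 1 and the score is the
    # smooth IDF weight log(1 + N / n_t); NA terms stay NA
    if ("factor" %in% do && (is.factor(x) || is.character(x))) {
      x <- as.character(x)
      n_t <- table(x)                        # frequency of each term t
      x <- as.numeric(log(1 + length(x) / n_t[x]))
    }
    if (!is.numeric(x)) next                 # leave other types untouched

    # log: recode originally numeric variables that are too skewed
    # (assumed +1 shift so the minimum maps to 0 instead of -Inf)
    if ("log" %in% do && was_numeric &&
        isTRUE(skewness_sketch(x) > skewness.cutpoint)) {
      x <- log1p(x - min(x, na.rm = TRUE))
    }

    # impute: replace NAs by the variable mean or by NA.value
    if ("impute" %in% do) {
      fill <- if (NA.method == "mean") mean(x, na.rm = TRUE) else NA.value
      x[is.na(x)] <- fill
    }

    # range: (x - min(x)) / max(x), as written in Details
    if ("range" %in% do) {
      x <- (x - min(x, na.rm = TRUE)) / max(x, na.rm = TRUE)
    }
    data[[v]] <- x
  }
  data
}
```

For example, crazyfy_sketch(data.frame(a = c(1, 2, 3, NA, 4))) fills the missing value with the mean 2.5 and then rescales the column to c(0, 0.25, 0.5, 0.375, 0.75).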

Value

Pre-processed data, returned as a data.table that also carries the additional class crazy.data.table.
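Presumably, "also carries the class crazy.data.table" means that crazy.data.table is prepended to the class vector of the returned data.table, so S3 methods written for crazy.data.table are dispatched first while data.table and data.frame methods still apply. The base-R sketch below assumes exactly that; tag_crazy() and the generic describe() are hypothetical, and a plain data.frame stands in for the data.table so that no package is needed.

```r
# Hypothetical illustration of an extra S3 class layered on a table class.
# A data.frame stands in for the data.table returned by crazyfy().
tag_crazy <- function(x) {
  class(x) <- c("crazy.data.table", class(x))  # prepend the extra class
  x
}

# Hypothetical generic used only to show which method is dispatched
describe <- function(x, ...) UseMethod("describe")
describe.crazy.data.table <- function(x, ...) "pre-processed"
describe.default <- function(x, ...) "raw"

tagged <- tag_crazy(data.frame(a = 1:3))
describe(tagged)                 # "pre-processed"
describe(data.frame(a = 1:3))    # "raw"
inherits(tagged, "data.frame")   # TRUE: the underlying class is kept
```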

Examples

stranger documentation built on March 18, 2018, 2:01 p.m.