crazyfy | R Documentation |
crazyfy
Preprocess data for anomaly-detection computational routines with strange: missing-value treatment, variable standardisation, optional log recoding, and treatment of character/factor variables.
crazyfy( data, do = c("factor", "log", "impute", "range"), id = NULL, skewness.cutpoint = 2, NA.method = "mean", NA.value = 0, verbose = FALSE )
data |
Source data (data.frame or data.table). |
do |
character vector - List of processing steps to apply – see details. |
id |
(optional) character - name of a preexisting variable to be used as ID. |
skewness.cutpoint |
numeric - skewness threshold above which log recoding is applied. |
NA.method |
character - method to be used for missing-value imputation; one of "mean" or "value" (the latter uses the NA.value parameter). |
NA.value |
numeric - value used to impute missing values when NA.method is "value". |
verbose |
logical - should the function display some details about processing? |
The possible pre-treatment operations are:
* factor: factors/characters are transformed into numeric values using a term frequency–inverse document frequency (tf-idf) approach. Note that the smooth IDF weighting is used, i.e. the log of 1+N/nt, where N is the number of observations and nt is the frequency of the specific term t.
* log: compute log(x-min(x)). Applied to all numeric variables whose distribution has a skewness greater than skewness.cutpoint.
* impute: impute missing values. The method, chosen with NA.method, is either the variable mean or a specific value supplied through NA.value.
* range: standardize variable: (x-min(x))/max(x).
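The four operations above can be sketched in base R as follows. This is only an illustration of the formulas as documented here, not the package's actual implementation: the helper names (factor_score, skewness, log_recode, impute, range_scale) are hypothetical, the term frequency is assumed to be 1 for the observed level, and the moment-based skewness estimator is an assumption.

```r
# Illustrative base-R helpers; NOT the stranger source code.

# factor: score each level with the smooth IDF weight log(1 + N / nt).
# Assumption: term frequency is 1 for the observed level.
factor_score <- function(x) {
  x   <- as.character(x)
  N   <- length(x)            # number of observations
  nt  <- table(x)             # frequency of each term
  idf <- log(1 + N / nt)
  as.numeric(idf[x])
}

# Moment-based skewness (assumed estimator; the package may use another).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# log: log(x - min(x)) when skewness exceeds the cutpoint.
# With this formula the minimum itself maps to log(0) = -Inf.
log_recode <- function(x, cutpoint = 2) {
  if (skewness(x) > cutpoint) log(x - min(x, na.rm = TRUE)) else x
}

# impute: replace NA by the variable mean or by a fixed value.
impute <- function(x, method = c("mean", "value"), value = 0) {
  method <- match.arg(method)
  fill <- if (method == "mean") mean(x, na.rm = TRUE) else value
  x[is.na(x)] <- fill
  x
}

# range: (x - min(x)) / max(x), exactly as written in the list above.
range_scale <- function(x) (x - min(x)) / max(x)

range_scale(c(2, 4, 6, 10))          # 0.0 0.2 0.4 0.8
impute(c(1, 2, NA, 5))               # NA replaced by the mean of 1, 2, 5
factor_score(c("a", "a", "b", "c"))  # log(3) log(3) log(5) log(5)
```

Note that with the range formula as documented, the largest value is not necessarily mapped to 1 (10 becomes 0.8 in the example).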
Pre-processed data, of class data.table extended with the class crazy.data.table.
library(stranger)
data(iris)
crazy <- crazyfy(iris[,1:4])