knitr::opts_chunk$set(echo = TRUE, fig.height = 5, fig.width = 7) library(bestNormalize)
This vignette will go over the steps required to implement a custom user-defined function within the bestNormalize
framework.
There are 3 steps.
1) Create transformation function
2) Create predict method for transformation function (that can be applied to new data)
3) Pass through new function and predict method to bestNormalize
Here, we start by defining a new function that we'll call cuberoot_x
, which will take an argument a
(as does the sqrt_x
function) which will try to add a constant if it sees any negative numbers in x
. It will also take the argument standardize
which will center and scale the transformed data so that it's centered at 0 with SD = 1.
## Define user-function cuberoot_x <- function(x, a = NULL, standardize = TRUE, ...) { stopifnot(is.numeric(x)) min_a <- max(0, -(min(x, na.rm = TRUE))) if(!length(a)) a <- min_a if(a < min_a) { warning("Setting a < max(0, -(min(x))) can lead to transformation issues", "Standardize set to FALSE") standardize <- FALSE } x.t <- (x + a)^(1/3) mu <- mean(x.t, na.rm = TRUE) sigma <- sd(x.t, na.rm = TRUE) if (standardize) x.t <- (x.t - mu) / sigma # Get in-sample normality statistic results ptest <- nortest::pearson.test(x.t) val <- list( x.t = x.t, x = x, mean = mu, sd = sigma, a = a, n = length(x.t) - sum(is.na(x)), norm_stat = unname(ptest$statistic / ptest$df), standardize = standardize ) # Assign class class(val) <- c('cuberoot_x', class(val)) val }
Note that we assigned a class to the object this returns of the same name; this is necessary for successful implementation within bestNormalize
. We'll also need an associated predict
method that is used to apply the transformation to newly observed data.
`
predict.cuberoot_x <- function(object, newdata = NULL, inverse = FALSE, ...) { # If no data supplied and not inverse if (is.null(newdata) & !inverse) newdata <- object$x # If no data supplied and inverse if (is.null(newdata) & inverse) newdata <- object$x.t # Actually performing transformations # Perform inverse transformation as estimated if (inverse) { # Reverse-standardize if (object$standardize) newdata <- newdata * object$sd + object$mean # Reverse-cube-root (cube) newdata <- newdata^3 - object$a # Otherwise, perform transformation as estimated } else if (!inverse) { # Take cube root newdata <- (newdata + object$a)^(1/3) # Standardize to mean 0, sd 1 if (object$standardize) newdata <- (newdata - object$mean) / object$sd } # Return transformed data unname(newdata) }
This will be printed when bestNormalize selects your custom method or when you print an object returned by your new custom function.
print.cuberoot_x <- function(x, ...) { cat(ifelse(x$standardize, "Standardized", "Non-Standardized"), 'cuberoot(x + a) Transformation with', x$n, 'nonmissing obs.:\n', 'Relevant statistics:\n', '- a =', x$a, '\n', '- mean (before standardization) =', x$mean, '\n', '- sd (before standardization) =', x$sd, '\n') }
Note: if you can find a similar transformation in the source code, it's easy to model your code after it. For instance, for cuberoot_x
and predict.cuberoot_x
, I used sqrt_x.R
as a template file.
# Store custom functions into list custom_transform <- list( cuberoot_x = cuberoot_x, predict.cuberoot_x = predict.cuberoot_x, print.cuberoot_x = print.cuberoot_x ) set.seed(123129) x <- rgamma(100, 1, 1) (b <- bestNormalize(x = x, new_transforms = custom_transform, standardize = FALSE))
Evidently, the cube-rooting was the best normalizing transformation!
Is this code actually performing the cube-rooting?
all.equal(x^(1/3), b$chosen_transform$x.t) all.equal(x^(1/3), predict(b))
It does indeed.
The bestNormalize package can estimate any univariate statistic using its CV framework. A user-defined function can be passed in through the norm_stat_fn
argument, and this function will then be applied in lieu of the Pearson test statistic divided by its degree of freedom.
The user-defined function must take an argument x
, which indicates the data on which a user wants to evaluate the statistic.
Here is an example using Lilliefors (Kolmogorov-Smirnov) normality test statistic:
bestNormalize(x, norm_stat_fn = function(x) nortest::lillie.test(x)$stat)
Here is an example using Lilliefors (Kolmogorov-Smirnov) normality test's p-value:
(dont_do_this <- bestNormalize(x, norm_stat_fn = function(x) nortest::lillie.test(x)$p))
Note: bestNormalize
will attempt to minimize this statistic by default, which is definitely not what you want to do when calculating the p-value. This is seen in the example above, as the WORST normalization transformation is chosen.
In this case, a user is advised to either manually select the best one:
best_transform <- names(which.max(dont_do_this$norm_stats)) (do_this <- dont_do_this$other_transforms[[best_transform]])
Or, the user can reverse their defined statistic (in this case by subtracting it from 1):
(do_this <- bestNormalize(x, norm_stat_fn = function(x) 1-nortest::lillie.test(x)$p))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.