Customization within bestNormalize
In bestNormalize: Normalizing Transformation Functions

knitr::opts_chunk$set(echo = TRUE, fig.height = 5, fig.width = 7)
library(bestNormalize)

Custom functions with bestNormalize

This vignette will go over the steps required to implement a custom user-defined function within the bestNormalize framework.

There are 3 steps.

1) Create transformation function

2) Create predict method for transformation function (that can be applied to new data)

3) Pass through new function and predict method to bestNormalize

Example: cube-root

S3 methods

Here, we start by defining a new function that we'll call cuberoot_x, which will take an argument a (as does the sqrt_x function) which will try to add a constant if it sees any negative numbers in x. It will also take the argument standardize which will center and scale the transformed data so that it's centered at 0 with SD = 1.

## Define user-function
cuberoot_x <- function(x, a = NULL, standardize = TRUE, ...) {
  stopifnot(is.numeric(x))

  min_a <- max(0, -(min(x, na.rm = TRUE)))
  if(!length(a)) 
    a <- min_a
  if(a < min_a) {
    warning("Setting a <  max(0, -(min(x))) can lead to transformation issues",
            "Standardize set to FALSE")
    standardize <- FALSE
  }


  x.t <- (x + a)^(1/3)
  mu <- mean(x.t, na.rm = TRUE)
  sigma <- sd(x.t, na.rm = TRUE)
  if (standardize) x.t <- (x.t - mu) / sigma

  # Get in-sample normality statistic results
  ptest <- nortest::pearson.test(x.t)

  val <- list(
    x.t = x.t,
    x = x,
    mean = mu,
    sd = sigma,
    a = a,
    n = length(x.t) - sum(is.na(x)),
    norm_stat = unname(ptest$statistic / ptest$df),
    standardize = standardize
  )

  # Assign class
  class(val) <- c('cuberoot_x', class(val))
  val
}

Note that we assigned a class to the object this returns of the same name; this is necessary for successful implementation within bestNormalize. We'll also need an associated predict method that is used to apply the transformation to newly observed data. `

predict.cuberoot_x <- function(object, newdata = NULL, inverse = FALSE, ...) {

  # If no data supplied and not inverse
  if (is.null(newdata) & !inverse)
    newdata <- object$x

  # If no data supplied and inverse
  if (is.null(newdata) & inverse)
    newdata <- object$x.t

  # Actually performing transformations

  # Perform inverse transformation as estimated
  if (inverse) {

    # Reverse-standardize
    if (object$standardize) 
      newdata <- newdata * object$sd + object$mean

    # Reverse-cube-root (cube)
    newdata <-  newdata^3 - object$a


    # Otherwise, perform transformation as estimated
  } else if (!inverse) {
    # Take cube root
    newdata <- (newdata + object$a)^(1/3)

    # Standardize to mean 0, sd 1
    if (object$standardize) 
      newdata <- (newdata - object$mean) / object$sd
  }

  # Return transformed data
  unname(newdata)
}

Optional: print method

This will be printed when bestNormalize selects your custom method or when you print an object returned by your new custom function.

print.cuberoot_x <- function(x, ...) {
  cat(ifelse(x$standardize, "Standardized", "Non-Standardized"),
      'cuberoot(x + a) Transformation with', x$n, 'nonmissing obs.:\n', 
      'Relevant statistics:\n',
      '- a =', x$a, '\n',
      '- mean (before standardization) =', x$mean, '\n',
      '- sd (before standardization) =', x$sd, '\n')
}

Note: if you can find a similar transformation in the source code, it's easy to model your code after it. For instance, for cuberoot_x and predict.cuberoot_x, I used sqrt_x.R as a template file.

Implementing with bestNormalize

# Store custom functions into list
custom_transform <- list(
  cuberoot_x = cuberoot_x,
  predict.cuberoot_x = predict.cuberoot_x,
  print.cuberoot_x = print.cuberoot_x
)

set.seed(123129)
x <- rgamma(100, 1, 1)
(b <- bestNormalize(x = x, new_transforms = custom_transform, standardize = FALSE))

Evidently, the cube-rooting was the best normalizing transformation!

Sanity check

Is this code actually performing the cube-rooting?

all.equal(x^(1/3), b$chosen_transform$x.t)
all.equal(x^(1/3), predict(b))

It does indeed.

Using custom normalization statistics

The bestNormalize package can estimate any univariate statistic using its CV framework. A user-defined function can be passed in through the norm_stat_fn argument, and this function will then be applied in lieu of the Pearson test statistic divided by its degree of freedom.

The user-defined function must take an argument x, which indicates the data on which a user wants to evaluate the statistic.

Here is an example using Lilliefors (Kolmogorov-Smirnov) normality test statistic:

bestNormalize(x, norm_stat_fn = function(x) nortest::lillie.test(x)$stat)

Here is an example using Lilliefors (Kolmogorov-Smirnov) normality test's p-value:

(dont_do_this <- bestNormalize(x, norm_stat_fn = function(x) nortest::lillie.test(x)$p))

Note: bestNormalize will attempt to minimize this statistic by default, which is definitely not what you want to do when calculating the p-value. This is seen in the example above, as the WORST normalization transformation is chosen.

In this case, a user is advised to either manually select the best one:

best_transform <- names(which.max(dont_do_this$norm_stats))
(do_this <- dont_do_this$other_transforms[[best_transform]])

Or, the user can reverse their defined statistic (in this case by subtracting it from 1):

(do_this <- bestNormalize(x, norm_stat_fn = function(x) 1-nortest::lillie.test(x)$p))

Any scripts or data that you put into this service are public.

bestNormalize documentation built on Aug. 18, 2023, 9:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bestNormalize
Normalizing Transformation Functions

Customization within bestNormalize
In bestNormalize: Normalizing Transformation Functions

Custom functions with bestNormalize

Example: cube-root

S3 methods

Optional: print method

Implementing with bestNormalize

Sanity check

Using custom normalization statistics

Try the bestNormalize package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

bestNormalize Normalizing Transformation Functions

Customization within bestNormalize In bestNormalize: Normalizing Transformation Functions

Custom functions with bestNormalize

Example: cube-root

S3 methods

Optional: print method

Implementing with bestNormalize

Sanity check

Using custom normalization statistics

Try the bestNormalize package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

bestNormalize
Normalizing Transformation Functions

Customization within bestNormalize
In bestNormalize: Normalizing Transformation Functions