# Customization within bestNormalize In bestNormalize: Normalizing Transformation Functions

```knitr::opts_chunk\$set(echo = TRUE, fig.height = 5, fig.width = 7)
library(bestNormalize)
```

# Custom functions with bestNormalize

This vignette will go over the steps required to implement a custom user-defined function within the `bestNormalize` framework.

There are 3 steps.

1) Create transformation function

2) Create predict method for transformation function (that can be applied to new data)

3) Pass through new function and predict method to bestNormalize

## S3 methods

Here, we start by defining a new function that we'll call `cuberoot_x`, which will take an argument `a` (as does the `sqrt_x` function) which will try to add a constant if it sees any negative numbers in `x`. It will also take the argument `standardize` which will center and scale the transformed data so that it's centered at 0 with SD = 1.

```## Define user-function
cuberoot_x <- function(x, a = NULL, standardize = TRUE, ...) {
stopifnot(is.numeric(x))

min_a <- max(0, -(min(x, na.rm = TRUE)))
if(!length(a))
a <- min_a
if(a < min_a) {
warning("Setting a <  max(0, -(min(x))) can lead to transformation issues",
"Standardize set to FALSE")
standardize <- FALSE
}

x.t <- (x + a)^(1/3)
mu <- mean(x.t, na.rm = TRUE)
sigma <- sd(x.t, na.rm = TRUE)
if (standardize) x.t <- (x.t - mu) / sigma

# Get in-sample normality statistic results
ptest <- nortest::pearson.test(x.t)

val <- list(
x.t = x.t,
x = x,
mean = mu,
sd = sigma,
a = a,
n = length(x.t) - sum(is.na(x)),
norm_stat = unname(ptest\$statistic / ptest\$df),
standardize = standardize
)

# Assign class
class(val) <- c('cuberoot_x', class(val))
val
}
```

Note that we assigned a class to the object this returns of the same name; this is necessary for successful implementation within `bestNormalize`. We'll also need an associated `predict` method that is used to apply the transformation to newly observed data. `

```predict.cuberoot_x <- function(object, newdata = NULL, inverse = FALSE, ...) {

# If no data supplied and not inverse
if (is.null(newdata) & !inverse)
newdata <- object\$x

# If no data supplied and inverse
if (is.null(newdata) & inverse)
newdata <- object\$x.t

# Actually performing transformations

# Perform inverse transformation as estimated
if (inverse) {

# Reverse-standardize
if (object\$standardize)
newdata <- newdata * object\$sd + object\$mean

# Reverse-cube-root (cube)
newdata <-  newdata^3 - object\$a

# Otherwise, perform transformation as estimated
} else if (!inverse) {
# Take cube root
newdata <- (newdata + object\$a)^(1/3)

# Standardize to mean 0, sd 1
if (object\$standardize)
newdata <- (newdata - object\$mean) / object\$sd
}

# Return transformed data
unname(newdata)
}
```

## Optional: print method

This will be printed when bestNormalize selects your custom method or when you print an object returned by your new custom function.

```print.cuberoot_x <- function(x, ...) {
cat(ifelse(x\$standardize, "Standardized", "Non-Standardized"),
'cuberoot(x + a) Transformation with', x\$n, 'nonmissing obs.:\n',
'Relevant statistics:\n',
'- a =', x\$a, '\n',
'- mean (before standardization) =', x\$mean, '\n',
'- sd (before standardization) =', x\$sd, '\n')
}
```

Note: if you can find a similar transformation in the source code, it's easy to model your code after it. For instance, for `cuberoot_x` and `predict.cuberoot_x`, I used `sqrt_x.R` as a template file.

## Implementing with bestNormalize

```# Store custom functions into list
custom_transform <- list(
cuberoot_x = cuberoot_x,
predict.cuberoot_x = predict.cuberoot_x,
print.cuberoot_x = print.cuberoot_x
)

set.seed(123129)
x <- rgamma(100, 1, 1)
(b <- bestNormalize(x = x, new_transforms = custom_transform, standardize = FALSE))
```

Evidently, the cube-rooting was the best normalizing transformation!

## Sanity check

Is this code actually performing the cube-rooting?

```all.equal(x^(1/3), b\$chosen_transform\$x.t)
all.equal(x^(1/3), predict(b))
```

It does indeed.

# Using custom normalization statistics

The bestNormalize package can estimate any univariate statistic using its CV framework. A user-defined function can be passed in through the `norm_stat_fn` argument, and this function will then be applied in lieu of the Pearson test statistic divided by its degree of freedom.

The user-defined function must take an argument `x`, which indicates the data on which a user wants to evaluate the statistic.

Here is an example using Lilliefors (Kolmogorov-Smirnov) normality test statistic:

```bestNormalize(x, norm_stat_fn = function(x) nortest::lillie.test(x)\$stat)
```

Here is an example using Lillifors (Kolmogorov-Smirnov) normality test's p-value:

```(dont_do_this <- bestNormalize(x, norm_stat_fn = function(x) nortest::lillie.test(x)\$p))
```

Note: `bestNormalize` will attempt to minimize this statistic by default, which is definitely not what you want to do when calculating the p-value. This is seen in the example above, as the WORST normalization transformation is chosen.

In this case, a user is advised to either manually select the best one:

```best_transform <- names(which.max(dont_do_this\$norm_stats))
(do_this <- dont_do_this\$other_transforms[[best_transform]])
```

Or, the user can reverse their defined statistic (in this case by subtracting it from 1):

```(do_this <- bestNormalize(x, norm_stat_fn = function(x) 1-nortest::lillie.test(x)\$p))
```

## Try the bestNormalize package in your browser

Any scripts or data that you put into this service are public.

bestNormalize documentation built on Sept. 16, 2021, 9:08 a.m.