f_boxcox | R Documentation |
Performs a Box-Cox transformation on a dataset to stabilize variance and make the data more normally distributed. It also provides diagnostic plots and tests for normality. The transformation is based on code from MASS/R/boxcox.R. The function prints \lambda to the console and returns the transformed data set.
f_boxcox(
data = data,
lambda = seq(-2, 2, 1/10),
plots = FALSE,
transform.data = TRUE,
interp = (plots && (length(lambda) < 100)),
eps = 1/50,
xlab = expression(lambda),
ylab = "log-Likelihood",
alpha = 0.05,
open_generated_files = TRUE,
close_generated_files = FALSE,
output_type = "off",
output_file = NULL,
output_dir = NULL,
save_in_wdir = FALSE,
...
)
data |
A numeric vector or a data frame with a single numeric column. The data to be transformed. |
lambda |
A numeric vector of |
plots |
Logical. If |
transform.data |
Logical. If |
interp |
Logical. If |
eps |
A small positive value used to determine when to switch from the power transformation to the log transformation for numerical stability. Default is |
xlab |
Character string. Label for the x-axis in plots. Default is an expression object representing |
ylab |
Character string. Label for the y-axis in plots. Default is "log-Likelihood". |
alpha |
Numeric. Significance level for the Shapiro-Wilk test of normality. Default is |
open_generated_files |
Logical. If |
close_generated_files |
Logical. If |
output_type |
Character string specifying the output format: |
output_file |
A character string specifying the name of the output file (without extension). If |
output_dir |
Character string specifying the name of the directory of the output file. Default is |
save_in_wdir |
Logical. If |
... |
Additional arguments passed to plotting functions. |
The function uses the following formula for transformation:
y(\lambda) =
\begin{cases}
\frac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\
\log(y), & \lambda = 0
\end{cases}
where y is the data being transformed and \lambda is the transformation parameter, which is estimated from the data by maximum likelihood. The function computes the Box-Cox transformation for a range of \lambda values and identifies the \lambda that maximizes the log-likelihood function. A practical advantage of this transformation is that it checks the suitability of many common transformations in one run. The most common transformations and their \lambda values are given below:
\lambda value | Transformation |
------------- | -------------- |
-2 | \frac{1}{x^2} |
-1 | \frac{1}{x} |
-0.5 | \frac{1}{\sqrt{x}} |
0 | \log(x) |
0.5 | \sqrt{x} |
1 | x |
2 | x^2 |
------------- | -------------- |
If the estimated transformation parameter closely aligns with one of the values listed in the previous table, it is generally advisable to select the table value rather than the precise estimated value. This approach simplifies interpretation and practical application.
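The formula and the grid search over \lambda described above can be illustrated with a minimal base R sketch. This is not the code used by f_boxcox or MASS; the helper names boxcox_transform and boxcox_loglik are hypothetical, the model is assumed to be intercept-only, and the handling of eps is simplified to a plain switch to log(y) near \lambda = 0.

```r
# Hypothetical helper (not part of f_boxcox): Box-Cox transform for one lambda.
# Assumption: when |lambda| < eps the log transformation is used directly.
boxcox_transform <- function(y, lambda, eps = 1/50) {
  stopifnot(is.numeric(y), all(y > 0))
  if (abs(lambda) < eps) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical helper: profile log-likelihood (up to an additive constant)
# for an intercept-only model, including the Jacobian term of the transform.
boxcox_loglik <- function(y, lambda, eps = 1/50) {
  n <- length(y)
  z <- boxcox_transform(y, lambda, eps)
  sigma2 <- sum((z - mean(z))^2) / n
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

set.seed(1)
y <- rlnorm(200, meanlog = 0, sdlog = 1)   # right-skewed example data

# Evaluate the log-likelihood on the same grid as the default 'lambda' argument.
lambdas <- seq(-2, 2, 1/10)
ll <- vapply(lambdas, function(l) boxcox_loglik(y, l), numeric(1))
lambda_hat <- lambdas[which.max(ll)]

# Snap the estimate to the nearest value from the table of common transformations.
common <- c(-2, -1, -0.5, 0, 0.5, 1, 2)
lambda_simple <- common[which.min(abs(common - lambda_hat))]

y_transformed <- boxcox_transform(y, lambda_simple)
```

Snapping to the nearest table value is only sensible when the estimate is close to it; the sketch always snaps, so the distance between lambda_hat and lambda_simple should be checked before relying on the simpler value.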
The function provides diagnostic plots: a plot of log-likelihood against \lambda values and a Q-Q plot of the transformed data. It also performs a Shapiro-Wilk test for normality on the transformed data if the sample size is less than or equal to 5000.
Note: For sample sizes greater than 5000, Shapiro-Wilk test results are not provided, because the test as implemented in R (shapiro.test) accepts at most 5000 observations.
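The sample-size rule for the normality test can be sketched in base R as follows. The wrapper shapiro_if_applicable is a hypothetical helper, not a function of this package; it relies only on stats::shapiro.test, which requires between 3 and 5000 non-missing values.

```r
# Hypothetical helper: run the Shapiro-Wilk test only when the sample size
# is within the range accepted by stats::shapiro.test (3 to 5000).
shapiro_if_applicable <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 3 || n > 5000) {
    return(NULL)  # test skipped, no result reported
  }
  test <- shapiro.test(x)
  list(
    statistic = unname(test$statistic),
    p_value = test$p.value,
    reject_normality = test$p.value < alpha
  )
}

set.seed(42)
res_small <- shapiro_if_applicable(rnorm(100))    # test is run
res_large <- shapiro_if_applicable(rnorm(6000))   # NULL, sample too large
```

Returning NULL for skipped tests is a design choice of this sketch; how f_boxcox itself represents a skipped test is not documented here.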
This function requires [Pandoc](https://github.com/jgm/pandoc/releases/tag) (version 1.12.3 or higher), a universal document converter.
Windows: Install Pandoc and ensure the installation folder
(e.g., "C:/Users/your_username/AppData/Local/Pandoc") is added to your system PATH.
macOS: If using Homebrew, Pandoc is typically installed in "/usr/local/bin". Alternatively, download the .pkg installer and verify that the binary’s location is in your PATH.
Linux: Install Pandoc through your distribution’s package manager (commonly installed in "/usr/bin" or "/usr/local/bin") or manually, and ensure the directory containing Pandoc is in your PATH.
If Pandoc is not found, this function may not work as intended.
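Whether Pandoc is visible to R can be checked from the console before generating output files. The snippet below is a minimal sketch using only base R (Sys.which and system2); it checks the PATH as seen by the current R session and does not reflect how f_boxcox itself locates Pandoc.

```r
# Look up the pandoc executable on the PATH of the current R session.
pandoc_path <- Sys.which("pandoc")

if (nzchar(pandoc_path)) {
  message("Pandoc found at: ", pandoc_path)
  # The first line of 'pandoc --version' contains the version number.
  version_line <- system2(pandoc_path, "--version", stdout = TRUE)[1]
  message(version_line)
} else {
  message("Pandoc was not found on the PATH; add its installation folder to PATH.")
}
```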
An object of class 'f_boxcox' containing, among others, the results of the Box-Cox transformation, lambda, the input data, the transformed data, and Shapiro-Wilk tests on the original and transformed data. Using the option "output_type", it can also generate output in the form of R Markdown code, 'Word', or 'pdf' files. Print and plot methods are provided for 'f_boxcox' objects.
Sander H. van Delden plantmind@proton.me
Salvatore Mangiafico, mangiafico@njaes.rutgers.edu
W. N. Venables and B. D. Ripley
The core of calculating \lambda and the plotting was taken from:
file MASS/R/boxcox.R copyright (C) 1994-2004 W. N. Venables and B. D. Ripley
Some code to present the result was taken and modified from file:
rcompanion/R/transformTukey.r. (Developed by Salvatore Mangiafico)
The explanation of the Box-Cox transformation given here was provided by r-coder:
# Create non-normal data in a data.frame or vector.
df <- data.frame(values = rlnorm(100, meanlog = 0, sdlog = 1))
# Store the transformation in object "bc".
bc <- f_boxcox(df$values)
# Print lambda and the Shapiro-Wilk test results.
print(bc)
# Plot the Q-Q plots, histograms, and the log-likelihood profile of lambda.
plot(bc)
# Or directly use the transformed data from the f_boxcox object.
df$values_transformed <- f_boxcox(df$values)$transformed_data
print(df$values_transformed)