optLogTransform: Optimize log transformation

View source: R/optLogTransform.R

Description

This function finds, for each variable, the transformation that brings its distribution closest to normal, as measured by skew, and returns the transformed data. The output is a list containing five objects:

  1. a string vector listing the names of each variable,

  2. a string vector listing the function used to transform each variable,

  3. a numeric vector giving the skew of each transformed variable,

  4. a numeric vector giving the optimal transform value for each variable,

  5. and a matrix of the transformed data.

Usage

optLogTransform(mydata, type = "log", skew_thresh = 1,
  n_trans_val = 50, scaled = T, retain_domain = F,
  hist_raw_folder = NA, hist_trans_folder = NA, skew_folder = NA)

Arguments

mydata

The dataset you would like to transform. Must be a vector or a matrix. If given a matrix, the function transforms each column separately. Works best if columns are named, particularly if you are exporting plots.

type

The type of transformation: "log" for logarithmic or "power" for power.

skew_thresh

The threshold skew value required for transformation. If the skew of the variable is less than skew_thresh, it will be considered normal and will not be transformed.
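As a rough illustration of the threshold check, the sketch below computes a moment-based sample skewness in base R and compares it with skew_thresh. The estimator, the use of the absolute value, and the helper names sample_skew and needs_transform are assumptions made for this example; the package may compute skew differently.

```r
# Moment-based sample skewness (assumed estimator, not taken from the package).
sample_skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical helper: TRUE if the variable is skewed enough to be transformed.
# Comparing the absolute skew is an assumption of this sketch.
needs_transform <- function(x, thresh) abs(sample_skew(x)) >= thresh

# Deterministic test data built from quantiles rather than random draws.
symmetric    <- qnorm(ppoints(1000))  # approximately normal, skew near 0
right_skewed <- qexp(ppoints(1000))   # exponential shape, strongly right-skewed

skew_thresh <- 1
needs_transform(symmetric, skew_thresh)     # below the threshold, left as is
needs_transform(right_skewed, skew_thresh)  # above the threshold, transformed
```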

n_trans_val

The number of grid points, each representing a different strength of transformation, that are tested to find the most normal distribution. A higher number gives a finer search and typically a better normalization, but can significantly increase computation time.
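The idea of a grid search over transformation strength can be sketched as follows. This is not the package's implementation: the shifted-log family log(x - min(x) + c), the log-spaced grid of shift values, and the helper name grid_search_log are all assumptions chosen to show how n_trans_val trades resolution against computation.

```r
# Moment-based sample skewness (assumed estimator).
sample_skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical helper: try n_trans_val shift values and keep the one whose
# log-transformed data has the smallest absolute skew.
grid_search_log <- function(x, n_trans_val = 50) {
  shifted <- x - min(x)
  # Candidate shifts on a log-spaced grid, relative to the spread of x.
  candidates <- sd(x) * 10^seq(-3, 3, length.out = n_trans_val)
  skews <- vapply(candidates,
                  function(cc) sample_skew(log(shifted + cc)),
                  numeric(1))
  best <- which.min(abs(skews))
  list(value = candidates[best],
       skew  = skews[best],
       data  = log(shifted + candidates[best]))
}

x <- qlnorm(ppoints(500))  # deterministic right-skewed sample
coarse <- grid_search_log(x, n_trans_val = 5)
fine   <- grid_search_log(x, n_trans_val = 50)

# Compare the raw skew with the best skew found on each grid.
c(raw = sample_skew(x), coarse = coarse$skew, fine = fine$skew)
```

Each extra grid point costs one more transformation and skew calculation per variable, which is why large values of n_trans_val slow things down on wide datasets.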

scaled

If set to TRUE, the transformed data are scaled to have zero mean and unit variance.
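For reference, scaling to zero mean and unit variance can be done in base R with scale(). Whether optLogTransform uses scale() internally is an assumption; the sketch only shows what the resulting values look like.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# scale() subtracts the mean and divides by the sample standard deviation.
z <- as.numeric(scale(x))

mean(z)  # zero, up to floating-point error
sd(z)    # one, up to floating-point error
```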

retain_domain

Set to TRUE if you would like the transformed data to have the same domain as the original dataset (not recommended).

hist_raw_folder

The name of the folder where you would like to save a histogram showing the distribution of the raw data. If you do not wish to save these plots, set to NA.

hist_trans_folder

The name of the folder where you would like to save a histogram showing the distribution of the transformed data. If you do not wish to save these plots, set to NA.

skew_folder

The name of the folder where you would like to save a plot showing the optimal skew with respect to the transformation variable. If you do not wish to save these plots, set to NA.
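The documentation does not say whether the output folders are created automatically, so it may be safest to create them before calling the function. The sketch below shows the general pattern in base R of creating a folder and writing a histogram into it; the folder name, file name, and PDF format are hypothetical and may not match what optLogTransform produces.

```r
# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "hist_raw")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

x <- qexp(ppoints(200))  # deterministic right-skewed sample

# Write one histogram to a file named after the variable.
out_file <- file.path(out_dir, "Variable 1.pdf")
pdf(out_file)
hist(x, main = "Variable 1")
invisible(dev.off())

file.exists(out_file)
```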

Examples

library(optLog)

# First generate a random normal dataset.
mydata <- rnorm(100, mean = 0, sd = 1)
hist(mydata)
# Add skew to the dataset.
mydata_skew <- cbind(1-(1-mydata)^2, (1-mydata)^2, mydata)
colnames(mydata_skew) <- c("Variable 1", "Variable 2", "Variable 3")
for(i in 1:3){hist(mydata_skew[,i])}

# Use optLogTransform to remove the skew.
mydata_transformed <- optLogTransform(mydata_skew, type = "power", scaled = FALSE)
for(i in 1:3){hist(mydata_transformed$data[,i])}

kforthman/optLog documentation built on Aug. 1, 2019, 8:06 p.m.