blom: Normal scores transformation
In rcompanion: Functions to Support Extension Education Program Evaluation

blom	R Documentation

Normal scores transformation

Description

Normal scores transformation (Inverse normal transformation) by Elfving, Blom, van der Waerden, Tukey, and rankit methods, as well as z score transformation (standardization) and scaling to a range (normalization).

Usage

blom(
  x,
  method = "general",
  alpha = pi/8,
  complete = FALSE,
  na.last = "keep",
  na.rm = TRUE,
  adjustN = TRUE,
  min = 1,
  max = 10,
  ...
)

Arguments

`x`	A vector of numeric values.
`method`	Any one of `"general"` (the default), `"blom"`, `"vdw"`, `"tukey"`, `"elfving"`, `"rankit"`, `"zscore"`, or `"scale"`.
`alpha`	A value used in the `"general"` method. If alpha=pi/8 (the default), the `"general"` method reduces to the `"elfving"` method. If alpha=3/8, the `"general"` method reduces to the `"blom"` method. If alpha=1/2, the `"general"` method reduces to the `"rankit"` method. If alpha=1/3, the `"general"` method reduces to the `"tukey"` method. If alpha=0, the `"general"` method reduces to the `"vdw"` method.
`complete`	If `TRUE`, `NA` values are removed before transformation. The default is `FALSE`.
`na.last`	Passed to `rank` in the normal scores methods. See the documentation for the `rank` function. The default is `"keep"`.
`na.rm`	Used in the `"zscore"` and `"scale"` methods. Passed to `mean`, `min`, and `max` functions in those methods. The default is `TRUE`.
`adjustN`	If `TRUE`, the default, the normal scores methods use only non-`NA` values to determine the sample size, `N`. This seems to work well under default conditions where `NA` values are retained, even if there is a high percentage of `NA` values.
`min`	For the `"scale"` method, the minimum value of the transformed values.
`max`	For the `"scale"` method, the maximum value of the transformed values.
`...`	Additional arguments passed to `rank`.

Details

By default, NA values are retained in the output. This behavior can be changed with the na.rm argument for the "zscore" and "scale" methods, or with na.last for the normal scores methods. Alternatively, NA values can be removed from the input with complete=TRUE.

For normal scores methods, if there are NA values or tied values, it is helpful to look up the documentation for rank.
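
As a brief illustration of why the rank documentation matters here, the following base R sketch shows how ties and NA values change the ranks that the normal scores methods start from. The vector x is made up for illustration, and only the base rank function is used.

```r
# Hypothetical data with one tie and one missing value
x <- c(10, 20, 20, NA, 30)

# Default tie handling ("average"), with NA values kept in place
r_keep <- rank(x, na.last = "keep")

# Tied values receive the minimum rank instead of the average
r_min <- rank(x, na.last = "keep", ties.method = "min")

# NA values are ranked last rather than kept as NA
r_last <- rank(x, na.last = TRUE)

print(r_keep)   # 1.0 2.5 2.5  NA 4.0
print(r_min)    # 1   2   2    NA 4
print(r_last)   # 1.0 2.5 2.5 5.0 4.0
```

Because the transformed scores depend only on these ranks and on the sample size, a different ties.method or na.last setting can give different normal scores for the same data.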

In general, for normal scores methods, either of the arguments method or alpha can be used. With the current algorithms, there is no need to use both.
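
To show how a single alpha value can cover the named methods, here is a minimal base R sketch of the general rank-based formula described in Beasley and Erickson (2009), qnorm((r - alpha) / (N - 2 * alpha + 1)). The helper normal_scores is hypothetical and is not the package's own code. It assumes that N is the number of non-NA values, as with adjustN = TRUE.

```r
# Hypothetical helper: rank-based inverse normal transformation
normal_scores <- function(x, alpha = pi / 8) {
  r <- rank(x, na.last = "keep")   # NA values stay NA
  n <- sum(!is.na(x))              # sample size from non-NA values only
  qnorm((r - alpha) / (n - 2 * alpha + 1))
}

set.seed(12345)
A <- rlnorm(100)

elfving <- normal_scores(A)                 # alpha = pi/8
blom_sc <- normal_scores(A, alpha = 3 / 8)  # alpha = 3/8, Blom
vdw     <- normal_scores(A, alpha = 0)      # alpha = 0, van der Waerden

# Scores are symmetric about 0, and the standard deviation is near 1
round(c(mean = mean(blom_sc), sd = sd(blom_sc)), 3)
```

Changing alpha only shifts how far the extreme ranks are pushed into the tails. The ordering of the values is the same for every choice.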

Normal scores transformation will return values that approximately follow a normal distribution with a mean of 0 and a standard deviation close to 1.

The "scale" method converts values to the range specified in max and min without transforming the distribution of values. By default, the "scale" method converts values to a 1 to 10 range. Using the "scale" method with min = 0 and max = 1 is sometimes called "normalization".
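
The usual min-max rescaling arithmetic can be sketched in base R as follows. The helper rescale_range and its arguments lo and hi are hypothetical names chosen for illustration. The formula is an assumption about how such a rescaling is typically done and is not copied from the package code.

```r
# Hypothetical helper: linear rescaling to the range [lo, hi]
rescale_range <- function(x, lo = 1, hi = 10, na.rm = TRUE) {
  x_min <- min(x, na.rm = na.rm)
  x_max <- max(x, na.rm = na.rm)
  (x - x_min) / (x_max - x_min) * (hi - lo) + lo
}

x <- c(2, 4, 6, NA, 12)

scaled_1_10 <- rescale_range(x)          # 1.0 2.8 4.6 NA 10.0
scaled_0_1  <- rescale_range(x, 0, 1)    # 0.0 0.2 0.4 NA 1.0

print(scaled_1_10)
print(scaled_0_1)
```

Because the mapping is linear, the relative spacing between values is preserved. Only the location and spread change.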

The "zscore" method converts values by the usual method for z scores: (x - mean(x)) / sd(x). The transformed values will have a mean of 0 and a standard deviation of 1 but won't be coerced into a normal distribution. Sometimes this method is called "standardization".
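
A short base R check of the z score formula illustrates both points: the result has a mean of 0 and a standard deviation of 1, but the shape of the distribution is unchanged. The skew helper is a hypothetical, simple moment-based skewness measure used only for this comparison.

```r
set.seed(12345)
A <- rlnorm(100)                 # right-skewed data
Z <- (A - mean(A)) / sd(A)       # z scores

# Mean is 0 and standard deviation is 1, up to rounding error
round(c(mean = mean(Z), sd = sd(Z)), 6)

# Hypothetical helper: a simple moment-based skewness measure
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

# The transformation is linear, so the skewness is the same
# before and after, and Z correlates perfectly with A
c(skew_A = skew(A), skew_Z = skew(Z))
cor(A, Z)
```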

Value

A vector of numeric values.

Note

It's possible that Gustav Elfving didn't recommend the formula used in this function for the Elfving method. I would like to thank Terence Cooke at the University of Exeter for their diligence in trying to track down a reference for this formula.

Author(s)

Salvatore Mangiafico, mangiafico@njaes.rutgers.edu

References

Conover, 1995, Practical Nonparametric Statistics, 3rd.

Solomon & Sawilowsky, 2009, Impact of rank-based normalizing transformations on the accuracy of test scores.

Beasley and Erickson, 2009, Rank-based inverse normal transformations are increasingly used, but are they merited?

Examples

set.seed(12345)
A = rlnorm(100)
## Not run: hist(A)
### Convert data to normal scores by Elfving method
B = blom(A)
## Not run: hist(B)
### Convert data to z scores 
C = blom(A, method="zscore")
## Not run: hist(C)
### Convert data to a scale of 1 to 10 
D = blom(A, method="scale")
## Not run: hist(D)

### Data from Sokal and Rohlf, 1995, 
### Biometry: The Principles and Practice of Statistics
### in Biological Research
Value = c(709,679,699,657,594,677,592,538,476,508,505,539)
Sex   = c(rep("Male",3), rep("Female",3), rep("Male",3), rep("Female",3))
Fat   = c(rep("Fresh", 6), rep("Rancid", 6))
ValueBlom = blom(Value)
Sokal = data.frame(ValueBlom, Sex, Fat)
model = lm(ValueBlom ~ Sex * Fat, data=Sokal)
anova(model)
## Not run: 
hist(residuals(model))
plot(predict(model), residuals(model))

## End(Not run)

rcompanion documentation built on April 3, 2025, 11:55 p.m.