standardize: Scaling and centering of vectors, matrices and arrays

standardizeR Documentation

Scaling and centering of vectors, matrices and arrays

Description

Maps a numeric variable to a new object with the same dimensions. standardize is typically used to standardize a covariate to mean 0 and SD 1. standardize2match is used to standardize one object using the mean and SD of another; it is a wrapper for standardize(x, center=mean(y), scale=sd(y)).

Usage

standardize(x, center = TRUE, scale = TRUE)
standardize2match(x, y)

Arguments

x, y

a numeric vector, matrix or multidimensional array; NA and NaN are allowed; Inf and -Inf will produce all-NaN output if either center or scale is TRUE.

center

either a logical or a numeric value of length 1.

scale

either a logical or a numeric value of length 1.

Details

standardize differs from scale by (1) accepting multidimensional arrays but not data frames; (2) not standardizing column-wise but using a single value to center or to scale; (3) if x is a vector, the output will be a vector (not a 1-column matrix). If each column in the matrix represents a different variable, use scale not standardize.

Centering is performed before scaling.

If center is numeric, that value will be subtracted from the whole object. If logical and TRUE, the mean of the object (after removing NAs) will be subtracted.

If scale is numeric, the whole object will be divided by that value. If logical and TRUE, the standard deviation of the object (after removing NAs) will be used; this may not make sense if center = FALSE.

Value

A numeric object of the same dimensions as x with the standardized values. NAs in the input will be preserved in the output.

For the default arguments, the object returned will have mean approximately zero and SD 1. (The mean is not exactly zero as scaling is performed after centering.)

Author(s)

Mike Meredith, after looking at the code of base::scale.

Examples

# Generate some fake elevation data
elev <- runif(100, min=100, max=500)
mean(elev) ; sd(elev)
str( e <- standardize(elev) )
mean(e) ; sd(e)

# Standardize so that e=0 corresponds to exactly 300m and +/- 1 to
#   a change of 100m:
e <- standardize(elev, center=300, scale=100)
mean(e)
mean(elev) - 300
range(e)
range(elev) - 300

# Generate data matrix for survey duration for 3 surveys at 10 sites
dur <- matrix(round(runif(30, 20, 60)), nrow=10, ncol=3)
d <- standardize(dur)
mean(d) ; sd(d)

# Standardize new data to match the mean and SD of 'dur'
(new <- seq(20, 60, length.out=11))
standardize2match(new, dur)

# compare with base::scale
dx <- base::scale(dur)
colMeans(dx) ; apply(dx, 2, sd)
colMeans(d) ; apply(d, 2, sd)
# Don't use 'standardize' if the columns in the matrix are different variables!

AHMbook documentation built on Sept. 12, 2024, 6:37 a.m.