stdize: Standardize data
In MuMIn: Multi-Model Inference


Description

stdize standardizes variables by centring and scaling.

stdizeFit modifies a model call or existing model to use standardized variables.

Usage

## Default S3 method:
stdize(x, center = TRUE, scale = TRUE, ...)

## S3 method for class 'logical'
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
  center = TRUE, scale = FALSE, ...)
## also for two-level factors

## S3 method for class 'data.frame'
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
  center = TRUE, scale = TRUE, omit.cols = NULL, source = NULL,
  prefix = TRUE, append = FALSE, ...)

## S3 method for class 'formula'
stdize(x, data = NULL, response = FALSE,
  binary = c("center", "scale", "binary", "half", "omit"),
  center = TRUE, scale = TRUE, omit.cols = NULL, prefix = TRUE,
  append = FALSE, ...)

stdizeFit(object, newdata, which = c("formula", "subset", "offset", "weights",
	"fixed", "random", "model"), evaluate = TRUE, quote = NA)

Arguments

`x`	a numeric or logical vector, factor, numeric matrix, `data.frame` or a formula.
`center`, `scale`	either a logical value or a logical or numeric vector of length equal to the number of columns of `x` (see ‘Details’). `scale` can also be a function to use for scaling.
`binary`	specifies how binary variables (logical or two-level factors) are scaled. Default is to `"center"` by subtracting the mean assuming levels are equal to 0 and 1; use `"scale"` to both centre and scale by SD, `"binary"` to centre to 0 / 1, `"half"` to centre to -0.5 / 0.5, and `"omit"` to leave binary variables unmodified. This argument has precedence over `center` and `scale`, unless it is set to `NA` (in which case binary variables are treated like numeric variables).
`source`	a reference `data.frame`, the result of a previous call to `stdize`, from which `scale` and `center` values are taken. Column names are matched. This can be used to scale new data using the statistics of another data set.
`omit.cols`	column names or numeric indices of columns that should be left unaltered.
`prefix`	either a logical value specifying whether the names of transformed columns should be prefixed, or a two-element character vector giving the prefixes. The prefixes default to “z.” for scaled and “c.” for centred variables.
`append`	logical, if `TRUE`, modified columns are appended to the original data frame.
`response`	logical, stating whether the response should be standardized. By default, only variables on the right-hand side of the formula are standardized.
`data`	an object coercible to `data.frame`, containing the variables in `formula`. Passed to, and used by `model.frame`.
`newdata`	a `data.frame` returned by `stdize`, to be used by the modified model.
`...`	for the `formula` method, additional arguments passed to `model.frame`. For other methods, it is silently ignored.
`object`	a fitted model object or an expression being a `call` to the modelling function.
`which`	a character string naming the arguments that should be modified. This should include all arguments that are evaluated in the `data` environment. Can also be `TRUE` to modify the expression as a whole. The `data` argument is additionally replaced with that passed to `stdizeFit`.
`evaluate`	if `TRUE`, the modified call is evaluated and the fitted model object is returned.
`quote`	if `TRUE`, avoids evaluating `object`. Equivalent to `stdizeFit(quote(expr), ...)`. Defaults to `NA`, in which case `object` is quoted if it is a call to a non-primitive function.

Details

stdize resembles scale, but uses special rules for factors, similarly to standardize in package arm.

stdize differs from standardize in that it is used on data rather than on the fitted model object. The scaled data should afterwards be passed to the modelling function, instead of the original data.

Unlike standardize, it applies special ‘binary’ scaling only to two-level factors and logical variables, rather than to any variable with two unique values.

Variables with only one unique value are left unchanged.

By default, stdize scales by dividing by the standard deviation rather than by twice the SD as standardize does. Scaling by SD is also used on uncentred values, unlike scale, which uses the root-mean-square in that case.
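The difference between the two divisors for uncentred values can be checked with a small base-R sketch. No MuMIn function is called here; the vector `v` and the names `rms`, `s`, `by_rms` and `by_sd` are made up for illustration.

```r
# Illustrative data (an assumption, not taken from the package)
v <- c(2, 4, 6, 8)

# Divisor used by scale() when center = FALSE: root-mean-square with n - 1
rms <- sqrt(sum(v^2) / (length(v) - 1))

# Divisor the text describes for stdize, even without centring: the SD
s <- sd(v)

by_rms <- as.vector(scale(v, center = FALSE))  # uncentred values divided by rms
by_sd  <- v / s                                # uncentred values divided by SD

isTRUE(all.equal(by_rms, v / rms))  # scale() matches the RMS formula
isTRUE(all.equal(by_rms, by_sd))    # the two scalings differ for this vector
```

Dividing by `2 * s` instead of `s` would give the two-SD scaling attributed above to standardize.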

If center or scale are logical scalars or vectors of length equal to the number of columns of x, the centring is done by subtracting the mean (if center corresponding to the column is TRUE), and scaling is done by dividing the (centred) value by standard deviation (if corresponding scale is TRUE). If center or scale are numeric vectors with length equal to the number of columns of x (or numeric scalars for vector methods), then these are used instead. Any NAs in the numeric vector result in no centring or scaling on the corresponding column.

Note that scale = 0 is equivalent to no scaling (i.e. scale = 1).
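The rules above for a single column can be sketched in base R as follows. `std_col` is a hypothetical helper written for this illustration, not a MuMIn function, and it does not cover the case where `scale` is a function.

```r
# Hypothetical helper (NOT part of MuMIn): applies the documented
# center/scale rules to one numeric vector.
std_col <- function(v, center = TRUE, scale = TRUE) {
    # Centring value: NA or FALSE -> none; TRUE -> the mean; a number -> as given
    ctr <- if (is.na(center) || identical(center, FALSE)) {
        0
    } else if (isTRUE(center)) {
        mean(v, na.rm = TRUE)
    } else {
        center
    }
    v <- v - ctr
    # Scaling value: NA, FALSE or 0 -> none; TRUE -> the SD; a number -> as given
    scl <- if (is.na(scale) || identical(scale, FALSE) || scale == 0) {
        1
    } else if (isTRUE(scale)) {
        sd(v, na.rm = TRUE)
    } else {
        scale
    }
    v / scl
}

x <- c(1, 2, 3, 4, 5)
std_col(x)                           # (x - mean(x)) / sd(x)
std_col(x, center = 10, scale = 2)   # -4.5 -4.0 -3.5 -3.0 -2.5
std_col(x, center = NA, scale = NA)  # unchanged
std_col(x, center = FALSE, scale = 0)  # unchanged: scale = 0 means no scaling
```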

Binary variables, logical or factors with two levels, are converted to numeric variables and transformed according to the argument binary, unless center or scale are explicitly given.
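As a rough illustration of the five binary options, the sketch below applies one reading of the argument description to a logical vector. `binary_sketch` is a hypothetical helper, not MuMIn code, and treating `"binary"` as plain 0 / 1 coding is an assumption based on that description. A two-level factor would first need converting to 0 / 1, for example with `as.integer(f) - 1L`.

```r
# Hypothetical helper (NOT MuMIn code): the documented options for a logical vector.
binary_sketch <- function(b, binary = c("center", "scale", "binary", "half", "omit")) {
    binary <- match.arg(binary)
    x <- as.numeric(b)                       # FALSE -> 0, TRUE -> 1
    switch(binary,
        center = x - mean(x),                # subtract the mean of the 0/1 values
        scale  = (x - mean(x)) / sd(x),      # centre, then divide by the SD
        binary = x,                          # keep as 0 / 1
        half   = x - 0.5,                    # recode to -0.5 / 0.5
        omit   = b)                          # leave untouched
}

b <- c(TRUE, FALSE, TRUE, TRUE)
binary_sketch(b, "center")  # 0.25 -0.75 0.25 0.25
binary_sketch(b, "half")    # 0.5 -0.5 0.5 0.5
binary_sketch(b, "binary")  # 1 0 1 1
```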

Value

stdize returns a vector or object of the same dimensions as x, where the values are centred and/or scaled. Transformation is carried out column-wise in data.frames and matrices.

The returned value is compatible with that of scale in that the numeric centring and scalings used are stored in attributes "scaled:center" and "scaled:scale" (these can be NA if no centring or scaling has been done).
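These attributes are what make it possible to transform new data with the statistics of the original data. The sketch below shows the idea with base scale; the matrices `m` and `new_m` are made-up data. The final, optional block runs only if MuMIn is installed and simply prints the attribute that the text says stdize stores.

```r
# Illustrative data (an assumption, not taken from the package)
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
z <- scale(m)

ctr <- attr(z, "scaled:center")   # column means used for centring
scl <- attr(z, "scaled:scale")    # column SDs used for scaling

# Apply the stored statistics to new rows instead of recomputing them
new_m <- matrix(c(2.5, 25), ncol = 2)
z_new <- scale(new_m, center = ctr, scale = scl)
as.vector(z_new)                  # both values sit at the original column means

# Optional: inspect the same attribute on a stdize result
if (requireNamespace("MuMIn", quietly = TRUE)) {
    zs <- MuMIn::stdize(m)
    print(attr(zs, "scaled:center"))
}
```

For data frames, the `source` argument of stdize serves the same purpose without extracting the attributes by hand.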

stdizeFit returns a modified, fitted model object that uses transformed variables from newdata, or, if evaluate is FALSE, an unevaluated call in which the variable names are replaced to point to the transformed variables.

Author(s)

Kamil Bartoń

References

Gelman, A. 2008. Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine 27, 2865–2873.

Examples

# compare "stdize" and "scale"
nmat <- matrix(runif(15, 0, 10), ncol = 3)

stdize(nmat)
scale(nmat)

rootmeansq <- function(v) {
    v <- v[!is.na(v)]
    sqrt(sum(v^2) / max(1, length(v) - 1L))
}

scale(nmat, center = FALSE)
stdize(nmat, center = FALSE, scale = rootmeansq)

if(require(lme4)) {
# define scale function as twice the SD to reproduce "arm::standardize"
twosd <- function(v) 2 * sd(v, na.rm = TRUE)

# standardize data (scaled variables are prefixed with "z.")
z.CO2 <- stdize(uptake ~ conc + Plant, data = CO2, omit.cols = "Plant", scale = twosd)
summary(z.CO2)


fmz <- stdizeFit(lmer(uptake ~ conc + I(conc^2) + (1 | Plant)), newdata = z.CO2)
# produces:
# lmer(uptake ~ z.conc + I(z.conc^2) + (1 | Plant), data = z.CO2)


## standardize using scale and center from "z.CO2", keeping the original data:
z.CO2a <- stdize(CO2, source = z.CO2, append = TRUE)
# Here, the "subset" expression uses untransformed variable, so we modify only
# "formula" argument, keeping "subset" as-is. For that reason we needed the
# untransformed variables in "newdata".
stdizeFit(lmer(uptake ~ conc + I(conc^2) + (1 | Plant),
    subset = conc > 100
    ), newdata = z.CO2a, which = "formula", evaluate = FALSE)


# create new data as a sequence along "conc"
newdata <- data.frame(conc = seq(min(CO2$conc), max(CO2$conc), length.out = 10))

# scale new data using scale and center of the original scaled data: 
z.newdata <- stdize(newdata, source = z.CO2)


# plot predictions against "conc" on real scale:
plot(newdata$conc, predict(fmz, z.newdata, re.form = NA))


# compare with "arm::standardize"
## Not run: 
library(arm)
fms <- standardize(lmer(uptake ~ conc + I(conc^2) + (1 | Plant), data = CO2))
plot(newdata$conc, predict(fms, z.newdata, re.form = NA))

## End(Not run)
}

MuMIn documentation built on April 3, 2025, 6:07 p.m.