stdize | R Documentation |
stdize standardizes variables by centring and scaling.
stdizeFit modifies a model call or existing model to use standardized variables.
## Default S3 method:
stdize(x, center = TRUE, scale = TRUE, ...)
## S3 method for class 'logical'
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
center = TRUE, scale = FALSE, ...)
## also for two-level factors
## S3 method for class 'data.frame'
stdize(x, binary = c("center", "scale", "binary", "half", "omit"),
center = TRUE, scale = TRUE, omit.cols = NULL, source = NULL,
prefix = TRUE, append = FALSE, ...)
## S3 method for class 'formula'
stdize(x, data = NULL, response = FALSE,
binary = c("center", "scale", "binary", "half", "omit"),
center = TRUE, scale = TRUE, omit.cols = NULL, prefix = TRUE,
append = FALSE, ...)
stdizeFit(object, newdata, which = c("formula", "subset", "offset", "weights",
"fixed", "random", "model"), evaluate = TRUE, quote = NA)
x |
a numeric or logical vector, factor, numeric matrix,
|
center, scale |
either a logical value or a logical or numeric vector of length equal to the number of columns of |
binary |
specifies how binary variables (logical or two-level factors)
are scaled. Default is to |
source |
a reference |
omit.cols |
column names or numeric indices of columns that should be left unaltered. |
prefix |
either a logical value specifying whether the names of transformed columns should be prefixed, or a two-element character vector giving the prefixes. The prefixes default to “z.” for scaled and “c.” for centred variables. |
append |
logical, if |
response |
logical, stating whether the response should be standardized. By default, only variables on the right-hand side of the formula are standardized. |
data |
an object coercible to |
newdata |
a |
... |
for the |
object |
a fitted model object or an expression being a |
which |
a character string naming arguments which should be modified.
This should be all arguments which are evaluated in the |
evaluate |
if |
quote |
if |
stdize resembles scale, but uses special rules for factors, similarly to standardize in package arm.
stdize differs from standardize in that it is used on data rather than on the fitted model object. The scaled data should afterwards be passed to the modelling function, instead of the original data.
Unlike standardize, it applies special ‘binary’ scaling only to two-level factors and logical variables, rather than to any variable with two unique values.
Variables of only one unique value are unchanged.
By default, stdize scales by dividing by the standard deviation rather than by twice the SD as standardize does. Scaling by SD is also used on uncentred values, which differs from scale, where the root-mean-square is used.
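The difference between the two divisors can be checked in base R alone. The sketch below does not call stdize; it only reproduces the divisors described above, and the names x, rms, by_rms, by_sd and by_2sd are illustrative.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Divisor used by scale(x, center = FALSE): root-mean-square with n - 1
rms <- sqrt(sum(x^2) / (length(x) - 1))
by_rms <- as.vector(scale(x, center = FALSE))

# Divisor described for stdize: the standard deviation, even when uncentred
by_sd <- x / sd(x)

# Twice the SD, the divisor the text attributes to arm::standardize
by_2sd <- x / (2 * sd(x))

all.equal(by_rms, x / rms)        # TRUE
isTRUE(all.equal(by_rms, by_sd))  # FALSE: the two divisors differ
```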
If center or scale are logical scalars or vectors of length equal to the number of columns of x, centring is done by subtracting the mean (if the center value corresponding to the column is TRUE), and scaling is done by dividing the (centred) value by the standard deviation (if the corresponding scale value is TRUE).
If center or scale are numeric vectors with length equal to the number of columns of x (or numeric scalars for vector methods), then these are used instead. Any NAs in the numeric vector result in no centring or scaling on the corresponding column.
Note that scale = 0 is equivalent to no scaling (i.e. scale = 1).
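The numeric case can be imitated with sweep in base R. This is a minimal sketch of the rules stated above (NA means "leave the column alone", and a scale of 0 acts like 1), not the code stdize itself runs; m, ctr, scl and res are illustrative names.

```r
m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30), c = c(5, 5, 8))

ctr <- c(2, NA, 0)   # NA: leave column "b" uncentred
scl <- c(1, 10, 0)   # 0 is treated like 1, i.e. no scaling

# Replace the "do nothing" markers by neutral values before sweeping
ctr[is.na(ctr)] <- 0
scl[is.na(scl) | scl == 0] <- 1

# Subtract the per-column centres, then divide by the per-column scales
res <- sweep(sweep(m, 2, ctr, "-"), 2, scl, "/")
res
```

Column "a" is centred on 2 only, column "b" is divided by 10 only, and column "c" is returned unchanged.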
Binary variables, logical or factors with two levels, are converted to numeric variables and transformed according to the argument binary, unless center or scale are explicitly given.
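For a logical vector, the defaults shown in the usage (center = TRUE, scale = FALSE) suggest a conversion along the following lines. This is a base R sketch under the assumption that TRUE maps to 1 and FALSE to 0 and that centring means subtracting the mean; it does not call stdize, and the other binary options are not illustrated.

```r
b <- c(TRUE, FALSE, TRUE, TRUE)

# Convert to 0/1, then subtract the mean (centring only, no scaling)
b_centred <- as.numeric(b) - mean(b)
b_centred   # 0.25 -0.75 0.25 0.25
```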
stdize returns a vector or object of the same dimensions as x, where the values are centred and/or scaled. Transformation is carried out column-wise in data.frames and matrices.
The returned value is compatible with that of scale in that the numeric centring and scalings used are stored in attributes "scaled:center" and "scaled:scale" (these can be NA if no centring or scaling has been done).
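Because the attribute names are shared with scale, the way they can be read back and reused is easy to show with base R. The sketch below uses scale rather than stdize, so it only illustrates the attribute convention; m, new_m, ctr, scl and z_new are illustrative names.

```r
set.seed(1)
m <- matrix(runif(15, 0, 10), ncol = 3)
z <- scale(m)

ctr <- attr(z, "scaled:center")   # column means
scl <- attr(z, "scaled:scale")    # column standard deviations

# Reapply the stored centring and scaling to new rows
new_m <- matrix(runif(6, 0, 10), ncol = 3)
z_new <- sweep(sweep(new_m, 2, ctr, "-"), 2, scl, "/")

# The same result comes from passing the stored values back to scale()
all.equal(z_new, scale(new_m, center = ctr, scale = scl),
          check.attributes = FALSE)
```

Reusing stored values in this way is what keeps new data on the same scale as the data a model was fitted to.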
stdizeFit returns a modified, fitted model object that uses transformed variables from newdata, or, if evaluate is FALSE, an unevaluated call where the variable names are replaced to point to the transformed variables.
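The idea of replacing variable names in an unevaluated call can be illustrated with substitute in base R. This is only a sketch of the concept, not how stdizeFit is implemented; the call, the renames list and the "z." names are made up for the illustration.

```r
# An unevaluated model call using the original variable names
cl <- quote(lm(uptake ~ conc + I(conc^2), data = CO2))

# Map original names to their standardized counterparts ("z." prefix)
renames <- list(conc = as.name("z.conc"), CO2 = as.name("z.CO2"))

# substitute() does not evaluate its first argument, so do.call() is used
# to pass the call object itself rather than the symbol `cl`
new_cl <- do.call("substitute", list(cl, renames))
new_cl
# lm(uptake ~ z.conc + I(z.conc^2), data = z.CO2)
```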
Kamil Bartoń
Gelman, A. 2008. Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine 27, 2865–2873.
Compare with scale and standardize or rescale (the latter two in package arm).
For typical standardizing, transforming the model coefficients may be easier, see std.coef.
apply and sweep for arbitrary transformations of columns in a data.frame.
# compare "stdize" and "scale"
nmat <- matrix(runif(15, 0, 10), ncol = 3)
stdize(nmat)
scale(nmat)
# root-mean-square as used by "scale" on uncentred data
rootmeansq <- function(v) {
    v <- v[!is.na(v)]
    sqrt(sum(v^2) / max(1, length(v) - 1L))
}
scale(nmat, center = FALSE)
stdize(nmat, center = FALSE, scale = rootmeansq)
if(require(lme4)) {
# define scale function as twice the SD to reproduce "arm::standardize"
twosd <- function(v) 2 * sd(v, na.rm = TRUE)
# standardize data (scaled variables are prefixed with "z.")
z.CO2 <- stdize(uptake ~ conc + Plant, data = CO2, omit.cols = "Plant", scale = twosd)
summary(z.CO2)
fmz <- stdizeFit(lmer(uptake ~ conc + I(conc^2) + (1 | Plant)), newdata = z.CO2)
# produces:
# lmer(uptake ~ z.conc + I(z.conc^2) + (1 | Plant), data = z.CO2)
## standardize using scale and center from "z.CO2", keeping the original data:
z.CO2a <- stdize(CO2, source = z.CO2, append = TRUE)
# Here, the "subset" expression uses untransformed variable, so we modify only
# "formula" argument, keeping "subset" as-is. For that reason we needed the
# untransformed variables in "newdata".
stdizeFit(lmer(uptake ~ conc + I(conc^2) + (1 | Plant),
    subset = conc > 100),
    newdata = z.CO2a, which = "formula", evaluate = FALSE)
# create new data as a sequence along "conc"
newdata <- data.frame(conc = seq(min(CO2$conc), max(CO2$conc), length.out = 10))
# scale new data using scale and center of the original scaled data:
z.newdata <- stdize(newdata, source = z.CO2)
# plot predictions against "conc" on real scale:
plot(newdata$conc, predict(fmz, z.newdata, re.form = NA))
# compare with "arm::standardize"
## Not run:
library(arm)
fms <- standardize(lmer(uptake ~ conc + I(conc^2) + (1 | Plant), data = CO2))
plot(newdata$conc, predict(fms, z.newdata, re.form = NA))
## End(Not run)
}