Create a standardized object which places all variables in data on the same scale based on formula, making regression output easier to interpret. For mixed effects regressions, this also offers computational benefits, and for Bayesian regressions, it also makes determining reasonable priors easier.
formula | A regression
data | A data.frame containing the variables in
family | A regression
scale | The desired scale for the regression frame. Must be a single positive number. See 'Details'.
offset | An optional
... | Currently unused. If
First, model.frame is called. Then, if family = gaussian, the response is checked to ensure that it is numeric and has more than two unique values. If scale_by is used on the response in formula, then the scale argument to scale_by is ignored and forced to 1. If scale_by is not called, then scale is used with default arguments. The result is that gaussian responses are on unit scale (i.e. have mean 0 and standard deviation 1), or, if scale_by is used on the left hand side of formula, on unit scale within each level of the specified conditioning factor.
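As a rough illustration of what "unit scale" means here, the following base R sketch scales a simulated response overall and then within the levels of a conditioning factor. It does not call standardize or scale_by; the data frame d, the grouping variable g, and the use of ave are assumptions made for this example only.

```r
set.seed(1)
# Hypothetical data: a numeric response y and a conditioning factor g
d <- data.frame(y = rnorm(60, mean = -2, sd = 5),
                g = factor(rep(c("a", "b", "c"), each = 20)))

# Overall unit scale: mean 0 and standard deviation 1
y_unit <- as.numeric(scale(d$y))

# Unit scale within each level of g (mimicking the within-level behaviour
# described for scale_by on the response; not the package's own code)
y_within <- ave(d$y, d$g, FUN = function(v) (v - mean(v)) / sd(v))

c(mean = mean(y_unit), sd = sd(y_unit))
tapply(y_within, d$g, mean)
tapply(y_within, d$g, sd)
```

The overall version has a single mean and standard deviation, whereas the within-level version has mean 0 and standard deviation 1 separately in each level of g.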
Offsets in gaussian models are divided by the standard deviation of the response prior to scaling (within-factor-level if scale_by is used on the response). In this way, if the transformed offset is added to the transformed response, and the sum is then placed back on the response's original scale, the result is the same as if the un-transformed offset had been added to the un-transformed response.
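The round-trip property described for offsets can be checked numerically. The sketch below uses base R only, with a simulated response y and offset o (both hypothetical); it mirrors the arithmetic in the paragraph above rather than the package's internal code.

```r
set.seed(2)
y <- rnorm(50, mean = 10, sd = 3)   # hypothetical un-transformed response
o <- runif(50, min = 0, max = 2)    # hypothetical un-transformed offset

m <- mean(y)
s <- sd(y)

y_std <- (y - m) / s   # response placed on unit scale
o_std <- o / s         # offset divided by the response's standard deviation

# Add on the transformed scale, then map back to the original scale
back <- (y_std + o_std) * s + m

# Compare with adding the raw offset to the raw response
all.equal(back, y + o)
```

The offset is only divided by the standard deviation and not centered, because the back-transformation adds the response mean exactly once.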
For all other values for family, the response and offsets are not checked. If offsets are used within the formula, then they will be in the formula and data elements of the standardized object. If the offset argument to the standardize function is used, then the offset provided in the argument will be in the offset element of the standardized object (scaled if family = gaussian).
For the other predictors in the formula, first any random effects grouping factors in the formula are coerced to factor and unused levels are dropped. The levels of the resulting factor are then recorded in the groups element. Then for the remaining predictors, regardless of their original class, if they have only two unique non-NA values, they are coerced to unordered factors. Then, named_contr_sum and scaled_contr_poly are called for unordered and ordered factors, respectively, using the scale argument provided in the call to standardize as the scale argument to the contrast functions. For numeric variables, if the variable contains a call to scale_by, then, regardless of whether the call to scale_by specifies scale, the value of scale in the call to standardize is used. If the numeric variable does not contain a call to scale_by, then scale is called, ensuring that the result has standard deviation scale.
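The predictor handling described above can be sketched in base R. This is only an approximation of the described steps under stated assumptions: the data frame d, the variable names, and target_scale (standing in for the scale argument) are hypothetical, and the code does not reproduce named_contr_sum or scaled_contr_poly.

```r
set.seed(3)
# Hypothetical data
d <- data.frame(
  subj = factor(rep(1:4, each = 5), levels = 1:6),  # levels 5 and 6 are unused
  bin  = rep(c(0, 1), times = 10),                  # only two unique values
  x    = rpois(20, 5)
)
target_scale <- 0.5  # plays the role of the scale argument

# Grouping factor: coerce to factor and drop unused levels
subj <- droplevels(factor(d$subj))
levels(subj)

# Two unique non-NA values: treat the variable as an unordered factor
n_unique <- length(unique(d$bin[!is.na(d$bin)]))
bin <- if (n_unique == 2) factor(d$bin, ordered = FALSE) else d$bin

# Numeric predictor: center and scale, then rescale to the target
# standard deviation
x_scaled <- as.numeric(scale(d$x)) * target_scale
sd(x_scaled)
```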
With the default value of scale = 1, the result is a standardized object which contains a formula and data frame (and offset vector if the offset argument to the standardize function was used) which can be used to fit regressions where the predictors are all on a similar scale. Its data frame has numeric variables on unit scale, unordered factors with named sum contrasts, and ordered factors with orthogonal polynomial contrasts on unit scale. For gaussian regressions, the response is also placed on unit scale. If scale = 0.5 (for example), then gaussian responses would still be placed on unit scale, but unordered factors' named sum contrasts would take on values -0.5, 0, 0.5 rather than -1, 0, 1, the standard deviation of each column in the contrast matrices for ordered factors would be 0.5 rather than 1, and the standard deviation of numeric variables would be 0.5 rather than 1 (within-factor-level in the case of scale_by calls).
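To make the effect of scale = 0.5 on contrasts concrete, the following sketch rescales base R contrast matrices. It is an illustration under assumptions, not the package's implementation: contr.sum does not name its columns the way named_contr_sum is described to, and dividing each polynomial column by its standard deviation is simply one way to obtain the column standard deviations described above. The name target_scale is hypothetical.

```r
target_scale <- 0.5  # plays the role of scale = 0.5

# Sum contrasts for a three-level unordered factor, rescaled
sum_contr <- contr.sum(3) * target_scale
sum_contr

# Orthogonal polynomial contrasts for a three-level ordered factor,
# rescaled so that each column has standard deviation target_scale
poly_contr <- contr.poly(3)
poly_scaled <- apply(poly_contr, 2, function(col) col / sd(col) * target_scale)
apply(poly_scaled, 2, sd)
```

Rescaling each column by a constant leaves the polynomial columns orthogonal to each other, so only the units of the coefficients change.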
A standardized object. The formula, data, and offset elements of the object can be used in calls to regression functions.
The scale_by function is supported so long as it is not nested within other function calls. The poly function is supported so long as it is either not nested within other function calls, or is nested as the transformation of the numeric variable in a scale_by call. If poly is used, then the lsmeans function will yield misleading results (as would normally be the case).
In previous versions of standardize (v0.2.0 and earlier), na.action could be specified. Starting with v0.2.1, specifying something other than na.pass is ignored with a warning. To use na.omit or na.exclude, specify it when calling the regression fitting function with the elements returned in the standardized object.
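As a sketch of handling missing values at the fitting stage, the example below passes na.action directly to lm. The data frame d is hypothetical and plain lm is used for brevity; in practice the formula and data elements of the standardized object would be supplied instead.

```r
# Hypothetical data frame with a missing predictor value in row 2
d <- data.frame(y = c(1.2, 0.4, -0.3, 2.1, 0.9, -1.5),
                x = c(0.5, NA, -0.2, 1.1, 0.3, -0.8))

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

# na.omit drops the incomplete row from the residuals;
# na.exclude pads the residuals with NA so they align with the rows of d
length(residuals(fit_omit))
length(residuals(fit_exclude))
```

Both fits use the same complete rows and give the same coefficients; the choice only affects how residuals and fitted values line up with the original data.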
Christopher D. Eager <eager.stats@gmail.com>
For scaling and contrasts, see scale, scale_by, named_contr_sum, and scaled_contr_poly. For putting new data into the same space as the standardized data, see predict. For the elements in the returned object, see standardized.
dat <- expand.grid(ufac = letters[1:3], ofac = 1:3)
dat <- as.data.frame(lapply(dat, function(n) rep(n, 60)))
dat$ofac <- factor(dat$ofac, ordered = TRUE)
dat$x <- rpois(nrow(dat), 5)
dat$z <- rnorm(nrow(dat), rep(rnorm(30), each = 18), rep(runif(30), each = 18))
dat$subj <- rep(1:30, each = 18)
dat$y <- rnorm(nrow(dat), -2, 5)
sobj <- standardize(y ~ log(x + 1) + scale_by(z ~ subj) + ufac + ofac +
  (1 | subj), dat)
sobj
sobj$formula
head(dat)
head(sobj$data)
sobj$contrasts
sobj$groups
mean(sobj$data$y)
sd(sobj$data$y)
mean(sobj$data$log_x.p.1)
sd(sobj$data$log_x.p.1)
with(sobj$data, tapply(z_scaled_by_subj, subj, mean))
with(sobj$data, tapply(z_scaled_by_subj, subj, sd))
sobj <- standardize(y ~ log(x + 1) + scale_by(z ~ subj) + ufac + ofac +
  (1 | subj), dat, scale = 0.5)
sobj
sobj$formula
head(dat)
head(sobj$data)
sobj$contrasts
sobj$groups
mean(sobj$data$y)
sd(sobj$data$y)
mean(sobj$data$log_x.p.1)
sd(sobj$data$log_x.p.1)
with(sobj$data, tapply(z_scaled_by_subj, subj, mean))
with(sobj$data, tapply(z_scaled_by_subj, subj, sd))
## Not run:
library(lme4)  # provides lmer
mod <- lmer(sobj$formula, sobj$data)
# this next line causes warnings about contrasts being dropped, but
# these warnings can be ignored (i.e. the statement still evaluates to TRUE)
all.equal(predict(mod, newdata = predict(sobj, dat)), fitted(mod))
## End(Not run)