formulize: Generate formulas for single- and multilevel linear modeling

formulize    R Documentation

Generate formulas for single- and multilevel linear modeling

Description

Generates formula expressions from vectors of outcome and predictor names. Single-level formulas are intended for models fitted with stats::lm(), multilevel formulas for models fitted with lme4::lmer(). The function is especially helpful for generating many formulas with varying outcomes and predictors at once.

Usage

formulize(.outcome, ..., .clustering, .random_slopes = NULL, .ranef_cor = TRUE)

Arguments

.outcome

A character vector of outcome names.

...

<dynamic-dots> Optional. Character vectors of predictor names.

.clustering

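Optional. The cluster structure used to specify random intercepts in multilevel formulas: either a single cluster identifier or several nested identifiers separated by a slash (e.g., id_schools/id_classrooms).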

.random_slopes

Optional. A list of character vectors of predictor names for which random slopes should be included in the formula. Only relevant if .clustering is specified.

.ranef_cor

A logical indicating whether random intercepts and slopes should be correlated. Only relevant if .random_slopes is specified. The default is TRUE.

Details

  • Generating a single-level formula for models to be fitted with stats::lm() simply requires not specifying .clustering.

  • Generating a multilevel formula for models to be fitted with lme4::lmer() requires specifying one or more nested cluster identifiers in .clustering. So far, only a single random effects term of the form "(random expression | .clustering)" can be expressed.

    • To express a formula for a simple two-level model, specify a single cluster identifier (e.g., ".clustering = id_schools"). To express a formula for a multilevel model with more than two hierarchical (i.e., nested) levels, specify multiple cluster identifiers separated by a slash, beginning with the highest level (e.g., ".clustering = id_schools / id_classrooms").

    • .random_slopes is only evaluated if .clustering is specified. The input has to be a list as created with list() (e.g., for including a random slope term for vector "x1", specify ".random_slopes = list(x1)"). If no slopes are specified, a formula for a random intercept model is generated (i.e., "(1 | .clustering)").

    • If both .clustering and .random_slopes are specified, .ranef_cor controls whether the random effects are correlated. By default (TRUE), the formula includes a term for correlated random effects, written with a single vertical bar as in lme4 (i.e., "|"). Setting .ranef_cor to FALSE produces a term for uncorrelated random effects, written with a double vertical bar as in lme4 (i.e., "||").

  • If an element in .outcome is NA, the output for this element will also be NA.

  • ... also takes a list of character vectors of predictor names that can be spliced with !!!.

  • It is possible to include a varying number of predictors for each outcome by setting the corresponding entries in the character vectors specified in ... and/or the list of character vectors specified in .random_slopes to NA.
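
The rules listed above can be sketched in base R. The helper below, sketch_formula(), is hypothetical and is not the multides implementation; it handles one outcome at a time and takes the cluster identifier as a string, whereas formulize() is vectorised and takes .clustering unquoted. The exact spacing around "||" in the package's output is an assumption.

```r
# Hypothetical helper, NOT the multides implementation: builds one formula
# string from an outcome, predictor names, and an optional random effects term.
sketch_formula <- function(outcome, predictors = character(),
                           clustering = NULL, random_slopes = character(),
                           ranef_cor = TRUE) {
  # An NA outcome yields an NA result
  if (is.na(outcome)) return(NA_character_)

  # NA entries in the predictors are skipped
  predictors <- predictors[!is.na(predictors)]
  rhs <- paste(c("1", predictors), collapse = " + ")

  if (!is.null(clustering)) {
    # NA entries in the random slopes are skipped as well
    random_slopes <- random_slopes[!is.na(random_slopes)]
    # "|" for correlated, "||" for uncorrelated random effects
    bar <- if (ranef_cor) "|" else "||"
    ranef <- paste(c("1", random_slopes), collapse = " + ")
    rhs <- paste0(rhs, " + (", ranef, " ", bar, " ", clustering, ")")
  }

  paste(outcome, "~", rhs)
}

# Single-level formula; the NA predictor is dropped
sketch_formula("y1", c("x1", NA))

# Random intercept and slopes, correlated (default) and uncorrelated
sketch_formula("y2", c("x2", "z2"), "id_schools", c("x2", "z2"))
sketch_formula("y1", "x1", "id_schools", "x1", ranef_cor = FALSE)
```

For the first two calls, the sketch reproduces the corresponding strings shown in the Examples section.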

Value

A character vector of formula expressions.
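
Because the return value is a character vector, each element has to be converted to a formula object before model fitting. The following is a minimal sketch using base R only. The formula strings are hard-coded from the single-level call in the Examples section so that the code runs without multides, and the data set dat is simulated purely for illustration.

```r
# Formula strings as documented for the single-level example;
# hard-coded here so the sketch does not depend on multides.
formula_strings <- c("y1 ~ 1 + x1", "y2 ~ 1 + x2 + z2")

# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), z2 = rnorm(n))
dat$y1 <- 0.5 * dat$x1 + rnorm(n)
dat$y2 <- 0.3 * dat$x2 - 0.2 * dat$z2 + rnorm(n)

# Convert each string to a formula object and fit one model per formula
fits <- lapply(formula_strings, function(f) {
  stats::lm(stats::as.formula(f), data = dat)
})
names(fits) <- formula_strings

# Names of the estimated coefficients in each model
lapply(fits, function(m) names(stats::coef(m)))
```

Multilevel formula strings can be converted with stats::as.formula() in the same way before being passed to lme4::lmer().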

See Also

See the stats and lme4 packages.

Examples

# Formulize single-level models with one or two predictors
formulize(.outcome = c("y1", "y2"), c("x1", "x2"), c(NA, "z2"))
# [1] "y1 ~ 1 + x1"
# [2] "y2 ~ 1 + x2 + z2"

# Formulize a three-level model with random intercepts
formulize(.outcome = c("y1", "y2"), c("x1", "x2"), c(NA, "z2"),
          .clustering = id_schools/id_classrooms)
# [1] "y1 ~ 1 + x1 + (1 | id_schools/id_classrooms)"
# [2] "y2 ~ 1 + x2 + z2 + (1 | id_schools/id_classrooms)"

# Formulize a two-level model with random intercept and slopes
formulize(.outcome = c("y1", "y2"), c("x1", "x2"), c(NA, "z2"),
          .clustering = id_schools,
          .random_slopes = list(c("x1", "x2"), c(NA, "z2")))
# [1] "y1 ~ 1 + x1 + (1 + x1 | id_schools)"
# [2] "y2 ~ 1 + x2 + z2 + (1 + x2 + z2 | id_schools)"
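
Splicing a list of predictor vectors into ... with !!! (see Details) can be illustrated with rlang's dynamic dots alone. This sketch assumes that formulize() collects ... through rlang, as the <dynamic-dots> tag suggests; collect_predictors() is a hypothetical stand-in and is not part of multides.

```r
# Requires the rlang package.
# Hypothetical stand-in that collects dynamic dots the way a function
# built on rlang::list2() would; NOT part of multides.
collect_predictors <- function(...) rlang::list2(...)

# Predictor vectors stored in a list, e.g. when built programmatically
preds <- list(c("x1", "x2"), c(NA, "z2"))

# Splicing the list with !!! passes each element as a separate argument
spliced <- collect_predictors(!!!preds)
direct  <- collect_predictors(c("x1", "x2"), c(NA, "z2"))

length(spliced)   # number of predictor vectors received
spliced[[2]]      # second predictor vector

# With multides installed, the analogous call would be:
# formulize(.outcome = c("y1", "y2"), !!!preds)
```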

sophiestallasch/multides documentation built on Oct. 20, 2024, 5:14 a.m.