adjust: Adjust data for the effect of other variable(s)


View source: R/adjust.R

Description

This function adjusts the data for the effect of other variables present in the dataset. It works by fitting regression models, which allows considerable flexibility: factors can be included as random effects in mixed models (multilevel partialization), continuous variables can be included as smooth terms in generalized additive models (non-linear partialization), and the models can be fitted under a Bayesian framework. The values returned by this function are the residuals of the regression models. Note that a regular correlation between two "adjusted" variables is equivalent to the partial correlation between them.
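The equivalence between the correlation of adjusted variables and the partial correlation can be checked by hand. The following is a minimal base-R sketch, not the package's implementation: it assumes a plain linear model (the non-multilevel, non-additive, non-Bayesian case) and residualizes two columns of the built-in attitude dataset on a single covariate. The variable names are illustrative only.

```r
# Base-R sketch; no datawizard functions are called here.
# Residualize two variables on the same covariate with ordinary linear models.
res_rating <- residuals(lm(rating ~ complaints, data = attitude))
res_learning <- residuals(lm(learning ~ complaints, data = attitude))

# Plain correlation between the two residualized ("adjusted") variables.
r_resid <- cor(res_rating, res_learning)

# Partial correlation of rating and learning given complaints,
# computed from the pairwise correlations.
r <- cor(attitude[, c("rating", "learning", "complaints")])
r_partial <- (r["rating", "learning"] -
  r["rating", "complaints"] * r["learning", "complaints"]) /
  sqrt((1 - r["rating", "complaints"]^2) * (1 - r["learning", "complaints"]^2))

# The two quantities agree up to floating-point tolerance.
all.equal(r_resid, r_partial)
```

Both variables have to be adjusted for the same covariate for this identity to hold.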

Usage

adjust(
  data,
  effect = NULL,
  select = NULL,
  exclude = NULL,
  multilevel = FALSE,
  additive = FALSE,
  bayesian = FALSE,
  keep_intercept = FALSE
)

data_adjust(
  data,
  effect = NULL,
  select = NULL,
  exclude = NULL,
  multilevel = FALSE,
  additive = FALSE,
  bayesian = FALSE,
  keep_intercept = FALSE
)

Arguments

data

A data frame.

effect

Character vector of column names to be adjusted for (regressed out). If NULL (the default), all variables will be selected.

select

Character vector of column names to adjust. If NULL (the default), all variables will be selected.

exclude

Character vector of column names to be excluded from selection.

multilevel

If TRUE, factors are included as random effects. If FALSE (the default), they are included as fixed effects in a simple regression model.

additive

If TRUE, continuous variables are included as smooth terms in additive models. The goal is to regress out potential non-linear effects.

bayesian

If TRUE, the models are fitted under the Bayesian framework using rstanarm.

keep_intercept

If TRUE, the intercept of the model is re-added to the adjusted values. This avoids the centering around 0 that happens by default when regressing out another variable (see the examples below for a visual representation of this). If FALSE (the default), the adjusted values are the raw residuals, centered around 0.
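The centering effect can be illustrated without the package. This is a rough base-R sketch, assuming the adjustment is the residual of a simple linear model as stated in the Description. Adding the intercept back by hand is an assumption about what "keeping the intercept" amounts to, and it appears to match the keep_intercept = TRUE case plotted in the examples below.

```r
# Base-R sketch of the centering effect; adjust() itself is not called here.
fit <- lm(rating ~ complaints, data = attitude)

# Residuals of a model that includes an intercept average to (numerically) zero.
adjusted <- residuals(fit)

# Re-adding the fitted intercept shifts every value by the same constant.
adjusted_icpt <- adjusted + coef(fit)[["(Intercept)"]]

# Compare the means of the two versions.
c(centered = mean(adjusted), intercept_kept = mean(adjusted_icpt))
```

The two versions differ only by a constant shift, so their correlations with other variables are identical.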

Value

A data frame comparable to data, with adjusted variables.

Examples

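library(datawizard)

# With the defaults, all variables are selected
adjusted_all <- adjust(attitude)
head(adjusted_all)
# Adjust only "rating" for the effect of "complaints"
adjusted_one <- adjust(attitude, effect = "complaints", select = "rating")
head(adjusted_one)

# Variants fitting more complex models (the Bayesian fit uses rstanarm)
adjust(attitude, effect = "complaints", select = "rating", bayesian = TRUE)
adjust(attitude, effect = "complaints", select = "rating", additive = TRUE)
# Bin "complaints" into three levels to obtain a factor for the multilevel case
attitude$complaints_LMH <- cut(attitude$complaints, 3)
adjust(attitude, effect = "complaints_LMH", select = "rating", multilevel = TRUE)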


if (require("MASS") && require("bayestestR")) {
  # Generate data
  data <- simulate_correlation(n = 100, r = 0.7)
  data$V2 <- (5 * data$V2) + 20 # Add intercept

  # Adjust
  adjusted <- adjust(data, effect = "V1", select = "V2")
  adjusted_icpt <- adjust(data, effect = "V1", select = "V2", keep_intercept = TRUE)

  # Visualize
  plot(data$V1, data$V2,
    pch = 19, col = "blue",
    ylim = c(min(adjusted$V2), max(data$V2)),
    main = "Original (blue), adjusted (green), and adjusted - intercept kept (red) data"
  )
  abline(lm(V2 ~ V1, data = data), col = "blue")
  points(adjusted$V1, adjusted$V2, pch = 19, col = "green")
  abline(lm(V2 ~ V1, data = adjusted), col = "green")
  points(adjusted_icpt$V1, adjusted_icpt$V2, pch = 19, col = "red")
  abline(lm(V2 ~ V1, data = adjusted_icpt), col = "red")
}

datawizard documentation built on Oct. 4, 2021, 9:07 a.m.