# Introduction

This vignette describes how to control for variables. This feature was introduced in BGGM version `2.0.0`.

# Example 1: Multivariate Regression

When controlling for variables, BGGM fits a multivariate regression. In fact, a GGM can be understood as a multivariate regression with intercept-only models for the predictors.

BGGM does not use the typical `R` approach for multivariate regression. This avoids having to write out each outcome variable, of which there are typically many in a GGM. Instead, BGGM assumes that the data matrix includes only the variables to be included in the GGM and the control variables.

### Correct

Suppose that we want to control for education level, with five variables included in the GGM.

```r
# data
Y <- bfi[, c(1:5, 27)]

head(Y)

#>       A1 A2 A3 A4 A5 education
#> 61617  2  4  3  4  4        NA
#> 61618  2  4  5  2  5        NA
#> 61620  5  4  5  4  4        NA
#> 61621  4  4  6  5  5        NA
#> 61622  2  3  3  4  5        NA
#> 61623  6  6  5  6  5         3
```

Notice that `Y` includes only the five variables and `education`.

## Fit Model

This model can then be fitted with

```r
fit <- estimate(Y, formula = ~ as.factor(education))
```

To show this is indeed a multivariate regression, here are the summarized regression coefficients for the first outcome.

```r
summ_coef <- regression_summary(fit)

# outcome one
summ_coef$reg_summary[[1]]

#>                       Post.mean Post.sd Cred.lb Cred.ub
#> (Intercept)               0.256   0.095   0.072   0.442
#> as.factor(education)2     0.073   0.128  -0.177   0.323
#> as.factor(education)3    -0.202   0.104  -0.405  -0.001
#> as.factor(education)4    -0.462   0.119  -0.691  -0.233
#> as.factor(education)5    -0.578   0.117  -0.815  -0.346
```

And here are the coefficients from `lm` (a univariate regression for `A1`):

```r
round(
  cbind(
    # summary: coef and se
    summary(lm(scale(A1, scale = FALSE) ~ as.factor(education), data = Y))$coefficients[, 1:2],
    # confidence interval
    confint(lm(scale(A1, scale = FALSE) ~ as.factor(education), data = Y))
  ),
  3)

#>                       Estimate Std. Error  2.5 % 97.5 %
#> (Intercept)              0.256      0.093  0.073  0.438
#> as.factor(education)2    0.072      0.125 -0.172  0.316
#> as.factor(education)3   -0.203      0.101 -0.401 -0.004
#> as.factor(education)4   -0.461      0.116 -0.690 -0.233
#> as.factor(education)5   -0.578      0.115 -0.804 -0.351
```

The estimates are very similar.

## Summary

Note that all the other functions work just the same. For example, the relations controlling for education are summarized with

```r
summary(fit)

#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Type: continuous
#> Analytic: FALSE
#> Formula: ~ as.factor(education)
#> Posterior Samples: 5000
#> Observations (n):
#> Nodes (p): 5
#> Relations: 10
#> ---
#> Call:
#> estimate(Y = Y, formula = ~as.factor(education))
#> ---
#> Estimates:
#>  Relation Post.mean Post.sd Cred.lb Cred.ub
#>    A1--A2    -0.239   0.020  -0.278  -0.200
#>    A1--A3    -0.109   0.020  -0.150  -0.070
#>    A2--A3     0.276   0.019   0.239   0.312
#>    A1--A4    -0.013   0.021  -0.055   0.026
#>    A2--A4     0.156   0.020   0.117   0.196
#>    A3--A4     0.173   0.020   0.134   0.214
#>    A1--A5    -0.010   0.020  -0.050   0.029
#>    A2--A5     0.150   0.020   0.111   0.189
#>    A3--A5     0.358   0.018   0.322   0.392
#>    A4--A5     0.121   0.020   0.082   0.159
#> ---
```

### Incorrect

Now suppose we wanted to control for education, but `Y` also included gender. The following would be incorrect:

```r
Y <- bfi[, c(1:5, 26:27)]

head(Y)

#>       A1 A2 A3 A4 A5 gender education
#> 61617  2  4  3  4  4      1        NA
#> 61618  2  4  5  2  5      2        NA
#> 61620  5  4  5  4  4      2        NA
#> 61621  4  4  6  5  5      2        NA
#> 61622  2  3  3  4  5      1        NA
#> 61623  6  6  5  6  5      2         3
```

In this case, with `estimate(Y, formula = ~ as.factor(education))`, the GGM would also include `gender` (six variables instead of the desired five). This is because all variables not included in `formula` are included in the GGM. This convention was adopted in BGGM to save the user from having to write out each outcome.
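One remedy, assuming `gender` should be excluded from the analysis altogether, is to drop that column from `Y` before fitting (a minimal sketch):

```r
# keep only the GGM variables (A1-A5) and
# the control variable (education)
Y <- bfi[, c(1:5, 27)]

# education is now the only predictor;
# the GGM includes A1-A5
fit <- estimate(Y, formula = ~ as.factor(education))
```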

This differs from `lm`, where each outcome needs to be written out, for example `cbind(A1, A2, A3, A4, A5) ~ as.factor(education)`. This is quite cumbersome for a model that includes many nodes.
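For comparison, here is what the equivalent multivariate regression looks like in base `R`, assuming `Y` contains `A1`-`A5` and `education` as above:

```r
# base-R multivariate regression: every outcome
# must be listed on the left-hand side
mv_fit <- lm(cbind(A1, A2, A3, A4, A5) ~ as.factor(education), data = Y)

# one coefficient table per outcome
summary(mv_fit)
```

With, say, 20 nodes, the `cbind()` call would need all 20 outcome names, whereas BGGM infers the outcomes from the columns of `Y`.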

# Example 2: Multivariate Probit

The data above are ordinal. In this case, it is possible to fit a multivariate probit model, which is also the approach used for binary data in BGGM. This is implemented with

```r
fit <- estimate(Y, formula = ~ as.factor(education),
                type = "ordinal", iter = 1000)
```

Note that the multivariate probit models can also be summarized with `regression_summary`.
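For example, the probit regression coefficients can be inspected just as in Example 1 (a sketch; output omitted):

```r
# summarize the probit regression coefficients
summ_coef <- regression_summary(fit)

# coefficients for the first outcome
summ_coef$reg_summary[[1]]
```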

# Example 3: Gaussian Copula Graphical Model

This final example fits a Gaussian copula graphical model that can be used for mixed data. In this case, `formula` is not used and instead all of the variables are included in the GGM.

## Fit Model

This model is estimated with

```r
# data
Y <- na.omit(bfi[, c(1:5, 27)])
Y <- na.omit(bfi[,c(1:5, 27)])

# fit type = "mixed"
fit <- estimate(Y, type = "mixed", iter = 1000)

# summary
summary(fit)

#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Type: mixed
#> Analytic: FALSE
#> Formula:
#> Posterior Samples: 1000
#> Observations (n):
#> Nodes (p): 6
#> Relations: 15
#> ---
#> Call:
#> estimate(Y = Y, type = "mixed", iter = 1000)
#> ---
#> Estimates:
#>       Relation Post.mean Post.sd Cred.lb Cred.ub
#>         A1--A2    -0.217   0.048  -0.294  -0.114
#>         A1--A3    -0.063   0.027  -0.113  -0.011
#>         A2--A3     0.364   0.023   0.317   0.410
#>         A1--A4     0.116   0.038   0.048   0.192
#>         A2--A4     0.241   0.031   0.182   0.303
#>         A3--A4     0.228   0.026   0.174   0.275
#>         A1--A5     0.057   0.031   0.003   0.120
#>         A2--A5     0.186   0.027   0.135   0.241
#>         A3--A5     0.438   0.019   0.399   0.474
#>         A4--A5     0.151   0.025   0.103   0.199
#>  A1--education    -0.016   0.069  -0.125   0.119
#>  A2--education     0.063   0.049  -0.016   0.162
#>  A3--education     0.049   0.025   0.002   0.099
#>  A4--education     0.053   0.026   0.005   0.105
#>  A5--education     0.072   0.024   0.024   0.120
#> ---
```

Here it is clear that `education` is included in the model, as its relations with the other nodes appear in the output.

## Select Graph

The graph is selected with

```r
select(fit)
```
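The selected graph can also be stored and plotted; a minimal sketch, assuming the default selection rule:

```r
# select the graph (relations whose credible
# interval excludes zero)
E <- select(fit)

# plot the conditional dependence structure
plot(E)
```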

# Note

It is possible to control for variables with all methods in BGGM, including when comparing groups, Bayesian hypothesis testing, etc.
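For instance, assuming two group-level data matrices `Y_males` and `Y_females` (hypothetical objects, each containing only the GGM variables and `education`), and assuming the group-comparison function accepts `formula` as the single-group functions do, a comparison controlling for education might look like:

```r
# compare GGMs between groups while
# controlling for education in each group
fit <- ggm_compare_estimate(Y_males, Y_females,
                            formula = ~ as.factor(education))
```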


BGGM documentation built on Aug. 20, 2021, 5:08 p.m.