This vignette describes a new feature to BGGM (2.0.0
) that allows for
computing network predictability for binary and ordinal data. Currently
the available option is Bayesian $R^2$ [@gelman_r2_2019].
# need the developmental version if (!requireNamespace("remotes")) { install.packages("remotes") } # install from github remotes::install_github("donaldRwilliams/BGGM") library(BGGM)
The first example looks at Binary data, consisting of 1190 observations and 6 variables. The data are called women_math
and the variable descriptions are provided in BGGM.
The model is estimated with
# binary data Y <- women_math # fit model fit <- estimate(Y, type = "binary")
and then predictability is computed
r2 <- predictability(fit) # print r2 #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Metric: Bayes R2 #> Type: binary #> --- #> Estimates: #> #> Node Post.mean Post.sd Cred.lb Cred.ub #> 1 0.016 0.012 0.002 0.046 #> 2 0.103 0.023 0.064 0.150 #> 3 0.155 0.030 0.092 0.210 #> 4 0.160 0.021 0.118 0.201 #> 5 0.162 0.022 0.118 0.202 #> 6 0.157 0.028 0.097 0.208 #> ---
There are then two options for plotting. The first is with error bars, denoting the credible interval (i.e., cred
),
plot(r2, type = "error_bar", size = 4, cred = 0.90)
and the second is with a ridgeline plot
plot(r2, type = "ridgeline", cred = 0.50)
In the following, the ptsd
data is used (5-level Likert). The variable descriptions are provided in BGGM. This is based on the polychoric partial correlations, with $R^2$ computed from the corresponding correlations (due to the correspondence between the correlation matrix and multiple regression).
Y <- ptsd fit <- estimate(Y + 1, type = "ordinal")
The only change is switching type from "binary
to ordinal
. One important
point is the + 1
. This is required because for the ordinal approach the first
category must be 1 (in ptsd
the first category is coded as 0).
r2 <- predictability(fit) # print r2 #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Metric: Bayes R2 #> Type: ordinal #> --- #> Estimates: #> #> Node Post.mean Post.sd Cred.lb Cred.ub #> 1 0.487 0.049 0.394 0.585 #> 2 0.497 0.047 0.412 0.592 #> 3 0.509 0.047 0.423 0.605 #> 4 0.524 0.049 0.441 0.633 #> 5 0.495 0.047 0.409 0.583 #> 6 0.297 0.043 0.217 0.379 #> 7 0.395 0.045 0.314 0.491 #> 8 0.250 0.042 0.173 0.336 #> 9 0.440 0.048 0.358 0.545 #> 10 0.417 0.044 0.337 0.508 #> 11 0.549 0.048 0.463 0.648 #> 12 0.508 0.048 0.423 0.607 #> 13 0.504 0.047 0.421 0.600 #> 14 0.485 0.043 0.411 0.568 #> 15 0.442 0.045 0.355 0.528 #> 16 0.332 0.039 0.257 0.414 #> 17 0.331 0.045 0.259 0.436 #> 18 0.423 0.044 0.345 0.510 #> 19 0.438 0.044 0.354 0.525 #> 20 0.362 0.043 0.285 0.454 #> ---
Here is the error_bar
plot.
plot(r2)
Note that the plot object is a ggplot
which allows for further customization (e.g,. adding the variable names, a title, etc.).
It is quite common to compute predictability assuming that the data are Gaussian. In the context of Bayesian GGMs, this was introduced in [@Williams2019]. This can also be implemented in BGGM.
# fit model fit <- estimate(Y) # predictability r2 <- predictability(fit)
type
is missing which indicates that continuous
is the default.
$R^2$ for binary and ordinal data is computed for the underlying latent variables. This is also the case
when type = "mixed
(a semi-parametric copula). In future releases, there will be support for predicting
the variables on the observed scale.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.