ggm_compare_confirm: GGM Compare: Confirmatory Hypothesis Testing



Description

Confirmatory hypothesis testing for comparing GGMs. Hypotheses are expressed as equality and/or inequality constraints on the partial correlations of interest. Here the focus is not on determining the graph (see explore) but on testing specific hypotheses about the conditional (in)dependence structure. These methods were introduced in \insertCiteWilliams2019_bf;textualBGGM and \insertCitewilliams2020comparing;textualBGGM.

Usage

ggm_compare_confirm(
  ...,
  hypothesis,
  formula = NULL,
  type = "continuous",
  mixed_type = NULL,
  prior_sd = 0.25,
  iter = 25000,
  impute = TRUE,
  progress = TRUE,
  seed = 1
)

Arguments

...

At least two matrices (or data frames) of dimensions n (observations) by p (nodes).

hypothesis

Character string. The hypothesis (or hypotheses) to be tested. See Details for further information on the syntax.

formula

An object of class formula. This allows for including control variables in the model (e.g., ~ gender).

type

Character string. The type of data in Y. The options include continuous, binary, ordinal, or mixed. Note that mixed can be used for data with only ordinal variables. See the note for further details.

mixed_type

Numeric vector. An indicator of length p for which variables should be treated as ranks (1 for rank and 0 to assume normality). The default is currently (dev version) to treat all integer variables as ranks when type = "mixed" and NULL otherwise. See the note for further details.

prior_sd

Numeric. The scale of the prior distribution (centered at zero), in reference to a beta distribution (defaults to 0.25).
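How prior_sd maps onto a beta distribution is not spelled out on this page. The sketch below assumes that prior_sd is the standard deviation of a symmetric Beta(a, a) distribution rescaled from (0, 1) to (-1, 1), the range of a partial correlation; under that assumption the shape parameter has a closed form. The helper beta_shape is hypothetical and not part of BGGM.

```r
# ASSUMPTION: prior_sd is the SD of a symmetric Beta(a, a) rescaled to (-1, 1).
# If X ~ Beta(a, a), then 2X - 1 has variance 1 / (2a + 1),
# so solving prior_sd = 1 / sqrt(2a + 1) for a gives a = (1 / prior_sd^2 - 1) / 2.
beta_shape <- function(prior_sd) {
  # a > 0 requires 0 < prior_sd < 1
  stopifnot(prior_sd > 0, prior_sd < 1)
  (1 / prior_sd^2 - 1) / 2
}

beta_shape(c(0.1, 0.25, 0.5))  # approximately 49.5, 7.5, 1.5

# Simulation check: rescaled draws should have an SD close to 0.25
set.seed(1)
a <- beta_shape(0.25)
draws <- 2 * rbeta(1e5, a, a) - 1
sd(draws)
```

Under this assumption, a smaller prior_sd gives a larger shape parameter, i.e., a prior more concentrated around zero.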

iter

Number of iterations (posterior samples; defaults to 25,000).

impute

Logical. Should the missing values (NA) be imputed during model fitting (defaults to TRUE)?

progress

Logical. Should a progress bar be included (defaults to TRUE)?

seed

An integer for the random seed.

Details

The hypotheses can be written with either the respective column names or column numbers. For example, g1_1--2 denotes the relation between the variables in columns 1 and 2 for group 1. The g1_ prefix is required; it is the only difference from confirm (one group). Note that the terms must correspond to upper-triangular elements of the correlation matrix, which is ensured by making the first number smaller than the second. This also applies when using column names (i.e., with respect to the column numbers those names refer to).
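The ordering rule above can be enforced when terms are built programmatically. The helper edge_term below is hypothetical (not part of BGGM): it sorts the two column numbers so the smaller comes first and, optionally, maps them to column names.

```r
# Hypothetical helper: build a term such as "g1_1--2" or "g1_A1--A2"
# for group g and columns i, j (given as column numbers).
edge_term <- function(g, i, j, cols = NULL) {
  if (i == j) stop("a relation needs two different columns")
  ij <- sort(c(i, j))                  # upper triangle: smaller index first
  labels <- if (is.null(cols)) ij else cols[ij]
  paste0("g", g, "_", labels[1], "--", labels[2])
}

edge_term(1, 2, 1)                               # "g1_1--2"
edge_term(2, 1, 3, cols = c("A1", "A2", "A3"))   # "g2_A1--A3"
```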

One Hypothesis:

To test whether a relation is larger in one group, while both are expected to be positive, this can be written as

This is then compared to the complement.
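As a sketch, a hypothesis of this kind might be written as follows for columns A1 and A2. The chained form mirrors the chained equality in example 5. The comparison against 0 is an assumption about the parser, so check it before relying on it.

```r
# Group 1's A1--A2 relation is larger than group 2's, and both are positive
hyp_one <- "g1_A1--A2 > g2_A1--A2 > 0"

# Passed to the function as, e.g. (Ymale and Yfemale as in the Examples):
# ggm_compare_confirm(Ymale, Yfemale, hypothesis = hyp_one, iter = 250)
```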

More Than One Hypothesis:

The above hypothesis can also be compared to, say, a null model by using ";" to separate the hypotheses, for example,

Any number of hypotheses can be compared this way.
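Multiple hypotheses go into a single string separated by ";", as in example 1. The sketch below joins an order-constrained hypothesis and a null model with paste(). Writing the null model as "the relation is zero in both groups" is an illustrative assumption.

```r
# Individual hypotheses (the null model is illustrative)
hyps <- c("g1_A1--A2 > g2_A1--A2 > 0",
          "g1_A1--A2 = g2_A1--A2 = 0")

# Join them into the single string expected by `hypothesis`
hypothesis <- paste(hyps, collapse = "; ")
hypothesis

# Splitting on ";" recovers the individual hypotheses
trimws(strsplit(hypothesis, ";")[[1]])
```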

Using "&"

It is also possible to include &. This allows for testing one constraint and another constraint as one hypothesis.

Of course, it is then possible to include additional hypotheses by separating them with ";".
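The sketch below uses the same form as example 5: each hypothesis combines two constraints with "&", and the hypotheses are separated by ";". The final lines split the string to count constraints per hypothesis, which is a quick sanity check and not something BGGM requires.

```r
# Hypothesis 1: A1--A2 is larger in group 1 AND A1--A3 is equal across groups
# Hypothesis 2: both relations are equal across groups
hypothesis <- paste(
  "g1_A1--A2 > g2_A1--A2 & g1_A1--A3 = g2_A1--A3",
  "g1_A1--A2 = g2_A1--A2 & g1_A1--A3 = g2_A1--A3",
  sep = "; "
)

# Each hypothesis contains two constraints joined by "&"
parts <- trimws(strsplit(hypothesis, ";")[[1]])
lengths(strsplit(parts, "&"))  # 2 2
```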

Testing Sums

It might also be interesting to test the sum of partial correlations. For example, that the sum of specific relations in one group is larger than the sum in another group.
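This page does not show the syntax for sums. The sketch below assumes that terms can be added with "+" inside a constraint; treat that as an assumption to verify against the package. The helper sum_term is hypothetical.

```r
# ASSUMPTION: "+" adds partial correlations within a constraint.
# Hypothetical helper: sum of the given relations for group g
sum_term <- function(g, relations) {
  paste0("g", g, "_", relations, collapse = " + ")
}

relations <- c("A1--A2", "A1--A3")

# Sum larger in group 1; compared against equal sums
hypothesis <- paste0(
  sum_term(1, relations), " > ", sum_term(2, relations), "; ",
  sum_term(1, relations), " = ", sum_term(2, relations)
)
hypothesis
```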

Potential Delays:

There may be a long delay between the time the progress bar finishes and when the function is done running. This occurs when the hypotheses require further sampling to be tested, for example, when grouping relations, as in c("(g1_A1--A2, g2_A2--A3) > (g2_A1--A2, g2_A2--A3)"). This is not an error.

Controlling for Variables:

When controlling for variables, it is assumed that Y includes only the nodes in the GGM and the control variables. Internally, only the predictors included in formula are removed from Y. This is not the behavior of, say, lm, but it was adopted so that users do not have to write out each variable that should be included in the GGM. An example is provided below.
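The column handling described above can be mimicked in base R to check which columns will end up as nodes. This is a sketch of the documented behavior, not BGGM's internal code, and the toy data frame is made up.

```r
# Toy data: two nodes plus one control variable
Y <- data.frame(A1 = c(2, 4, 5, 3),
                A2 = c(4, 4, 6, 5),
                education = c(1, 3, 2, 3))
ctrl <- ~ education

# Variables named in the formula are removed; all remaining columns are nodes
controls <- all.vars(ctrl)
nodes <- setdiff(colnames(Y), controls)
nodes  # "A1" "A2"
```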

Mixed Type:

The term "mixed" is somewhat of a misnomer, because the method can be used for data including only continuous or only discrete variables \insertCitehoff2007extendingBGGM. This is based on the ranked likelihood which requires sampling the ranks for each variable (i.e., the data is not merely transformed to ranks). This is computationally expensive when there are many levels. For example, with continuous data, there are as many ranks as data points!

The option mixed_type allows the user to determine which variables should be treated as ranks; the "empirical" distribution is used otherwise. This is accomplished by specifying an indicator vector of length p. A one indicates to use the ranks, whereas a zero indicates to "ignore" that variable. By default, all integer variables are handled as ranks.
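A mixed_type vector can be written by hand (e.g., c(1, 1, 0) for p = 3). The sketch below instead reproduces the stated default of treating integer variables as ranks, assuming "integer" means integer-valued. The helper default_mixed_type is hypothetical and expects a data frame.

```r
# Hypothetical helper: 1 for integer-valued columns (ranks), 0 otherwise
default_mixed_type <- function(Y) {
  vapply(Y, function(x) {
    x <- x[!is.na(x)]                 # ignore missing values
    as.integer(all(x == round(x)))
  }, integer(1))
}

Y <- data.frame(A1 = c(1, 2, 3, NA),
                score = c(0.5, 1.2, 2.7, 3.1))
mixed_type <- unname(default_mixed_type(Y))
mixed_type  # 1 0

# It would then be supplied as, e.g.,
# ggm_compare_confirm(..., type = "mixed", mixed_type = mixed_type)
```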

Dealing with Errors:

An error is most likely to arise when type = "ordinal". There are two common errors (although still rare):

Imputing Missing Values:

Missing values are imputed with the approach described in \insertCitehoff2009first;textualBGGM. The basic idea is to impute the missing values from the respective posterior predictive distribution, given the observed data, as the model is being estimated. Note that the default is TRUE, but this is ignored when there are no missing values. If set to FALSE and there are missing values, list-wise deletion is performed with na.omit.
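When impute = FALSE, the cost of list-wise deletion can be checked beforehand with na.omit, the function named above. The toy data below are made up; every row with at least one NA is dropped.

```r
# Toy data with missing values
Y <- data.frame(A1 = c(2, NA, 5, 4),
                A2 = c(4, 4, NA, 6))

# impute = FALSE: rows with any NA are dropped (list-wise deletion)
complete <- na.omit(Y)
nrow(Y)         # 4
nrow(complete)  # 2
```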

Value

The returned object of class confirm contains a lot of information that is used for printing and plotting the results. For users of BGGM, the following are the useful objects:

Note

"Default" Prior:

In Bayesian statistics, a default Bayes factor needs to have several properties. I refer interested users to \insertCite@section 2.2 in @dablander2020default;textualBGGM. In \insertCiteWilliams2019_bf;textualBGGM, some of these properties were investigated (e.g., model selection consistency). That said, we would not consider this a "default" or "automatic" Bayes factor, and thus we encourage users to perform sensitivity analyses by varying the scale of the prior distribution (prior_sd).
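One way to run such a sensitivity analysis is to refit the same comparison for several prior_sd values, as examples 1 and 2 do by hand. The wrapper prior_sensitivity is a hypothetical convenience function, not part of BGGM; it uses only the arguments documented above and assumes BGGM is installed.

```r
# Hypothetical wrapper: refit the comparison for each prior scale in `sds`
prior_sensitivity <- function(Y1, Y2, hypothesis,
                              sds = c(0.1, 0.25, 0.5), iter = 250) {
  fits <- lapply(sds, function(s) {
    BGGM::ggm_compare_confirm(Y1, Y2,
                              hypothesis = hypothesis,
                              prior_sd = s,
                              iter = iter,
                              progress = FALSE)
  })
  # Label each fit by its prior scale
  names(fits) <- paste0("prior_sd = ", sds)
  fits
}
```

For example, fits <- prior_sensitivity(Ymale, Yfemale, hypothesis) returns one fit per prior scale, and printing the list shows the results side by side.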

Furthermore, it is important to note that there is no "correct" prior and, also, no need to entertain the possibility of a "true" model. Rather, the Bayes factor can be interpreted as indicating which hypothesis best predicts the observed data, relative to the others \insertCite@Section 3.2 in @Kass1995BGGM.

Interpretation of Conditional (In)dependence Models for Latent Data:

See BGGM-package for details about interpreting GGMs based on latent data (i.e., all data types besides "continuous").

References

\insertAllCited

Examples

# note: iter = 250 for demonstrative purposes
library(BGGM)

# data
Y <- bfi

###############################
#### example 1: continuous ####
###############################

# males
Ymale   <- subset(Y, gender == 1,
                  select = -c(education,
                              gender))[,1:5]


# females
Yfemale <- subset(Y, gender == 2,
                  select = -c(education,
                              gender))[,1:5]

# exhaustive
hypothesis <- c("g1_A1--A2 >  g2_A1--A2;
                 g1_A1--A2 <  g2_A1--A2;
                 g1_A1--A2 =  g2_A1--A2")

# test hyp
test <- ggm_compare_confirm(Ymale,  Yfemale,
                            hypothesis = hypothesis,
                            iter = 250,
                            progress = FALSE)

# print (evidence not strong)
test

#########################################
#### example 2: sensitivity to prior ####
#########################################
# continued from example 1

# decrease prior SD
test <- ggm_compare_confirm(Ymale,
                            Yfemale,
                            prior_sd = 0.1,
                            hypothesis = hypothesis,
                            iter = 250,
                            progress = FALSE)

# print
test

# increase prior SD
test <- ggm_compare_confirm(Ymale,
                            Yfemale,
                            prior_sd = 0.5,
                            hypothesis = hypothesis,
                            iter = 250,
                            progress = FALSE)

# print
test

################################
#### example 3: mixed data #####
################################

hypothesis <- c("g1_A1--A2 >  g2_A1--A2;
                 g1_A1--A2 <  g2_A1--A2;
                 g1_A1--A2 =  g2_A1--A2")

# test (iter kept small for the example)
test <- ggm_compare_confirm(Ymale,
                            Yfemale,
                            type = "mixed",
                            hypothesis = hypothesis,
                            iter = 250,
                            progress = FALSE)

# print
test

##############################
##### example 4: control #####
##############################
# control for education

# data
Y <- bfi

# males
Ymale   <- subset(Y, gender == 1,
                  select = -c(gender))[,c(1:5, 26)]

# females
Yfemale <- subset(Y, gender == 2,
                  select = -c(gender))[,c(1:5, 26)]

# test
test <- ggm_compare_confirm(Ymale,
                            Yfemale,
                            formula = ~ education,
                            hypothesis = hypothesis,
                            iter = 250,
                            progress = FALSE)
# print
test


#####################################
##### example 5: many relations #####
#####################################

# data
Y <- bfi

hypothesis <- c("g1_A1--A2 > g2_A1--A2 & g1_A1--A3 = g2_A1--A3;
                 g1_A1--A2 = g2_A1--A2 & g1_A1--A3 = g2_A1--A3;
                 g1_A1--A2 = g2_A1--A2 = g1_A1--A3 = g2_A1--A3")

# males
Ymale   <- subset(Y, gender == 1,
                  select = -c(education,
                              gender))[,1:5]

# females
Yfemale <- subset(Y, gender == 2,
                  select = -c(education,
                              gender))[,1:5]

test <- ggm_compare_confirm(Ymale,
                            Yfemale,
                            hypothesis = hypothesis,
                            iter = 250,
                            progress = FALSE)

# print
test

BGGM documentation built on Aug. 20, 2021, 5:08 p.m.