On a Facebook methods group, there was a question about testing hypotheses in networks. In
the comments, it was suggested that **BGGM** could be used to test the hypothesis. And it turns
out that **BGGM** really shines for testing expectations [see for example @rodriguez2020formalizing].

In this vignette, I demonstrate three ways to go about testing the same hypothesis, which is
essentially testing for a difference in the **sum** of partial correlations between groups.

# need the developmental version if (!requireNamespace("remotes")) { install.packages("remotes") } # install from github remotes::install_github("donaldRwilliams/BGGM") library(BGGM)

For demonstrative purposes, I use the `bfi`

data and test the hypotheses in males and females

# data Y <- bfi # males Y_males <- subset(Y, gender == 1, select = -c(education, gender))[,1:5] # females Y_females <- subset(Y, gender == 2, select = -c(education, gender))[,1:5]

The first approach is rather straightforward, with the caveat that the method needs to be
implemented by the user. Note that I could certainly implement this in **BGGM**, assuming there
is enough interest. Please make a feature request here.

The hypothesis was that a sum of relations was larger in one group, for example,

$$
\begin{align}
\mathcal{H}*0: (\rho^{male}*{A1--A2}\; + \; \rho^{male}*{A1--A3}) = (\rho^{female}*{A1--A2}\; + \; \rho^{female}*{A1--A3}) \
\mathcal{H}_1: (\rho^{male}*{A1--A2}\; + \; \rho^{male}*{A1--A3}) > (\rho^{female}*{A1--A2}\; + \; \rho^{female}_{A1--A3})
\end{align}
$$
Note that the hypothesis is related to the sum of relations, which is readily tested in **BGGM**.

The first step is to estimate the model for each group

# fit female fit_female <- estimate(Y_females, seed = 2) # fit males fit_male <- estimate(Y_males, seed = 1)

For an example, I used the default which is to assume the data is Gaussian. This can be changed with `type =`

either `binary`

, `ordinal`

, or `mixed`

.

The next step is to extract the posterior samples for each relation

post_male <- posterior_samples(fit_male)[,c("A1--A2", "A1--A3")] post_female <- posterior_samples(fit_female)[,c("A1--A2", "A1--A3")]

Note that the column names reflect the upper-triangular elements of the
partial correlation matrix. Hence, the first name (e.g.,`A1`

) must be located before
the second name (e.g., `A2`

) in the data matrix. This can be understood in reference
to the column numbers: `1--2`

is correct whereas `2--1`

will result in an error.

The next step is to sum the relations and compute the difference

# sum males sum_male <- rowSums(post_male) # sum females sum_female <- rowSums(post_female) # difference diff <- sum_male - sum_female

which can then be plotted

# three column par(mfrow=c(1,3)) # male sum hist(sum_male) # female sum hist(sum_female) # difference hist(diff)

Next compute the posterior probability the sum is larger in males than females

# posterior prob mean(sum_male > sum_female) #> 0.737

and then the credible interval for the difference

quantile(diff, probs = c(0.025, 0.975)) #> 2.5% 97.5% #> -0.06498586 0.12481253

The next approach is based on a posterior predictive check. The hypothesis is essentially the same as above, but for the predictive distribution, that is,

$$
\begin{align}
\mathcal{H}*0: (\rho^{male^{yrep}}*{A1--A2}\; + \; \rho^{male^{yrep}}*{A1--A3}) = (\rho^{female^{yrep}}*{A1--A2}\; + \; \rho^{female^{yrep}}*{A1--A3}) \
\mathcal{H}_1: (\rho^{male^{yrep}}*{A1--A2}\; + \; \rho^{male^{yrep}}*{A1--A3}) > (\rho^{female^{yrep}}*{A1--A2}\; + \; \rho^{female^{yrep}}_{A1--A3})
\end{align}
$$
where the only difference is $yrep$. See more details here.

The first step is to define a function to compute the difference in sums

# colnames cn <- colnames(Y_males) # function f <- function(Yg1, Yg2){ # data Yg1 <- na.omit(Yg1) Yg2 <- na.omit(Yg2) # estimate partials fit1 <- pcor_mat(estimate(Yg1, analytic = TRUE)) fit2 <- pcor_mat(estimate(Yg2, analytic = TRUE)) # names (not needed) colnames(fit1) <- cn rownames(fit1) <- cn colnames(fit2) <- cn rownames(fit2) <- cn # take sum sum1 <- fit1["A1", "A2"] + fit1["A1", "A3"] sum2 <- fit2["A1", "A2"] + fit2["A1", "A3"] # difference sum1 - sum2 }

Note that the function takes two data matrices and then returns a single value.
Also, the default in **BGGM** does not require a custom function
(only needs the data from each group).

The next step is to compute the observed difference and then perform the check.

# observed obs <- f(Y_males, Y_females) # check ppc <- ggm_compare_ppc(Y_males, Y_females, iter = 250, FUN = f, custom_obs = obs) # print ppc #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Test: Global Predictive Check #> Posterior Samples: 250 #> Group 1: 896 #> Group 2: 1813 #> Nodes: 5 #> Relations: 10 #> --- #> Call: #> ggm_compare_ppc(Y_males, Y_females, iter = 250, FUN = f, custom_obs = obs) #> --- #> Custom: #> #> contrast custom.obs p.value #> Yg1 vs Yg2 0.029 0.264 #> ---

Note this requires the user to determine $\alpha$.

The check can also be plotted

```
plot(ppc)
```

where the red is the critical region.

The above approaches cannot provide evidence that the sum is equal. In other words, just because there was not a difference, this does not provide evidence for equality. The Bayes factor methods allow for formally assessing the equality model, that is,

$$
\begin{align}
\mathcal{H}*1&: (\rho^{male}*{A1--A2}\; + \; \rho^{male}*{A1--A3}) > (\rho^{female}*{A1--A2}\; + \; \rho^{female}*{A1--A3}) \
\mathcal{H}_2&: (\rho^{male}*{A1--A2}\; + \; \rho^{male}*{A1--A3}) = (\rho^{female}*{A1--A2}\; + \; \rho^{female}_{A1--A3}) \
\mathcal{H}_3&: \text{not} \; \mathcal{H}_1 \; \text{or} \; \mathcal{H}_2
\end{align}
$$

where $\mathcal{H}_3$ is the complement and can be understood as neither the first or second hypothesis.

The hypothesis is easily translated to `R`

code

hyp <- c("g1_A1--A2 + g1_A1--A3 > g2_A1--A2 + g2_A1--A3; g1_A1--A2 + g1_A1--A3 = g2_A1--A2 + g2_A1--A3")

Note the `g1`

indicates the group and `;`

separates the hypotheses. I again assume the data is Gaussian
(although this can be changed to `type = "ordinal"`

or `type = "mixed"`

; see here)

test <- ggm_compare_confirm(Y_males, Y_females, hypothesis = hyp) # print test #> BGGM: Bayesian Gaussian Graphical Models #> Type: continuous #> --- #> Posterior Samples: 25000 #> Group 1: 896 #> Group 2: 1813 #> Variables (p): 5 #> Relations: 10 #> Delta: 15 #> --- #> Call: #> ggm_compare_confirm(Y_males, Y_females, hypothesis = hyp) #> --- #> Hypotheses: #> #> H1: g1_A1--A2+g1_A1--A3>g2_A1--A2+g2_A1--A3 #> H2: g1_A1--A2+g1_A1--A3=g2_A1--A2+g2_A1--A3 #> H3: complement #> --- #> Posterior prob: #> #> p(H1|data) = 0.13 #> p(H2|data) = 0.825 #> p(H3|data) = 0.046 #> --- #> Bayes factor matrix: #> H1 H2 H3 #> H1 1.000 0.158 2.853 #> H2 6.349 1.000 18.113 #> H3 0.351 0.055 1.000 #> --- #> note: equal hypothesis prior probabilities

Note the posterior hypothesis probability for the equality model is 0.825. The Bayes factor matrix then divides those values, for example, $BF_{21}$ indicates the data were about 6 times more likely under $\mathcal{H}_2$ than $\mathcal{H}_1$.

The hypothesis can be plotted

```
plot(test)
```

It is also important to check the robustness. Here the width of the prior distribution is decreased

test <- ggm_compare_confirm(Y_males, Y_females, hypothesis = hyp, prior_sd = 0.15) # print test$out_hyp_prob #> 0.18523406 0.74906147 0.06570447

which results in a probability of 0.75 for $\mathcal{H}*2$ ($BF*{21} = 4.04$).

Three approaches for testing the same hypothesis were demonstrated in this vignette. This highlights that any hypothesis can be tested in **BGGM** and in several ways.

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.