This package simulates data using the bipartite stochastic block model (BiSbm).
You can install the unrelased version of bisbmsim from github with:
devtools::install_github('tbilab/bisbmsim')
If you want to install a non-changing version of this package, pin to the latest release.
devtools::install_github('tbilab/BiSBMSim', ref = 'v0.10')
library(bisbmsim)
library(tidyverse)
library(magrittr)
library(purrr)
Super simple simulation.
First we need to setup basic parameters corresponding to model
N_a <- 125 # Number of nodes of the a type
N_b <- 105 # Number of nodes of the b type
K_a <- 4 # How many blocks of a type nodes are there
K_b <- 3 # How many blocks of the b type nodes are there
The Stochastic Block Model is defined by two main inputs.
The group membership vectors are vectors of length N_a
for a-nodes and
N_b
for b-nodes. Each element in these vectors is the integer ID of
the group that node belongs to.
assign_group_membership
is a function that will build these membership
vectors for you when supplied with size and group numbers. By default
they evenly space the groups across the nodes but by using the option
place_randomly = TRUE
you can also stochastically assign group
membership.
b_a <- assign_group_membership(N = N_a, K = K_a) # Block membership for each a node
b_b <- assign_group_membership(N = N_b, K = K_b) # Block membership for each b node
The Lambda matrix (or tidy dataframe in this case) tells the model the
average number of connections between each pair of a and b groups. In
the case of the standard Poisson SBM this means that if the model is
determining the number of edges between a node in group i
and a node
in group j
, it does so by drawing from a Poisson distribution with the
mean (or lambda) value of Lambda(i,j)
.
Lambda <- generate_random_lambda(K_a = K_a, K_b = K_b)
Here’s what this Lambda looks like.
ggplot(Lambda, aes(x = a, y = b)) +
geom_tile(
aes(fill = avg_num_cons),
color = 'white'
) +
scale_fill_gradient(low = "white", high = "#56B1F7")
The generate_random_lambda()
function by default builds a lambda
between K_a
a-type groups and K_b
b-type groups by drawing from a
random uniform distribution. You can change the model that generates the
values by passing the random_generator
option a function that takes an
input n
and returns n
values.
For instance, here’s a Lambda where the individual values are drawn from a beta distribution.
drawn_n_betas <- function(n) {
rbeta(n, shape1 = 2, shape2 = 2)
}
generate_random_lambda(K_a = K_a, K_b = K_b, random_generator = drawn_n_betas)
#> a b avg_num_cons
#> 1 1 1 0.74878219
#> 2 2 1 0.03877181
#> 3 3 1 0.69267063
#> 4 4 1 0.60364815
#> 5 1 2 0.31644165
#> 6 2 2 0.55183415
#> 7 3 2 0.61135893
#> 8 4 2 0.16986558
#> 9 1 3 0.23948770
#> 10 2 3 0.46559814
#> 11 3 3 0.90508306
#> 12 4 3 0.58878403
Now that we have defined the two main inputs needed for determining an SBM, the group membership vectors and the Lambda matrix, we can proceed to actually drawing from the SBM model.
Note that we have provided names for the different node types with the
a_name
and b_name
arguments.
all_node_pairs <- draw_from_model(b_a, b_b, Lambda, a_name = "Subjects", b_name = "Phecodes")
all_node_pairs %>% head()
#> # A tibble: 6 x 6
#> Subjects Phecodes a_group b_group avg_num_cons num_edges
#> <int> <int> <int> <int> <dbl> <int>
#> 1 1 1 1 1 0.393 0
#> 2 2 1 2 1 0.593 1
#> 3 3 1 3 1 0.877 1
#> 4 4 1 4 1 0.627 0
#> 5 5 1 1 1 0.393 1
#> 6 6 1 2 1 0.593 1
We can compare this now with the true underlying generation parameters….
plot_sim_results <- function(drawn_node_pairs){
drawn_node_pairs %>%
gather(key = 'type', value = 'connections', num_edges, avg_num_cons) %>%
mutate(type = ifelse(type == 'num_edges', "Drawn Values", "Lambda")) %>%
arrange(a_group, b_group) %>%
ggplot(aes(y = reorder(Subjects, a_group), x = reorder(Phecodes, b_group))) +
geom_tile(aes(fill = connections), color = "white") +
facet_wrap(~type) +
scale_fill_gradient(low = "white", high = "#56B1F7") +
theme_minimal() +
theme(axis.text = element_blank()) +
labs(y = 'Subjects', x = 'Phecodes')
}
plot_sim_results(all_node_pairs)
If only binary connected, not-connected results are desired, there is an
option binary_connections
in draw_from_model()
that uses the
bernouli distribution to decide if each edge exists or not, rather than
the standard Poisson.
draw_from_model(
b_a = b_a,
b_b = b_b,
Lambda = Lambda,
binary_connections = TRUE,
a_name = "Subjects",
b_name = "Phecodes"
) %>%
plot_sim_results()
If more control over the model setup is desired, the function
setup_planted_pattern_model
allows specification of a full model
allowing you to specify patterns of b-nodes that characterize groups of
a-nodes.
my_patterns <- tribble(
~b1, ~b2, ~b3, ~b4, ~size,
1, 1, 0, 0, 10,
1, 1, 1, 0, 15,
0, 0, 0, 1, 13,
0, 0, 1, 1, 24
)
planted_model_params <- setup_planted_pattern_model(
my_patterns,
num_noise_nodes = 10,
noise_p = 0.03,
planted_p_on = 0.95,
planted_p_off = 0.01
)
planted_model_params
#> $b_a
#> [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
#> [36] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>
#> $b_b
#> [1] 1 2 3 4 5 5 5 5 5 5 5 5 5 5
#>
#> $Lambda
#> # A tibble: 20 x 3
#> a b avg_num_cons
#> <int> <int> <dbl>
#> 1 1 1 0.95
#> 2 2 1 0.95
#> 3 3 1 0.01
#> 4 4 1 0.01
#> 5 1 2 0.95
#> 6 2 2 0.95
#> 7 3 2 0.01
#> 8 4 2 0.01
#> 9 1 3 0.01
#> 10 2 3 0.95
#> 11 3 3 0.01
#> 12 4 3 0.95
#> 13 1 4 0.01
#> 14 2 4 0.01
#> 15 3 4 0.95
#> 16 4 4 0.95
#> 17 1 5 0.03
#> 18 2 5 0.03
#> 19 3 5 0.03
#> 20 4 5 0.03
The function returns a list with the three needed inputs for
draw_from_model()
draw_from_planted <- planted_model_params %$%
draw_from_model(
b_a, b_b, Lambda,
binary_connections = TRUE,
a_name = "Subjects",
b_name = "Phecodes" )
plot_sim_results(draw_from_planted)
Once you have your SBM inputs setup all you need to do is loop for as
many simulations as you want and call draw_from_model()
each time to
get new draws.
num_draws <- 36
all_draws <- 1:num_draws %>%
purrr::map_dfr(function(draw_num){
planted_model_params %$%
draw_from_model(b_a, b_b, Lambda, binary_connections = TRUE, a_name = "Subjects", b_name = "Phecodes") %>%
mutate(draw = draw_num)
})
all_draws %>%
arrange(a_group, b_group) %>%
ggplot(aes(y = reorder(Subjects, a_group), x = reorder(Phecodes, b_group))) +
geom_tile(aes(fill = num_edges), color = "white") +
scale_fill_gradient(low = "white", high = "#56B1F7") +
labs(y = 'Subjects', x = 'Phecodes') +
guides(fill = FALSE) +
facet_wrap(~draw) +
theme_minimal() +
theme(axis.text = element_blank())
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.