sim_bnb | R Documentation |
Simulate data from the bivariate negative binomial (BNB) distribution. The
BNB distribution is used to simulate count data where the event counts are
jointly dependent (correlated). For independent data, see
sim_nb()
.
sim_bnb(
n,
mean1,
mean2,
ratio,
dispersion,
nsims = 1L,
return_type = "list",
max_zeros = 0.99,
ncores = 1L
)
n |
(integer: |
mean1 |
(numeric: |
mean2 , ratio |
(numeric:
|
dispersion |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
max_zeros |
(Scalar numeric: |
ncores |
(Scalar integer: |
The negative binomial distribution may be defined using a gamma-Poisson
mixture distribution. In this case, the Poisson parameter \lambda
is a random variable with gamma distribution. Equivalence between different
parameterizations are demonstrated below:
# Define constants and their relationships n <- 10000 dispersion <- 8 mu <- 4 p <- dispersion / (dispersion + mu) q <- mu / (mu + dispersion) variance <- mu + (mu^2 / dispersion) rate <- p / (1 - p) scale <- (1 - p) / p # alternative formula for mu mu_alt <- (dispersion * (1 - p)) / p stopifnot(isTRUE(all.equal(mu, mu_alt))) set.seed(20240321) # Using built-in rnbinom with dispersion and mean w <- rnbinom(n = n, size = dispersion, mu = mu) # Using gamma-Poisson mixture with gamma rate parameter x <- rpois( n = n, lambda = rgamma(n = n, shape = dispersion, rate = rate) ) # Using gamma-Poisson mixture with gamma scale parameter y <- rpois( n = n, lambda = rgamma(n = n, shape = dispersion, scale = scale) ) # Using gamma-Poisson mixture with multiplicative mean and # gamma scale parameter z <- rpois( n = n, lambda = mu * rgamma(n = n, shape = dispersion, scale = 1/dispersion) ) # Compare CDFs par(mar=c(4,4,1,1)) plot( x = sort(w), y = (1:n)/n, xlim = range(c(w,x,y,z)), ylim = c(0,1), col = 'green', lwd = 4, type = 'l', main = 'CDF' ) lines(x = sort(x), y = (1:n)/n, col = 'red', lwd = 2) lines(x = sort(y), y = (1:n)/n, col = 'yellow', lwd = 1.5) lines(x = sort(z), y = (1:n)/n, col = 'black')
The BNB distribution is implemented by compounding two conditionally
independent Poisson random variables X_1 \mid G = g \sim \text{Poisson}(\mu g)
and X_2 \mid G = g \sim \text{Poisson}(r \mu g)
with a gamma random
variable G \sim \text{Gamma}(\theta, \theta^{-1})
. The probability mass
function for the joint distribution of X_1, X_2
is
P(X_1 = x_1, X_2 = x_2) = \frac{\Gamma(x_1 + x_2 + \theta)}{(\mu + r \mu + \theta)^{x_1 + x_2 + \theta}}
\frac{\mu^{x_1}}{x_1!} \frac{(r \mu)^{x_2}}{x_2!}
\frac{\theta^{\theta}}{\Gamma(\theta)}
where x_1,x_2 \in \mathbb{N}^{\geq 0}
are specific values of the count
outcomes, \theta \in \mathbb{R}^{> 0}
is the dispersion
parameter
which controls the dispersion and level of correlation between the two
samples (otherwise known as the shape parameter of the gamma distribution),
\mu \in \mathbb{R}^{> 0}
is the mean parameter, and
r = \frac{\mu_2}{\mu_1} \in \mathbb{R}^{> 0}
is the ratio
parameter
representing the multiplicative change in the mean of the second sample
relative to the first sample. G
denotes the random subject effect and
the gamma distribution scale parameter is assumed to be the inverse of the
dispersion parameter (\theta^{-1}
) for identifiability.
Correlation decreases from 1 to 0 as the dispersion parameter increases from 0 to infinity. For a given dispersion, increasing means also increases the correlation. See 'Examples' for a demonstration.
See 'Details' in sim_nb()
for additional information on the
negative binomial distribution.
If nsims = 1
and the number of unique parameter combinations is
one, the following objects are returned:
If return_type = "list"
, a list:
Slot | Name | Description |
1 | Simulated counts from sample 1. | |
2 | Simulated counts from sample 2. |
If return_type = "data.frame"
, a data frame:
Column | Name | Description |
1 | item | Subject/item indicator. |
2 | condition | Sample/condition indicator. |
3 | value | Simulated counts. |
If nsims > 1
or the number of unique parameter combinations is greater than
one, each object described above is returned in a list-column named data
in
a depower simulation data frame:
Column | Name | Description |
1 | n1 | Sample size of sample 1. |
2 | n2 | Sample size of sample 2. |
3 | mean1 | Mean for sample 1. |
4 | mean2 | Mean for sample 2. |
5 | ratio | Ratio of means (sample 2 / sample 1). |
6 | dispersion | Gamma distribution shape parameter (dispersion). |
7 | nsims | Number of valid simulation iterations. |
8 | distribution | Distribution sampled from. |
9 | data | List-column of simulated data. |
yu_2020depower
\insertRefrettiganti_2012depower
\insertRefaban_2009depower
sim_nb()
, stats::rpois()
, stats::rgamma()
,
stats::rnbinom()
#----------------------------------------------------------------------------
# sim_bnb() examples
#----------------------------------------------------------------------------
library(depower)
# Paired two-sample data returned in a data frame
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 1,
return_type = "data.frame"
)
# Paired two-sample data returned in a list
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 1,
return_type = "list"
)
# Two simulations of paired two-sample data
# returned as a list of data frames
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 2,
return_type = "data.frame"
)
# Two simulations of Paired two-sample data
# returned as a list of lists
sim_bnb(
n = 10,
mean1 = 10,
ratio = 1.6,
dispersion = 3,
nsims = 2,
return_type = "list"
)
#----------------------------------------------------------------------------
# Visualization of the BNB distribution as dispersion varies.
#----------------------------------------------------------------------------
set.seed(1234)
data <- lapply(
X = c(1, 10, 100, 1000),
FUN = function(x) {
d <- sim_bnb(
n = 1000,
mean1 = 10,
ratio = 1.5,
dispersion = x,
nsims = 1,
return_type = "data.frame"
)
cor <- cor(
x = d[d$condition == "1", ]$value,
y = d[d$condition == "2", ]$value
)
cbind(dispersion = x, correlation = cor, d)
}
)
data <- do.call(what = "rbind", args = data)
ggplot2::ggplot(
data = data,
mapping = ggplot2::aes(x = value, fill = condition)
) +
ggplot2::facet_wrap(
facets = ggplot2::vars(.data$dispersion),
ncol = 2,
labeller = ggplot2::labeller(.rows = ggplot2::label_both)
) +
ggplot2::geom_density(alpha = 0.3) +
ggplot2::coord_cartesian(xlim = c(0, 60)) +
ggplot2::geom_text(
mapping = ggplot2::aes(
x = 30,
y = 0.12,
label = paste0("Correlation: ", round(correlation, 2))
),
check_overlap = TRUE
) +
ggplot2::labs(
x = "Value",
y = "Density",
fill = "Condition",
caption = "Mean1=10, Mean2=15, ratio=1.5"
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.