sim_nb | R Documentation |
Simulate data from two independent negative binomial (NB) distributions. For
paired data, see sim_bnb()
.
sim_nb(
n1,
n2 = n1,
mean1,
mean2,
ratio,
dispersion1,
dispersion2 = dispersion1,
nsims = 1L,
return_type = "list",
max_zeros = 0.99,
ncores = 1L
)
n1 |
(integer: |
n2 |
(integer: |
mean1 |
(numeric: |
mean2 , ratio |
(numeric:
|
dispersion1 |
(numeric: |
dispersion2 |
(numeric: |
nsims |
(Scalar integer: |
return_type |
(string: |
max_zeros |
(Scalar numeric: |
ncores |
(Scalar integer: |
The negative binomial distribution has many parameterizations. In the regression modeling context, it is common to specify the distribution in terms of its mean and dispersion. We use the following probability mass function:
\begin{aligned}
P(X = x) &= \dbinom{x + \theta - 1}{x}
\left( \frac{\theta}{\theta + \mu} \right)^{\theta}
\left( \frac{\mu}{\mu + \theta} \right)^x \\
&= \frac{\Gamma(x + \theta)}{x! \Gamma(\theta)}
\left( \frac{\theta}{\theta + \mu} \right)^{\theta}
\left( \frac{\mu}{\mu + \theta} \right)^{x} \\
&= \frac{\Gamma(x + \theta)}{(\theta + \mu)^{\theta + x}}
\frac{\theta^{\theta}}{\Gamma(\theta)} \frac{\mu^{x}}{x!}
\end{aligned}
where x \in \mathbb{N}^{\geq 0}
, \theta \in \mathbb{R}^{> 0}
is the dispersion parameter, and \mu \in \mathbb{R}^{> 0}
is the mean.
This is analogous to the typical formulation where X
is counting
x
failures given \theta
successes and
p = \frac{\theta}{\theta + \mu}
is the probability of success on each
trial. It follows that E(X) = \mu
and
Var(X) = \mu + \frac{\mu^2}{\theta}
. The \theta
parameter
describes the 'dispersion' among observations. Smaller values of \theta
lead to overdispersion and larger values of \theta
decrease the
overdispersion, eventually converging to the Poisson distribution.
Described above is the 'indirect quadratic parameterization' of the negative
binomial distribution, which is commonly found in the R ecosystem. However, it
is somewhat counterintuitive because the smaller \theta
gets, the larger
the overdispersion. The 'direct quadratic parameterization' of the negative
binomial distribution may be found in some R packages and other languages
such as SAS and Stata. The direct parameterization is defined by substituting
\alpha = \frac{1}{\theta}
(\alpha > 0
) which results in
Var(X) = \mu + \alpha\mu^2
. In this case, the larger \alpha
gets
the larger the overdispersion, and the Poisson distribution is a special case
of the negative binomial distribution where \alpha = 0
.
A general class of negative binomial models may be defined with mean
\mu
and variance \mu + \alpha\mu^{p}
. The 'linear
parameterization' is then found by setting p=1
, resulting in
Var(X) = \mu + \alpha\mu
. It's common to label the linear
parameterization as 'NB1' and the direct quadratic parameterization as 'NB2'.
See 'Details' in sim_bnb()
for additional information on the
gamma-Poisson mixture formulation of the negative binomial distribution.
If nsims = 1
and the number of unique parameter combinations is
one, the following objects are returned:
If return_type = "list"
, a list:
Slot | Name | Description |
1 | Simulated counts from group 1. | |
2 | Simulated counts from group 2. |
If return_type = "data.frame"
, a data frame:
Column | Name | Description |
1 | item | Subject/item indicator. |
2 | condition | Group/condition indicator. |
3 | value | Simulated counts. |
If nsims > 1
or the number of unique parameter combinations is greater than
one, each object described above is returned in a list-column named data
in
a depower simulation data frame:
Column | Name | Description |
1 | n1 | Sample size of group 1. |
2 | n2 | Sample size of group 2. |
3 | mean1 | Mean for group 1. |
4 | mean2 | Mean for group 2. |
5 | ratio | Ratio of means (group 2 / group 1). |
6 | dispersion1 | Dispersion parameter for group 1. |
7 | dispersion2 | Dispersion parameter for group 2. |
8 | nsims | Number of valid simulation iterations. |
9 | distribution | Distribution sampled from. |
10 | data | List-column of simulated data. |
yu_2017depower
\insertRefrettiganti_2012depower
\insertRefaban_2009depower
\insertRefhilbe_2011depower
\insertRefhilbe_2014depower
\insertRefcameron_2013depower
sim_bnb()
, stats::rnbinom()
#----------------------------------------------------------------------------
# sim_nb() examples
#----------------------------------------------------------------------------
library(depower)
# Independent two-sample NB data returned in a data frame
sim_nb(
n1 = 10,
mean1 = 5,
ratio = 1.6,
dispersion1 = 0.5,
dispersion2 = 0.5,
nsims = 1,
return_type = "data.frame"
)
# Independent two-sample NB data returned in a list
sim_nb(
n1 = 10,
mean1 = 5,
ratio = 1.6,
dispersion1 = 0.5,
dispersion2 = 0.5,
nsims = 1,
return_type = "list"
)
# Two simulations of independent two-sample data
# returned as a list of data frames
sim_nb(
n1 = 10,
mean1 = 5,
ratio = 1.6,
dispersion1 = 0.5,
dispersion2 = 0.5,
nsims = 2,
return_type = "data.frame"
)
# Two simulations of independent two-sample data
# returned as a list of lists
sim_nb(
n1 = 10,
mean1 = 5,
ratio = 1.6,
dispersion1 = 0.5,
dispersion2 = 0.5,
nsims = 2,
return_type = "list"
)
#----------------------------------------------------------------------------
# Visualization of the NB distribution as dispersion varies between groups.
#----------------------------------------------------------------------------
disp <- expand.grid(c(1, 10, 100), c(1, 10, 100))
set.seed(1234)
data <- mapply(
FUN = function(disp1, disp2) {
d <- sim_nb(
n1 = 1000,
mean1 = 10,
ratio = 1.5,
dispersion1 = disp1,
dispersion2 = disp2,
nsims = 1,
return_type = "data.frame"
)
cbind(dispersion1 = disp1, dispersion2 = disp2, d)
},
disp1 = disp[[1]],
disp2 = disp[[2]],
SIMPLIFY = FALSE
)
data <- do.call(what = "rbind", args = data)
ggplot2::ggplot(
data = data,
mapping = ggplot2::aes(x = value, fill = condition)
) +
ggplot2::facet_grid(
rows = ggplot2::vars(.data$dispersion2),
cols = ggplot2::vars(.data$dispersion1),
labeller = ggplot2::labeller(
.rows = ggplot2::label_both,
.cols = ggplot2::label_both
)
) +
ggplot2::geom_density(alpha = 0.3) +
ggplot2::coord_cartesian(xlim = c(0, 50)) +
ggplot2::labs(
x = "Value",
y = "Density",
fill = "Condition",
caption = "Mean1=10, Mean2=15, ratio=1.5"
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.