Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The multivariate negative binomial distribution.

Recall that a random variable $X$ has (univariate) negative binomial law with parameters $\kappa>0, 0<p<1$, i.e., $X\sim \text{NB}(\kappa, p)$ if its probability mass function is given by $$ P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in {0,1,\ldots}. $$

A random vector ${\bf X}=(X_1,\dots,X_n)'$ is said to follow the multivariate negative binomial distribution with parameters $\kappa, p_1, \dots, p_n$ if its probability mass function is given by $$ P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa}, $$ where, for $i=1,\dots,n$, $x_i\in{0,1,\dots}$, $00$.

We note that the function stats::rnbinom can be used to simulate from the univariate negative binomial distribution. The trawl package introduces the function Bivariate_NBsim which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate $X_1$ from the univariate negative binomial distribution NB($\kappa$,$p_1/(1-p_2)$). Second, we simulate $X_2|X_1=x_1$ from the univariate negative binomial distribution NB($\kappa+x_1,p_2$), see for instance [@Dunn1967].

Example

set.seed(1)
kappa<- 3
p1 <- 0.1
p2 <- 0.85
p <- p1+p2
p0 <-1-p1-p2

N<- 10000

#Simulate from the bivariate negative binomial distribution
y <- trawl::Bivariate_NBsim(N,kappa,p1,p2)

#Compare the empirical and theoretical mean of the first component
base::mean(y[,1])
kappa*p1/(1-p)

#Compare the empirical and theoretical variance of the first component
stats::var(y[,1])
kappa*p1*(1-p2)/(1-p)^2

#Compare the empirical and theoretical mean of the second component
base::mean(y[,2])
kappa*p2/(1-p)

#Compare the empirical and theoretical variance of the second component
stats::var(y[,2])
kappa*p2*(1-p1)/(1-p)^2

#Compare the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
(p1*p2/(p0+p1)*(p0+p2))^(1/2)

The multivariate logarithmic series distribution

We say that a vector ${\bf X}=(X_1,\dots,X_n)'$ follows the multivariate logarithmic series distribution (LSD), see, e.g., [@GB1967]. ${\bf X} \sim \text{LSD}(p_1,\ldots,p_n)$, where $0<p_i<1, p:=\sum_{i=1}^np_i <1$ if for ${\bf x}\in \mathbb{N}_0^n \setminus {{\bf 0} }$, if its probability mass function is given by $$ P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{{-\ln(1-p)}}. $$ Note that each component $X_i$ follows the modified univariate logarithmic distribution with parameters $\tilde p_i = p_i/(1-p+p_i)$ and $\delta_i= \ln(1-p+p_i)/\ln(1-p)$, i.e., $X_i\sim\text{ModLSD}(\tilde p_i, \delta_i)$ with $$ P(X_i =x_i) = \left{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\ (1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{{-\ln(1-\tilde p_i)}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right. $$ Simulations from the univariate LSD can be carried out using the function Runuran::urlogarithmic. The trawl package implements the functions Bivariate_LSDsim and Trivariate_LSDsim to simulate from both the bivariate and the trivariate logarithmic series distribution.

Simulating from the bivariate logarithmic series distribution

The function Bivariate_NBsim can be used to simulate from the bivariate logarithmic series distribution. To this end, note that the probability mass function of a random vector ${\bf X}=(X_1,X_2)'$ following the bivariate logarithmic series distribution with parameters $00$.

The simulation proceeds in two steps: First, $X_1$ is simulated from the modified logarithmic distribution with parameters $\tilde p_1=p_1/(1-p_2)$ and $\delta_1=\ln(1-p_2)/\ln(1-p)$. Then we simulate $X_2$ conditional on $X_1$. We note that $X_2|X_1=x_1$ follows the logarithmic series distribution with parameter $p_2$ when $x_1=0$, and the negative binomial distribution with parameters $(x_1,p_2)$ when $x_1>0$.

Example

Next we provide an example of a simulation from the bivariate LSD and we showcase the functions ModLSD_Mean, ModLSD_Var, BivLSD_Cor and BivLSD_Cov which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.

set.seed(1)
p1<-0.15
p2<-0.3

N<-10000

#Simulate N realisations from the bivariate LSD 
y<-trawl::Bivariate_LSDsim(N, p1, p2)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))

##Compute the empirical and theoretical correlation between the two components
stats::cor(y[,1],y[,2])
trawl::BivLSD_Cor(p1,p2)

##Compute the empirical and theoretical covariance between the two components
stats::cov(y[,1],y[,2])
trawl::BivLSD_Cov(p1,p2)

Simulating from the trivariate logarithmic series distribution

The function Trivariate_NBsim can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First, $X_1$ is simulated from the modified logarithmic distribution with parameters $\tilde p_1=p_1/(1-p_2-p_3)$ and $\delta_1=\ln(1-p_2-p_3)/\ln(1-p)$. Then we simulate $(X_2,X_3)'$ conditional on $X_1$. We note that $(X_2,X_3)'|X_1=x_1$ follows the bivariate logarithmic series distribution with paramaters $(p_2,p_3)$ when $x_1=0$, and the bivariate negative binomial distribution with parameters $(x_1,p_2,p_3)$ when $x_1>0$.

Example

set.seed(1)
p1<-0.15
p2<-0.25
p3<-0.55

N<- 10000

#Simulate N realisations from the bivariate LSD 
y<-trawl::Trivariate_LSDsim(N, p1, p2, p3)

#Compute the empirical and theoretical mean of the first component
base::mean(y[,1])
trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))

#Compute the empirical and theoretical mean of the second component
base::mean(y[,2])
trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))

#Compute the empirical and theoretical mean of the third component
base::mean(y[,3])
trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))

#Compute the empirical and theoretical variance of the first component
stats::var(y[,1])
trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))

#Compute the empirical and theoretical variance of the second component
stats::var(y[,2])
trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))

#Compute the empirical and theoretical variance of the third component
stats::var(y[,3])
trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))

#Computing the bivariate covariances and correlations
#Cor(X1,X2):
delta <- base::log(1-p3)/base::log(1-p1-p2-p3)
hatp1 <-p1/(1-p3)
hatp2<-p2/(1-p3)

stats::cov(y[,1],y[,2])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)

stats::cor(y[,1],y[,2])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)

#Cor(X1,X3):
delta <- log(1-p2)/log(1-p1-p2-p3)
hatp1 <-p1/(1-p2)
hatp2<-p3/(1-p2)

stats::cov(y[,1],y[,3])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)

stats::cor(y[,1],y[,3])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)

#Cor(X2,X3):
delta <- log(1-p1)/log(1-p1-p2-p3)
hatp1 <-p2/(1-p1)
hatp2<-p3/(1-p1)

stats::cov(y[,2],y[,3])
trawl::BivModLSD_Cov(delta,hatp1,hatp2)
stats::cor(y[,2],y[,3])
trawl::BivModLSD_Cor(delta,hatp1,hatp2)

References



Try the trawl package in your browser

Any scripts or data that you put into this service are public.

trawl documentation built on Feb. 23, 2021, 1:06 a.m.