# In jaydu1/SI: Stochastic Integration

```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

The SI package provides several methods of Monte Carlo (MC) integration, including

- Stochastic Point Method
- Mean Value Method
- Importance Sampling Method
- Stratified Sampling Method

Note that the Stochastic Point Method is only a simple hit-or-miss way to estimate an integral; it is provided so that it can easily be compared with the other MC methods. All four methods belong to the family of stochastic methods.

## Stochastic Integration

Suppose that $h(x)\in \mathbb{C}^1$ is well-defined and bounded on the finite interval $C=[a,b]$. W.l.o.g., $0\leq h(x)\leq M$. We want to calculate the integral

$$I=\int_a^b h(x)\,\mathrm{d}x$$

That is to say, we need to calculate the area under $h(x)$:
$D=\{(x,y):0\leq y\leq h(x),\ x\in C\}$.

## Stochastic Point Method

Sample uniformly from $G=C\times(0,M)$ $N$ times to get the stochastic points $Z_1,\cdots,Z_N$, where $Z_i=(X_i,Y_i),\ i=1,2,\cdots,N$. For $i=1,2,\cdots,N$, let

$$\xi_i=\begin{cases} 1, & Z_i\in D\\ 0, & \text{otherwise} \end{cases}$$

Then the $\xi_i$ are outcomes of independent identical trials and $\xi_1,\cdots,\xi_N\overset{i.i.d.}{\sim}\mathrm{Bernoulli}(p)$, where

$$p=\mathbb{P}(Z_i\in D)=\dfrac{I}{M(b-a)}$$

Thus, the stochastic point estimator is given by

$$\hat{I}_1=\hat{p}\,M(b-a)$$

where $\hat{p}$, the estimator of $p$, is the proportion of the points $Z_i$ that lie under the curve $h(x)$. By the Strong Law of Large Numbers,

\begin{align}
\hat{p}&=\dfrac{1}{N}\sum\limits_{i=1}^N\xi_i\xrightarrow{a.s.}p\\
\hat{I}_1&=\hat{p}\,M(b-a)\xrightarrow{a.s.}I
\end{align}

which means that $\hat{I}_1$ approximates $I$ better as the number of samples $N$ grows.
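As an aside, the estimator above can be sketched in a few lines of base R, independent of the SI package (the function `spm_sketch` and its interface are illustrative only, not the package's API):

```r
## Hit-or-miss sketch of the stochastic point method
## (illustrative only; not the SI package's implementation).
spm_sketch <- function(h, a, b, M, N) {
  x <- runif(N, a, b)       # X_i ~ Uniform(a, b)
  y <- runif(N, 0, M)       # Y_i ~ Uniform(0, M)
  p_hat <- mean(y <= h(x))  # proportion of points under the curve
  p_hat * M * (b - a)       # I_hat = p_hat * M * (b - a)
}

set.seed(0)
I_hat <- spm_sketch(exp, -1, 1, exp(1), 100000)
## I_hat should be close to the true value exp(1) - exp(-1)
```

With $N=10^5$ points the estimate typically lands within a few hundredths of the true value $e-e^{-1}\approx 2.3504$.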
To estimate the variance (i.e., the precision), we use the Central Limit Theorem and get

\begin{align}
\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{N}}}&\xrightarrow{D}N(0,1)\\
\sqrt{N}\left(\hat{I}_1-I\right)&\xrightarrow{D}N\left(0,[M(b-a)]^2p(1-p)\right)
\end{align}

Therefore, the variance can be estimated by

$$Var(\hat{I}_1)\approx \dfrac{1}{N}[M(b-a)]^2\hat{p}(1-\hat{p})$$

For an $m$-dimensional function on $[a_1,b_1]\times \cdots \times[a_m,b_m]$ with $0\leq h\leq M$, we similarly have

\begin{align}
\hat{I}_1&=\hat{p}\,M\prod\limits_{j=1}^m(b_j-a_j)\\
Var(\hat{I}_1)&\approx \dfrac{1}{N}\left[M\prod\limits_{j=1}^m(b_j-a_j)\right]^2\hat{p}(1-\hat{p})
\end{align}

In the SI package, use the following code to carry out the stochastic point method.

```r
## To integrate exp(x) from -1 to 1
set.seed(0)
h <- function(x){
  exp(x)
}
N <- 100000
SPMresult <- SI.SPM(h, -1, 1, exp(1), N)
I1 <- SPMresult[[1]]
VarI1 <- SPMresult[[2]]
```

## Mean Value Method

Let $U\sim \mathrm{Uniform}(a,b)$; then

\begin{align}
\mathbb{E}[h(U)]&=\int_a^b h(x)\dfrac{1}{b-a}\,\mathrm{d}x=\dfrac{I}{b-a}\\
I&=(b-a)\,\mathbb{E}[h(U)]
\end{align}

Suppose that we first sample $U_1,\cdots,U_N\overset{i.i.d.}{\sim}\mathrm{Uniform}(a,b)$; then $Y_i=h(U_i),\ i=1,2,\cdots,N$ are i.i.d. random variables.
By the Strong Law of Large Numbers,

$$\overline{Y}=\dfrac{1}{N}\sum\limits_{i=1}^Nh(U_i)\xrightarrow{a.s.}\mathbb{E}[h(U)]=\dfrac{I}{b-a}$$

Thus we get the mean value estimator

$$\hat{I}_2=\dfrac{b-a}{N}\sum\limits_{i=1}^Nh(U_i)$$

To estimate the variance (i.e., the precision), we use the Central Limit Theorem and get

\begin{align}
\sqrt{N}\left(\hat{I}_2-I\right)&\xrightarrow{D}N\left(0,(b-a)^2Var[h(U)]\right)\\
\hat{I}_2&\overset{approx.}{\sim}N\left(I,\dfrac{(b-a)^2}{N}Var[h(U)]\right)
\end{align}

where

$$Var[h(U)]=\int_a^b\{h(u)-\mathbb{E}[h(U)]\}^2\dfrac{1}{b-a}\,\mathrm{d}u$$

Since $Var[h(U)]$ can be estimated by the sample variance

$$Var[h(U)]\approx \dfrac{1}{N}\sum\limits_{i=1}^N(Y_i-\overline{Y})^2$$

we can estimate the variance of $\hat{I}_2$ by

$$Var(\hat{I}_2)\approx \dfrac{(b-a)^2}{N^2}\sum\limits_{i=1}^N(Y_i-\overline{Y})^2$$

For an $m$-dimensional function on $[a_1,b_1]\times \cdots \times[a_m,b_m]$, we similarly have

\begin{align}
\hat{I}_2&=\dfrac{1}{N}\prod\limits_{j=1}^m(b_j-a_j)\sum\limits_{i=1}^Nh(U_i)\\
Var(\hat{I}_2)&\approx \dfrac{1}{N^2}\left[\prod\limits_{j=1}^m(b_j-a_j)\right]^2\sum\limits_{i=1}^N(Y_i-\overline{Y})^2
\end{align}

In the SI package, use the following code to carry out the mean value method.

```r
## To integrate exp(x) from -1 to 1
set.seed(0)
h <- function(x){
  exp(x)
}
N <- 100000
MVMresult <- SI.MVM(h, -1, 1, N)
I2 <- MVMresult[[1]]
VarI2 <- MVMresult[[2]]
```

## Importance Sampling Method

Suppose $g(x)$ is a density whose shape is similar to that of $|h(x)|$, with $h(x)=0$ whenever $g(x)=0$ and $h(x)=o(g(x))$ as $x\rightarrow\infty$.
Suppose that $X_i\overset{i.i.d.}{\sim}g(x),\ i=1,2,\cdots,N$, and let

$$\eta_i=\dfrac{h(X_i)}{g(X_i)}$$

Then

$$\mathbb{E}\,\eta_i=\int_C\dfrac{h(x)}{g(x)}g(x)\,\mathrm{d}x=I$$

Therefore, the importance sampling estimator is given by

$$\hat{I}_3=\overline{\eta}=\dfrac{1}{N}\sum\limits_{i=1}^N\dfrac{h(X_i)}{g(X_i)}$$

The variance can be estimated by

\begin{align}
Var(\hat{I}_3)&=Var(\overline{\eta})\\
&=\dfrac{1}{N}Var\left(\dfrac{h(X)}{g(X)}\right)\\
&\approx \dfrac{1}{N^2}\sum\limits_{i=1}^N\left[\dfrac{h(X_i)}{g(X_i)}-\hat{I}_3\right]^2
\end{align}

In the SI package, use the following code to carry out the importance sampling method.

```r
## To integrate exp(x) from -1 to 1
## Use the sampling density (3/2+x)/3
set.seed(0)
h <- function(x){
  exp(x)
}
N <- 100000
g <- function(x){
  return((3/2+x)/3)
}
G_inv <- function(y){
  return(sqrt(6*y+1/4)-3/2)
}
ISMresult <- SI.ISM(h, g, G_inv, N)
I3 <- ISMresult[[1]]
VarI3 <- ISMresult[[2]]
```

## Stratified Sampling Method

Suppose that $C=\bigcup\limits_{j=1}^m C_j$ with $C_i\cap C_j=\varnothing$ for $i\neq j$. For every $C_j$, use the mean value method to estimate

$$I_{C_j}=\int_{C_j}h(x)\,\mathrm{d}x$$

and combine the stratum estimates into

$$\hat{I}_4=\sum\limits_{j=1}^m\hat{I}_{C_j}$$

It can be shown that the variance of $\hat{I}_4$ is smaller than that of $\hat{I}_2$.

In the SI package, use the following code to carry out the stratified sampling method.

```r
## To integrate exp(x) from -1 to 1
set.seed(0)
h <- function(x){
  exp(x)
}
N <- 100000
SSMresult <- SI.SSM(h, -1, 1, 10, N)
I4 <- SSMresult[[1]]
VarI4 <- SSMresult[[2]]
```
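As a quick cross-check of the estimators above, and of the claim that stratification reduces variance, the mean value, importance sampling, and stratified schemes can be sketched in base R (illustrative only; `mvm_sketch`, `ism_sketch`, and `ssm_sketch` are not part of the SI package):

```r
## Mean value sketch: I_hat = (b-a)/N * sum of h(U_i).
mvm_sketch <- function(h, a, b, N) {
  y <- h(runif(N, a, b))
  list(I = (b - a) * mean(y),
       Var = (b - a)^2 * sum((y - mean(y))^2) / N^2)
}

## Importance sampling sketch: draw X_i ~ g by inverse-CDF sampling.
ism_sketch <- function(h, g, G_inv, N) {
  x <- G_inv(runif(N))
  eta <- h(x) / g(x)          # eta_i = h(X_i) / g(X_i)
  list(I = mean(eta),
       Var = sum((eta - mean(eta))^2) / N^2)
}

## Stratified sketch: m equal-width strata, mean value method in each.
ssm_sketch <- function(h, a, b, m, N) {
  edges <- seq(a, b, length.out = m + 1)
  n_j <- N %/% m              # samples per stratum
  I_hat <- 0
  V_hat <- 0
  for (j in seq_len(m)) {
    y <- h(runif(n_j, edges[j], edges[j + 1]))
    w <- edges[j + 1] - edges[j]  # stratum width
    I_hat <- I_hat + w * mean(y)
    V_hat <- V_hat + w^2 * sum((y - mean(y))^2) / n_j^2
  }
  list(I = I_hat, Var = V_hat)
}

set.seed(0)
g <- function(x) (3/2 + x) / 3
G_inv <- function(y) sqrt(6 * y + 1/4) - 3/2
res2 <- mvm_sketch(exp, -1, 1, 100000)
res3 <- ism_sketch(exp, g, G_inv, 100000)
res4 <- ssm_sketch(exp, -1, 1, 10, 100000)
## All three I's should be close to exp(1) - exp(-1),
## with res4$Var noticeably smaller than res2$Var.
```

The stratified variance estimate comes out well below the plain mean value one, in line with the statement above that $\hat{I}_4$ has smaller variance than $\hat{I}_2$.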


jaydu1/SI documentation built on Sept. 28, 2018, 5:15 p.m.