set.seed(1)
library(StatComp18053)
options(warn = -1)

Question

Write a .Rmd file to implement at least three examples of different types (texts, numerical results, tables, and figures) from the books in the references.

Solution

Example 1 Matrix computation

The functions rbind() and cbind() combine matrices or vectors by rows (stacking vertically) and by columns (side by side), respectively.

m1 <- matrix(1, nr = 2, nc = 2)
m2 <- matrix(2, nr = 2, nc = 2) 
rbind(m1, m2)
cbind(m1, m2)

Example 2 Performing Simple Linear Regression

Suppose the vectors x and y hold paired observations $(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{n}, y_{n})$ and there is an approximately linear relationship between x and y. We build a simple linear regression model of that relationship.

x<-c(1:10)
y<-c(18,16,10,17,16,14,11,16,11,15)
dfrm<-data.frame(x,y)
yhat<-lm(y ~ x, data=dfrm)
summary(yhat)

Question 3.5

A discrete random variable $X$ has probability mass function

pander::pandoc.table(matrix(c(0,0.1,1,0.2,2,0.2,3,0.2,4,0.3),byrow = FALSE,ncol = 5),style='multiline',row.names = c('x','p(x)'))#draw the table embedded in exercise 3.5

Use the inverse transform method to generate a random sample of size 1000 from the distribution of $X$. Construct a relative frequency table and compare the empirical with the theoretical probabilities. Repeat using the R sample function.

Answer

First, $X$ is a discrete random variable taking values $\cdots<x_{i-1}<x_{i}<x_{i+1}<\cdots$. Its inverse transformation is $F_{X}^{-1}(u)=x_{i}$ for $F_{X}(x_{i-1})<u\leq F_{X}(x_{i})$. So we first generate $U\sim U(0,1)$ (for reproducible results a random seed is set at the top of this file), and if $F_{X}(x_{i-1})<u\leq F_{X}(x_{i})$ we take $X=x_{i}$. In practice this means finding the interval each random number falls into and taking the corresponding $x_{i}$. We then count how often each $x_{i}$ occurs and compute the relative frequencies; the result (the empirical probabilities) is shown below. Since the question also asks to repeat the experiment with the R sample function, the code ends with a draw of the same size made directly by sample() with its prob argument.

x <- c(0,1,2,3,4)
px <- c(1,2,2,2,3)/10#probability mass function of X
cp <- cumsum(px)#cdf of X at each xi
m <- 1e3# the number of random numbers
u <- runif(m)#generate U ~ U(0,1)
r <- x[findInterval(u,cp)+1]#find the interval each random number belongs to and take the corresponding xi
ct <- as.vector(table(r))#count how often each xi occurs
ct/sum(ct)# and their relative frequencies
knitr::kable(data.frame(ct/sum(ct),px,(px-ct/sum(ct))/px),align = 'c',col.names = c('ep','tp','(tp-ep)/tp'))#generate a comparison table
#repeat using the R sample function
r2 <- sample(x,size = m,replace = TRUE,prob = px)
table(r2)/m#empirical probabilities from sample()

Now let us compare the empirical with the theoretical probabilities, shown in the table above. Here ep denotes the empirical probabilities, tp the theoretical probabilities, and the last column the relative error. We can see that the two are very close.
The simulation result suggests that the inverse transform method is quite efficient.

Question 3.7

Write a function to generate a random sample of size n from the $Beta(a,b)$ distribution by the acceptance-rejection method. Generate a random sample of size 1000 from the $Beta(3,2)$ distribution. Graph the histogram of the sample with the theoretical $Beta(3,2)$ density superimposed.

Answer

At first, choose the uniform distribution on (0,1) as the envelope distribution. According to the basic idea of the acceptance-rejection method and the pdf of the $Beta$ distribution, we accept a candidate $Y$ when $\rho(Y)=Y^{a-1}(1-Y)^{b-1} > u$, where for convenience the constant c is chosen as $1/B(a,b)$ (this works because $y^{a-1}(1-y)^{b-1}\leq 1$ for $a,b\geq 1$). My function takes three input parameters: a, b and n. Showing all the generated numbers would take too much space, so only the first 100 of them are listed below.

grnf<-function(a,b,n){
j<-k<-0#Initialize the loop variable
y <- numeric(n)#A variable that stores random Numbers
while (k < n) { 
  u <- runif(1) 
  j <- j + 1 
  x <- runif(1) #random variate from g
  if (x^(a-1) * (1-x)^(b-1) > u) {
    #we accept x
    k <- k + 1 
    y[k] <- x 
  } 
} 
return(y)#Return the accepted random number
}
grn<-grnf(3,2,1000)#generate random numbers
grn[1:100]#show first 100 num of rn

y=rbeta(1000,3,2)
hist(grn, main="Beta distribution generation",xlab="grn",ylab='density',col="darkblue",freq = FALSE)#plot the histogram of the sample 
curve(dbeta(x,3,2), add = TRUE, col = 'red', lwd = 2)# plot theoretical density curve

From the picture, we can see that the distribution of generated numbers is roughly the same as the theoretical Beta(3,2) distribution.

Question 3.12

Simulate a continuous Exponential-Gamma mixture. Suppose that the rate parameter $\Lambda$ has a Gamma($r$, $\beta$) distribution and $Y$ has an $Exp(\Lambda)$ distribution. That is, $(Y|\Lambda=\lambda)\sim f_{Y}(y|\lambda)=\lambda e^{-\lambda y}$. Generate 1000 random observations from this mixture with $r = 4$ and $\beta = 2$.

Answer

To simulate this Exponential-Gamma mixture, we first generate random rates from the Gamma distribution and then generate the exponential observations conditional on those rates. As before, the first hundred numbers are shown. To see what the mixture looks like, I also draw a density estimate from the generated random numbers.

n<-1000
r<-4
beta<-2
lambda<-rgamma(n,r,beta)#Generate the rate parameter
y<-rexp(n,lambda)#Generate observations from mixture
y[1:100]
plot(density(y), main = "Exponential-Gamma distribution", xlab = "Y", ylab = "density", col = "darkblue", lwd=2)#plot probability density from  generated random numbers

Question 5.1

Compute a Monte Carlo estimate of $$\int_{0}^{\frac{\pi}{3}}\sin t\ dt$$ and compare your estimate with the exact value of the integral.

Answer

We can compute this integral by a simple Monte Carlo method. Write the integral as $E[g(X)]$ with $g(x)=\frac{\pi}{3}\sin x$ and $X\sim U(0,\frac{\pi}{3})$, and estimate the expectation by the sample mean $\frac{1}{m}\sum_{i=1}^{m}g(X_{i})$.

m<-1e4
x <- runif(m,min=0,max=pi/3)#uniform sample on (0, pi/3)
theta.hat<-mean(sin(x)*pi/3)#sample-mean estimate of the integral
print(c(theta.hat,0.5))#compare the estimate with the exact value 1/2

We can see that the two values are close, which suggests the simulation is effective.

Question 5.4

Write a function to compute a Monte Carlo estimate of the Beta(3, 3) cdf, and use the function to estimate F(x) for x = 0.1, 0.2, ..., 0.9. Compare the estimates with the values returned by the pbeta function in R.

Answer

We know that the Beta cdf is $F(x)=\int_{0}^{x}\frac{1}{B(\alpha,\beta)}t^{\alpha-1}(1-t)^{\beta-1}dt$. Before writing a function, note that for fixed x the rest is the same as in question 5.1: sample $T\sim U(0,x)$ and average $x\cdot\frac{1}{B(\alpha,\beta)}T^{\alpha-1}(1-T)^{\beta-1}$.

ebc<-function(a,b,x){
  m<-1e4
  t <- runif(m,min=0,max=x)#uniform sample on (0, x)
  mean(x*1/beta(a,b)*t^(a-1)*(1-t)^(b-1))#sample-mean estimate of the integral
}
for (i in seq(0.1,0.9,0.1)) {
  print(ebc(3,3,i))#output distribution values from simulating function
}
print(pbeta(seq(0.1,0.9,0.1),3,3))#output distribution values from real function

We can see that the two vectors are nearly the same, so the simulation is a success.
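
As a quick cross-check (my own addition, not part of the original answer), the cdf can also be estimated by the empirical proportion of Beta(3,3) draws falling below each x, since $F(x)=P(X\leq x)$.

m <- 1e4
xb <- rbeta(m, 3, 3) #draws from Beta(3,3)
sapply(seq(0.1, 0.9, 0.1), function(x) mean(xb <= x)) #empirical cdf at x = 0.1, ..., 0.9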

Question 5.9

The Rayleigh density [156, (18.76)] is $$f(x)=\frac{x}{\sigma^{2}}e^{\frac{-x^{2}}{2\sigma^{2}}},\qquad x\geq0,\ \sigma>0.$$ Implement a function to generate samples from a Rayleigh($\sigma$) distribution, using antithetic variables. What is the percent reduction in variance of $\frac{X+X'}{2}$ compared with $\frac{X_{1}+X_{2}}{2}$ for independent $X_{1}, X_{2}$?

Answer

The antithetic variables are $Y=g(U)$ and $Y'=g(1-U)$, which are generally negatively correlated. The sampling process is implemented in the function MC.phi below. Optionally, MC.phi will generate the sample with or without antithetic sampling, which is convenient for the next steps.

To verify that the samples are drawn from a Rayleigh($\sigma$) distribution, we set $\sigma=1$ and compare the distribution of the samples with the true density. From the histogram, we can be confident that the samples are indeed drawn from the Rayleigh($\sigma$) distribution.

#function to generate samples from a Rayleigh($\sigma$) distribution
MC.phi<-function(sigma,R=10000,antithetic=TRUE){
  u<-runif(R/2)
  if(!antithetic){
    v <- runif(R/2) 
  }
  else v <- 1 - u 
  u <- c(u, v) 
  g<- sqrt(-2*sigma^2*log(1-u))#inverse transformation method
  return(g)
}

#compare the distribution of samples and real distribution
sigma<-1 
g<-MC.phi(sigma)#generate random numbers from Rayleigh(sigma) distribution
hist(g,prob=TRUE,main = expression(f(x)==(x/sigma^2)*exp(-x^2/(2*sigma^2))))#plot histogram of g
y<-seq(0,4,0.01)
lines(y,y/sigma^2*exp(-y^2/2/sigma^2)) #add probability density curve

#the percent reduction from simulation with sigma=1
MC1<-MC.phi(sigma,antithetic = FALSE)
MC2<-MC.phi(sigma)
c(var(MC1),var(MC2),(var(MC1)-var(MC2))/var(MC1))

Regarding the percent reduction in variance of $\frac{X+X'}{2}$ compared with $\frac{X_{1}+X_{2}}{2}$ for independent $X_{1}, X_{2}$, we can study this numerically. The code above compares the simple Monte Carlo approach (MC1, obtained by setting antithetic=FALSE) with the antithetic variable approach (MC2). Note that this compares the marginal variances of the two samples, which are almost the same; the quantity the exercise asks about is the variance of the paired averages, illustrated in the sketch below.
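
The following is a minimal sketch (my own addition, assuming $\sigma=1$) of the paired comparison: the variance of $(X+X')/2$ for antithetic pairs versus the variance of $(X_{1}+X_{2})/2$ for independent pairs, using the same inverse transformation as MC.phi.

R <- 10000
u <- runif(R/2)
x.anti <- sqrt(-2*log(1-u)) #X = F^{-1}(U) with sigma = 1
x.prime <- sqrt(-2*log(u)) #X' = F^{-1}(1-U)
x1 <- sqrt(-2*log(1-runif(R/2))) #independent X1
x2 <- sqrt(-2*log(1-runif(R/2))) #independent X2
v.anti <- var((x.anti+x.prime)/2)
v.ind <- var((x1+x2)/2)
c(v.ind, v.anti, (v.ind-v.anti)/v.ind) #percent reduction in variance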

Question 5.13

Find two importance functions $f_{1}$ and $f_{2}$ that are supported on $(1,\infty)$ and are 'close' to $$g(x)=\frac{x^{2}}{\sqrt{2\pi}}e^{\frac{-x^{2}}{2}},\qquad x>1.$$ Which of your two importance functions should produce the smaller variance in estimating $$\int_{1}^{\infty}\frac{x^{2}}{\sqrt{2\pi}}e^{\frac{-x^{2}}{2}}dx$$ by importance sampling? Explain.

Answer

Conveniently, we can choose $f_{1}(x)=\frac{1}{2}x^{2}e^{-x},\quad x>0$, which is the probability density function of $\Gamma(3,1)$; $f_{2}(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}},\quad -\infty<x<\infty$, the standard normal density; and $f_{3}(x)=xe^{-\frac{x^{2}}{2}},\quad x>0$, the Weibull density with shape parameter 2 and scale parameter $\sqrt2$. $f_{2}(x)$ has a much wider support, so many of the simulated values contribute zeros to the sum, which is inefficient. Intuitively, $f_{1}(x)$ and $f_{3}(x)$ should therefore produce the smaller variance.

x <- seq(1, 10, .01)#produce a sequence from 1 to 10
w <- 2 
#compute values of all functions and plot them
f1 <- 1/2*x^2*exp(-x) 
f2 <- 1/sqrt(2*pi)*exp(-x^2/2)
f3<-x*exp(-x^2/2)
g<-x^2/sqrt(2*pi)*exp(-x^2/2)
plot(x, g, type = "l", main = "f(x)s with g(x) (lines 1:3)", ylab = "", lwd = w) 
lines(x, f1, lty = 2, lwd = w) 
lines(x, f2, lty = 3, lwd = w)
lines(x, f3, lty = 4, lwd = w)
legend("topright", legend = c("g", 1:3), lty = 1:4, lwd = w, inset = 0.02)
#plot ratios g(x)/f(x)
plot(x, g/f1, type = "l", main = "the ratios g(x)/f(x)", ylab = "", ylim = c(0,3), lwd = w, lty = 2)
lines(x, g/f2, lty = 3, lwd = w)
lines(x, g/f3, lty = 4, lwd = w)
legend("topright", legend = c(1:3), lty = 2:4, lwd = w, inset = 0.02)

From the pictures above, the ratios $g(x)/f_{1}(x)$ and $g(x)/f_{3}(x)$ vary much less than $g(x)/f_{2}(x)$ over the region where g has most of its mass, so we might prefer $f_{1}$ or $f_{3}$ for the smaller variance.

g<-function(x){x^2/sqrt(2*pi)*exp(-x^2/2)*(x>1)}
m <- 10000 #numbers of random numbers generated
theta.hat <- se <- numeric(3)
#using f1 (the Gamma(3,1) density)
x<-rgamma(m,3,scale = 1)
fg<-g(x)/(x^2/2*exp(-x))
theta.hat[1] <- mean(fg) #compute estimation value
se[1] <- sd(fg) #compute standard error of estimation value
#using f2 (the standard normal density)
x<-rnorm(m,0,1)
fg<-g(x)/(1/sqrt(2*pi)*exp(-x^2/2))
theta.hat[2]<-mean(fg)
se[2]<-sd(fg)
#using f3 (the Weibull(2, sqrt(2)) density)
x<-rweibull(m,2,sqrt(2))
fg<-g(x)/(x*exp(-x^2/2))
theta.hat[3]<-mean(fg)
se[3]<-sd(fg)
rbind(theta.hat, se) 

So the simulation indicates that $f_{1}$ and $f_{3}$ produce much smaller variance than $f_{2}$, which has the highest variance. Among the three importance functions, the Weibull choice $f_{3}$ gives the smallest standard error here.

Question 5.14

Obtain a Monte Carlo estimate of$$\int_{1}^{\infty}\frac{x^{2}}{\sqrt{2\pi}}e^{\frac{-x^{2}}{2}}dx$$ by importance sampling.

Answer

Actually, I have essentially done this work already in question 5.13. As described there, we can choose the Weibull importance function $f_{3}(x)$, which gave the smallest standard error. Let us do it again.

m<-1e4
x<-rweibull(m,2,sqrt(2))
fg<-g(x)/(x*exp(-x^2/2))
theta.hat <- mean(fg)
se <- sd(fg)
cbind(theta.hat, se)

The result shows that the Monte Carlo estimate is 0.4023378, with a standard error of 0.3588117.

Question 6.9

Let X be a non-negative random variable with $\mu = E[X] < \infty$. For a random sample $x_{1}, \cdots, x_{n}$ from the distribution of X, the Gini ratio is defined by $$G=\frac{1}{2n^{2}\mu}\sum_{j=1}^{n}\sum_{i=1}^{n}|x_{i}-x_{j}|.$$ The Gini ratio is applied in economics to measure inequality in income distribution (see e.g. [163]). Note that G can be written in terms of the order statistics $x_{(i)}$ as $$G=\frac{1}{n^{2}\mu}\sum_{i=1}^{n}(2i-n-1)x_{(i)}.$$ If the mean is unknown, let $\hat{G}$ be the statistic G with $\mu$ replaced by $\bar{x}$. Estimate by simulation the mean, median and deciles of $\hat{G}$ if X is standard lognormal. Repeat the procedure for the uniform distribution and Bernoulli(0.1). Also construct density histograms of the replicates in each case.

Answer

To insert explanations between the code chunks, I split the code into three parts, which causes some duplication. There is honestly not much to say about each part, but I comment on them anyway.

m<-1e3
G1<-numeric(m)
vec<-2*c(1:m)-m-1
for(i in 1:m){
  x<-rlnorm(m,0,1) #generate random numbers
  xs<-sort(x) #sorting x
  mu=mean(x)
  G1[i]<-1/m^2/mu*(t(vec)%*%xs) #compute G
}
#compute mean,median,deciles and plot histogram of G1
mean(G1)
median(G1)
quantile(G1,probs = seq(0.1,0.9,0.1))
hist(G1,probability = TRUE)

As in earlier homework, we only need to generate random numbers and compute the statistic according to the formula; the rest is computing the 'plug-in' estimates. From the result, we can see that the mean and median are nearly the same, and the deciles and the histogram show that the replicates are densely concentrated.

#repeat for uniform distribution
m<-1e3
G2<-numeric(m)
vec<-2*c(1:m)-m-1
for(i in 1:m){
  x<-runif(m,0,1)
  xs<-sort(x)
  mu=mean(x)
  G2[i]<-1/m^2/mu*(t(vec)%*%xs)
}
#compute mean,median,deciles and plot histogram of G2
mean(G2)
median(G2)
quantile(G2,probs = seq(0.1,0.9,0.1))
hist(G2,probability = TRUE)
#repeat for Bernoulli(0.1)
m<-1e3
G3<-numeric(m)
vec<-2*c(1:m)-m-1
for(i in 1:m){
  x<-rbinom(m,size=1,prob=0.1)
  xs<-sort(x)
  mu=mean(x)
  G3[i]<-1/m^2/mu*(t(vec)%*%xs)
}
#compute mean,median,deciles and plot histogram of G3
mean(G3)
median(G3)
quantile(G3,probs = seq(0.1,0.9,0.1))
hist(G3,probability = TRUE)

From the results, the repetitions for the uniform distribution and Bernoulli(0.1) show the same qualitative behaviour as the lognormal case. The difference is that G1 is around 0.5, G2 is around 0.3 and G3 is around 0.9.

Question 6.10

Construct an approximate 95% confidence interval for the Gini ratio $\gamma = E[G]$ if X is lognormal with unknown parameters. Assess the coverage rate of the estimation procedure with a Monte Carlo experiment.

Answer

According to the central limit theorem, $$\frac{\frac{1}{n}\sum_{i=1}^{n}G_{i}-\gamma}{\sqrt{\frac{\sigma^{2}}{n}}}\sim N(0,1),\quad n\rightarrow \infty,$$ where $\sigma^{2}$ is the variance of $\hat{G}$. So we can derive the confidence interval $\frac{1}{n}\sum_{i=1}^{n}G_{i}\pm Z_{\frac{\alpha}{2}}\sqrt{\frac{\sigma^{2}}{n}}$ from this distribution. To assess the coverage rate, we take the distribution of X to be standard lognormal, as in exercise 6.9, and run a Monte Carlo experiment. The result is shown below.

m<-1e3
G<-numeric(m)
vec<-2*c(1:m)-m-1
for(i in 1:m){
  x<-rlnorm(m,0,1) #generate random numbers
  xs<-sort(x) #sorting x
  mu=mean(x)
  G[i]<-1/m^2/mu*(t(vec)%*%xs) #compute G
}
CI<-c(mean(G)-1.96*sqrt(var(G)/m),mean(G)+1.96*sqrt(var(G)/m))#compute confidence interval
CI
#approximate Coverage probability(ECP) of confidence interval
N<-100
ECI<-matrix(NA,N,2)
k<-0
for(j in 1:N){
  for(i in 1:m){
    x<-rlnorm(m,0,1) 
    xs<-sort(x) 
    mu<-mean(x)
    G[i]<-1/m^2/mu*(t(vec)%*%xs)
  }
  ECI[j,]<-c(mean(G)-1.96*sqrt(var(G)/m),mean(G)+1.96*sqrt(var(G)/m))
}
ECP<-mean(ECI[,1]<=0.52 & ECI[,2]>=0.52) #the true value of E(G) is replaced by 0.52, the value obtained in 6.9; the deviation is extremely small, and the exact E(G) is hard to calculate
ECP

The result seems reasonable. In the code above I set the number of repetitions N to 100; increasing it makes the running time much longer, probably because of the double loop, which is inefficient in R. One way to avoid the explicit inner loop is sketched below.
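
The following is a minimal vectorized sketch (my own addition, not the original code): each replicate of $\hat{G}$ is computed by a small helper function and replicate() handles the outer repetition, so no explicit double loop is needed.

gini.hat <- function(n = 1e3) {
  x <- sort(rlnorm(n)) #one standard lognormal sample, sorted
  sum((2*seq_len(n)-n-1)*x)/(n^2*mean(x)) #plug-in Gini ratio
}
G <- replicate(1e3, gini.hat()) #replicates of G-hat
mean(G) + c(-1, 1)*1.96*sd(G)/sqrt(length(G)) #one approximate 95% confidence interval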

Question 6.B

Tests for association based on Pearson product moment correlation $\rho$, Spearman's rank correlation coefficient $\rho_{s}$, or Kendall's coefficient $\tau$, are implemented in cor.test. Show (empirically) that the nonparametric tests based on $\rho_{s}$ or $\tau$ are less powerful than the correlation test when the sampled distribution is bivariate normal. Find an example of an alternative (a bivariate distribution (X, Y) such that X and Y are dependent) such that at least one of the nonparametric tests has better empirical power than the correlation test against this alternative.

Answer

We use a bivariate normal distribution with standard normal marginals for the empirical analysis.

m<-1e3
sigma<-matrix(c(1,0,0,1),2,2)#set the correlation to 0, because the null hypothesis of the test is that the correlation coefficient is 0 (a value very close to 0 would also do)
mu<-c(0,0)
pcorp<-pcors<-pcork<-numeric(m)
for(i in 1:m){
  data<-MASS::mvrnorm(m,Sigma = sigma,mu=mu)#generate bivariate normal random numbers
  x<-data[,1]
  y<-data[,2]
  #take out the p-value of the three test methods 
  pcorp[i]<-cor.test(x,y,method = "pearson")$p.value
  pcork[i]<-cor.test(x,y,method = "kendall")$p.value
  pcors[i]<-cor.test(x,y,method ="spearman")$p.value
}
print(c(mean(pcorp<=0.05),mean(pcork<=0.05),mean(pcors<=0.05)))#show power of three methods

The results show that the nonparametric tests based on $\rho_{s}$ or $\tau$ are less powerful than the correlation test when the sampled distribution is bivariate normal.

To find an example of an alternative (a bivariate distribution (X, Y) such that X and Y are dependent) for which at least one of the nonparametric tests has better empirical power than the correlation test, one simple attempt is to move the correlation slightly away from zero.

m<-1e3
sigma<-matrix(c(1,0.15,0.15,1),2,2)#set correlation is 0.15
mu<-c(0,0)
pcorp<-pcors<-pcork<-numeric(m)
for(i in 1:m){
  data<-MASS::mvrnorm(m,Sigma = sigma,mu=mu)#generate bivariate normal random numbers
  x<-data[,1]
  y<-data[,2]
  #take out the p-value of the three test methods 
  pcorp[i]<-cor.test(x,y,method = "pearson")$p.value
  pcork[i]<-cor.test(x,y,method = "kendall")$p.value
  pcors[i]<-cor.test(x,y,method ="spearman")$p.value
}
print(c(mean(pcorp<=0.05),mean(pcork<=0.05),mean(pcors<=0.05)))#show power of three methods

We can see that both of the nonparametric tests have better empirical power than the correlation test against this alternative.
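
As a further illustration (my own addition, not part of the original answer), an alternative with a monotone but heavy-tailed relationship is another setting in which the rank-based tests often beat the Pearson test; the sample size of 30 and the Cauchy-distributed noise below are my own choices.

m <- 200 #number of replicates
n <- 30 #sample size per replicate
p <- matrix(NA, m, 3)
for (i in 1:m) {
  x <- rnorm(n)
  y <- x + rt(n, df = 1) #monotone signal plus heavy-tailed (Cauchy) noise
  p[i,] <- c(cor.test(x, y, method = "pearson")$p.value,
             cor.test(x, y, method = "kendall")$p.value,
             cor.test(x, y, method = "spearman")$p.value)
}
colMeans(p <= 0.05) #empirical power: pearson, kendall, spearman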

Question 7.1

Compute a jackknife estimate of the bias and the standard error of the correlation statistic in Example 7.2

Answer

d<-bootstrap::law
n<-dim(d)[1]
b.cor<-function(x,i) cor(x[i,1],x[i,2]) #function to compute correlation
theta.hat<-b.cor(d,1:n)
theta.jack<-numeric(n)
# generate estimation values of theta by jackknife
for(i in 1:n){
  theta.jack[i]<-b.cor(d,(1:n)[-i])
}
bias.jack<-(n-1)*(mean(theta.jack-theta.hat)) #estimate bias
se.jack<-sqrt((n-1)*mean((theta.jack-theta.hat)^2)) #estimate se
round(c(original=theta.hat,bias=bias.jack,se=se.jack),3)

Question 7.5

Refer to Exercise 7.4. Compute 95% bootstrap confidence intervals for the mean time between failures $1/\lambda$ by the standard normal, basic, percentile, and BCa methods. Compare the intervals and explain why they may differ.

Answer

ci.norm<-ci.basic<-ci.perc<-ci.bca<-matrix(NA,1,2)
b.mean<-function(x,i) mean(x[i]) #function to compute mean
d<-as.matrix(boot::aircondit)
# generate estimation values of theta by bootstrap
obj<-boot::boot(data = d,statistic = b.mean,R=100) 
ci<-boot::boot.ci(obj,type = c("norm","basic","perc","bca")) #compute confidence interval
ci.norm<-ci$norm[2:3]
ci.basic<-ci$basic[4:5]
ci.perc<-ci$percent[4:5]
ci.bca<-ci$bca[4:5]
cat("norm:",ci.norm,"basic:",ci.basic,"perc:",ci.perc,"Bca:",ci.bca)

I think these confidence intervals differ mainly because the methods construct the interval from different assumptions about the sampling distribution (a normal approximation, the basic pivot, the percentiles of the bootstrap replicates, and a bias and skewness correction). Monte Carlo error from the finite number of bootstrap replicates also has some effect.
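
A small follow-up sketch (my own addition, using R = 2000 replicates as an assumed setting): the bootstrap replicates of the mean for these data tend to be right skewed, which is one concrete reason the symmetric normal interval disagrees with the percentile and BCa intervals.

obj2 <- boot::boot(data = d, statistic = b.mean, R = 2000)
t.star <- as.vector(obj2$t) #bootstrap replicates of the mean
hist(t.star, breaks = 30, main = "bootstrap replicates of the mean", xlab = "mean time between failures")
mean((t.star - mean(t.star))^3)/sd(t.star)^3 #sample skewness of the replicates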

Question 7.8

Refer to Exercise 7.7. Obtain the jackknife estimates of bias and standard error of $\hat{\theta}$.

Answer

d<-bootstrap::scor
n<-dim(d)[1]
e.hat<-eigen(cov(d))$values #compute eigen values
theta.hat<-e.hat[1]/sum(e.hat)
theta.jack<-numeric(n)
e.jack<-matrix(NA,n,5)
# generate estimation values of theta by jackknife
for(i in 1:n){
  e.jack[i,]<-eigen(cov(d[-i,]))$values
  theta.jack[i]<-e.jack[i,1]/sum(e.jack[i,])
}
bias.jack<-(n-1)*(mean(theta.jack-theta.hat)) #estimate bias
se.jack<-sqrt((n-1)*mean((theta.jack-theta.hat)^2)) #estimate se
round(c(original=theta.hat,bias=bias.jack,se=se.jack),3)

Question 7.11

In Example 7.18, leave-one-out (n-fold) cross validation was used to select the best fitting model. Use leave-two-out cross validation to compare the models.

Answer

d<-DAAG::ironslag
n <- length(d$magnetic)
s<-combn(1:n,2)#list all the combinations of two samples to be left out
#the prediction errors for each left-out pair are stored in separate matrices e1-e4
e1 <-matrix(NA,2,n*(n-1)/2)
e2 <-matrix(NA,2,n*(n-1)/2)
e3 <-matrix(NA,2,n*(n-1)/2)
e4 <- matrix(NA,2,n*(n-1)/2)
# for n-fold cross validation
# fit models on leave-two-out samples 
for(k in 1:(n*(n-1)/2)){
  y<- d$magnetic[-s[,k]]
  x<- d$chemical[-s[,k]]

  J1 <- lm(y ~ x) #fitting model
  yhat1 <- J1$coef[1] + J1$coef[2] * d$chemical[s[,k]] #compute prediction values
  e1[,k] <- d$magnetic[s[,k]] - yhat1 #compute residual errors

  J2 <- lm(y ~ x + I(x^2)) 
  yhat2 <- J2$coef[1] + J2$coef[2] * d$chemical[s[,k]] + J2$coef[3] * d$chemical[s[,k]]^2 
  e2[,k] <- d$magnetic[s[,k]] - yhat2

  J3 <- lm(log(y) ~ x) 
  logyhat3 <- J3$coef[1] + J3$coef[2] * d$chemical[s[,k]] 
  yhat3 <- exp(logyhat3) 
  e3[,k] <- d$magnetic[s[,k]] - yhat3

  J4 <- lm(log(y) ~ log(x)) 
  logyhat4 <- J4$coef[1] + J4$coef[2] * log(d$chemical[s[,k]]) 
  yhat4 <- exp(logyhat4) 
  e4[,k] <- d$magnetic[s[,k]] - yhat4
}
cat("J1:",mean(e1^2),"J2:",mean(e2^2),"J3:",mean(e3^2),"J4:",mean(e4^2))#compute the mean of the squared prediction errors

#plot model 2
a <- seq(10, 40, .1)
L2 <- lm(d$magnetic ~ d$chemical +I((d$chemical)^2))
plot(d$chemical, d$magnetic, main="Quadratic", pch=16)
yhat <- L2$coef[1] + L2$coef[2] * a + L2$coef[3] * a^2 
lines(a, yhat, lwd=2)
plot(L2)

According to the prediction error criterion (J2 has the lowest mean squared prediction error, 17.87018), Model 2, the quadratic model, would still be the best fit for the data.
The pictures above show the plot of the predicted response with the data and the residual plots for Model 2.
In the quadratic model the predictors X and $X^2$ are highly correlated.
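
As a quick check of the collinearity remark (my own addition), the sample correlation between the predictor and its square can be computed directly; centering the predictor before squaring is one standard way to reduce it.

cor(d$chemical, d$chemical^2) #correlation between X and X^2
cor(d$chemical - mean(d$chemical), (d$chemical - mean(d$chemical))^2) #after centering X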

Question 8.1

Implement the two-sample Cramer-von Mises test for equal distributions as a permutation test. Apply the test to the data in Examples 8.1 and 8.2.

Answer

attach(chickwts) 
x <- sort(as.vector(weight[feed == "soybean"])) 
y <- sort(as.vector(weight[feed == "linseed"])) 
detach(chickwts)
R<-999
z<-c(x,y)
K<-length(z)
n<-length(x)
m<-K-n


Fnx0<-numeric(n)
Gmx0<-numeric(n)
Fny0<-numeric(m)
Gmy0<-numeric(m)
for(j in 1:n){
  Fnx0[j]<-mean(x<=x[j])
  Gmx0[j]<-mean(y<=x[j])
}
for(k in 1:m){
  Fny0[k]<-mean(x<=y[k])
  Gmy0[k]<-mean(y<=y[k])
}
theta_hat<-m*n/(m+n)^2*(t(Fnx0-Gmx0)%*%(Fnx0-Gmx0)+t(Fny0-Gmy0)%*%(Fny0-Gmy0))

W2<-numeric(R)
for(i in 1:R){
  k<-sample(K,size = n,replace = FALSE)
  x1<-z[k]
  y1<-z[-k]
  Fnx<-numeric(n)
  Gmx<-numeric(n)
  Fny<-numeric(m)
  Gmy<-numeric(m)
  for(j in 1:n){
    Fnx[j]<-mean(x1<=x1[j])
    Gmx[j]<-mean(y1<=x1[j])
  }
  for(k in 1:m){
    Fny[k]<-mean(x1<=y1[k])
    Gmy[k]<-mean(y1<=y1[k])
  }
  W2[i]<-m*n/(m+n)^2*(t(Fnx-Gmx)%*%(Fnx-Gmx)+t(Fny-Gmy)%*%(Fny-Gmy))
}

hist(W2,probability = TRUE,xlab = "W2 (p=0.421)",breaks = "scott")
points(theta_hat,0,cex=1,pch=16)

p_value<-mean(c(theta_hat[1],W2)>=theta_hat[1])
p_value

Comparing the result above with the result of the K-S test, 0.458, the two are close, so the result looks reasonable. More importantly, the p-value is 0.421, so we should not reject the null hypothesis; that is, the two distributions do not differ significantly.

Design experiments

Design experiments for evaluating the performance of the NN, energy, and ball methods in various situations.
(1) Unequal variances and equal expectations
(2) Unequal variances and unequal expectations
(3) Non-normal distributions: t distribution with 1 df (heavy-tailed distribution), bimodal distribution (mixture of two normal distributions)
(4) Unbalanced samples (say, 1 case versus 10 controls)
Note: The parameters should be chosen such that the powers are distinguishable (say, range from 0.3 to 0.9).

Answer

(1) Note that when choosing the parameters (variances), the standard deviation of one distribution should be close to that of the other, so that the difference between the two samples is small and the powers fall between 0.3 and 0.9.

m <- 100#number of experiments
k<-3#number of NN
p<-2#dimension of x
mu <- 1
sigma<-1.5
n1<-50
n2<-50#The number of x and y
R<-999#The number of replicates
alpha<-0.1
n<-n1+n2
N<-c(n1,n2)

##NN statistic
Tn<-function(z,ix,sizes,k){
  n1<-sizes[1]
  n2<-sizes[2]
  n<-n1+n2
  if(is.vector(z)) z<-data.frame(z,0)
  z<-z[ix,]
  NN<-RANN::nn2(data = z,k=k+1)
  block1<-NN$nn.idx[1:n1,-1]
  block2<-NN$nn.idx[(n1+1):n,-1]
  i1<-sum(block1<n1+0.5)
  i2<-sum(block2>n1+0.5)
  (i1+i2)/(k*n)
}

##NN permutation test
eqdist.nn<-function(z,sizes,k){
  boot.obj<-boot::boot(data=z,statistic=Tn,R=R,sim="permutation",sizes=sizes,k=k)
  ts<-c(boot.obj$t0,boot.obj$t)
  p.value<-mean(ts>=ts[1])
  list(statistic=ts[1],p.value=p.value)
}

#simulation
p.values<-matrix(NA,m,3)
for(i in 1:m){
  x<-matrix(rnorm(n1*p,mean = mu),ncol = p)
  y<-cbind(rnorm(n2,mean=mu,sd=sigma),rnorm(n2,mean = mu,sd=sigma))
  z<-rbind(x,y)
  p.values[i,1]<-eqdist.nn(z,N,k)$p.value
  p.values[i,2]<-energy::eqdist.etest(z,sizes=N,R=R)$p.value
  p.values[i,3]<-Ball::bd.test(x=x,y=y,R=R,seed=i*1)$p.value
}
pow<-colMeans(p.values<alpha)#power
pow

The result shows that the Ball test is more powerful than the other two methods and the NN test is the least powerful in the setting of unequal variances and equal expectations.
(2) We only need to change the expectations in (1). Again, remember to choose the parameters so that the difference between the two samples is relatively small.

m <- 100
k<-3
p<-2
mu <- 0.2
sigma<-1.5
set.seed(1)
n1<-50
n2<-50#The number of x and y
R<-999#The number of replicates
alpha<-0.1
n<-n1+n2
N<-c(n1,n2)

Tn<-function(z,ix,sizes,k){
  n1<-sizes[1]
  n2<-sizes[2]
  n<-n1+n2
  if(is.vector(z)) z<-data.frame(z,0)
  z<-z[ix,]
  NN<-RANN::nn2(data = z,k=k+1)
  block1<-NN$nn.idx[1:n1,-1]
  block2<-NN$nn.idx[(n1+1):n,-1]
  i1<-sum(block1<n1+0.5)
  i2<-sum(block2>n1+0.5)
  (i1+i2)/(k*n)
}

eqdist.nn<-function(z,sizes,k){
  boot.obj<-boot::boot(data=z,statistic=Tn,R=R,sim="permutation",sizes=sizes,k=k)
  ts<-c(boot.obj$t0,boot.obj$t)
  p.value<-mean(ts>=ts[1])
  list(statistic=ts[1],p.value=p.value)
}

p.values<-matrix(NA,m,3)
for(i in 1:m){
  x<-matrix(rnorm(n1*p),ncol = p)
  y<-cbind(rnorm(n2,mean=mu,sd=sigma),rnorm(n2,mean = mu,sd=sigma))
  z<-rbind(x,y)
  p.values[i,1]<-eqdist.nn(z,N,k)$p.value
  p.values[i,2]<-energy::eqdist.etest(z,sizes=N,R=R)$p.value
  p.values[i,3]<-Ball::bd.test(x=x,y=y,R=R,seed=i*1)$p.value
}
pow<-colMeans(p.values<alpha)
pow

The result again shows that the Ball test is the most powerful and the NN test is the least powerful in the setting of unequal variances and unequal expectations.
(3) Non-normal distributions: one sample is drawn from a bimodal mixture of two normal distributions and the other from a t distribution with 1 df.

### design experiments
m <- 100
k<-3
p<-2
mu <- 0.1
sigma<-1
rat<-0.5
n1<-50
n2<-50#The number of x and y
R<-999#The number of replicates
alpha<-0.1
n<-n1+n2
N<-c(n1,n2)

Tn<-function(z,ix,sizes,k){
  n1<-sizes[1]
  n2<-sizes[2]
  n<-n1+n2
  if(is.vector(z)) z<-data.frame(z,0)
  z<-z[ix,]
  NN<-RANN::nn2(data = z,k=k+1)
  block1<-NN$nn.idx[1:n1,-1]
  block2<-NN$nn.idx[(n1+1):n,-1]
  i1<-sum(block1<n1+0.5)
  i2<-sum(block2>n1+0.5)
  (i1+i2)/(k*n)
}

eqdist.nn<-function(z,sizes,k){
  boot.obj<-boot::boot(data=z,statistic=Tn,R=R,sim="permutation",sizes=sizes,k=k)
  ts<-c(boot.obj$t0,boot.obj$t)
  p.value<-mean(ts>=ts[1])
  list(statistic=ts[1],p.value=p.value)
}

p.values<-matrix(NA,m,3)
## Construct the mix distribution object.
myMix <- distr::UnivarMixingDistribution(distr::Norm(mean=mu, sd=sigma),distr::Norm(mean=10*mu, sd=2*sigma), mixCoeff=c(rat, 1-rat))
##and then a function for sampling random variates from it
rmyMix <- distr::r(myMix)
for(i in 1:m){
  x<-matrix(rmyMix(n1*p),ncol=p)
  y<-cbind(rt(n2,df=1),rt(n2,df=1))
  z<-rbind(x,y)
  p.values[i,1]<-eqdist.nn(z,N,k)$p.value
  p.values[i,2]<-energy::eqdist.etest(z,sizes=N,R=R)$p.value
  p.values[i,3]<-Ball::bd.test(x=x,y=y,R=R,seed=i*1)$p.value
}
pow<-colMeans(p.values<alpha)
pow

The result shows that the energy test is the most powerful and the NN test is again the least powerful in the non-normal setting.
(4) For convenience, we run the unbalanced-sample experiment with normal distributions; other distributions could be substituted.

m <- 100
k<-3
p<-2
mu <- 0.5
n1<-200
n2<-20#The number of x and y
R<-999#The number of replicates
alpha<-0.1
n<-n1+n2
N<-c(n1,n2)

Tn<-function(z,ix,sizes,k){
  n1<-sizes[1]
  n2<-sizes[2]
  n<-n1+n2
  if(is.vector(z)) z<-data.frame(z,0)
  z<-z[ix,]
  NN<-RANN::nn2(data = z,k=k+1)
  block1<-NN$nn.idx[1:n1,-1]
  block2<-NN$nn.idx[(n1+1):n,-1]
  i1<-sum(block1<n1+0.5)
  i2<-sum(block2>n1+0.5)
  (i1+i2)/(k*n)
}

eqdist.nn<-function(z,sizes,k){
  boot.obj<-boot::boot(data=z,statistic=Tn,R=R,sim="permutation",sizes=sizes,k=k)
  ts<-c(boot.obj$t0,boot.obj$t)
  p.value<-mean(ts>=ts[1])
  list(statistic=ts[1],p.value=p.value)
}

p.values<-matrix(NA,m,3)
for(i in 1:m){
  x<-matrix(rnorm(n1*p,mean = mu),ncol = p)
  y<-cbind(rnorm(n2),rnorm(n2))
  z<-rbind(x,y)
  p.values[i,1]<-eqdist.nn(z,N,k)$p.value
  p.values[i,2]<-energy::eqdist.etest(z,sizes=N,R=R)$p.value
  p.values[i,3]<-Ball::bd.test(x=x,y=y,R=R,seed=i*1)$p.value
}
pow<-colMeans(p.values<alpha)
pow

The result shows that the energy test is the most powerful and the NN test is the least powerful in the unbalanced-sample setting.
In a nutshell, the Ball test is more powerful when the variances differ; in the other settings, the energy test does better.

Question 9.3

Use the Metropolis-Hastings sampler to generate random variables from a standard Cauchy distribution. Discard the first 1000 of the chain, and compare the deciles of the generated observations with the deciles of the standard Cauchy distribution (see qcauchy or qt with df=1). Recall that a Cauchy($\theta$, $\eta$) distribution has density function $$f(x)=\frac{1}{\theta\pi(1+[(x-\eta)/\theta]^{2})},\quad -\infty<x<\infty,\ \theta>0.$$
The standard Cauchy has the Cauchy($\theta$ = 1, $\eta$ = 0) density. (Note that the standard Cauchy density is equal to the Student t density with one degree of freedom.)

Answer

f <- function(x) {
  return(1/pi/(1+x^2)) 
} 
m <- 10000 
k <- 0
sigma<-3
set.seed(1)
x <- numeric(m) 
x[1] <- rnorm(1,sd=sigma) 
u <- runif(m)
for (i in 2:m) { 
  xt <- x[i-1] 
  y <- rnorm(1, mean = xt,sd=sigma) #proposal distribution
  num <- f(y) * dnorm(xt, mean = y,sd=sigma) #the numerator  of r(Xt,Y)
  den <- f(xt) * dnorm(y, mean = xt,sd=sigma) #the denominator of r(Xt,Y)
  if (u[i] <= num/den) x[i] <- y #accept y
  else { 
    x[i] <- xt 
    k <- k+1 #the number of rejected candidate points
    } 
}
print(k)
#a partial plot of the sample vs the time index  starting at 5000
index <- 5000:5500 
y1 <- x[index] 
plot(index, y1, type="l", main="", ylab="x")
#display the deciles of the generated observations 
#and the deciles of the standard Cauchy distribution
b <- 1001 #discard the burnin sample 
y <- x[b:m] 
a <- seq(0.1,0.9,0.1) 
qt(a,1) #quantiles of Cauchy 
quantile(y, a) #quantiles of sample after discarding
#plot hist of samples after dicarding and density curve of Cauchy
a1 <- ppoints(100) 
qr <- qt(a1,1)
hist(y, breaks="scott", main="", xlab="", freq=FALSE,ylim = c(0,0.35)) 
lines(qr, f(qr))

I used a random walk Metropolis sampler. There were 4816 rejections, so approximately 50% of the candidate points are rejected and the chain is somewhat inefficient. The two sets of deciles are not far apart, and the sampling distribution is close to the Cauchy distribution. I first tried a $\chi^2$ proposal distribution, but it was too inefficient and fit poorly.
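
The following is a small sketch (my own addition) of how the rejection rate depends on the proposal scale; the candidate values of sigma are my own choices. Because the normal proposal is symmetric, the acceptance ratio reduces to f(y)/f(x[i-1]).

rw.cauchy <- function(sigma, m = 10000) {
  x <- numeric(m)
  x[1] <- rnorm(1, sd = sigma)
  k <- 0
  u <- runif(m)
  for (i in 2:m) {
    y <- rnorm(1, mean = x[i-1], sd = sigma) #symmetric random walk proposal
    if (u[i] <= f(y)/f(x[i-1])) x[i] <- y
    else { x[i] <- x[i-1]; k <- k + 1 }
  }
  k/m #rejection rate
}
sapply(c(0.5, 1, 3, 10), rw.cauchy) #rejection rates for several proposal scales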

Question 9.6

Rao [220, Sec. 5g] presented an example on genetic linkage of 197 animals in four categories (also discussed in [67, 106, 171, 266]). The group sizes are (125, 18, 20, 34). Assume that the probabilities of the corresponding multinomial distribution are$$(\frac{1}{2}+\frac{\theta}{4},\frac{1-\theta}{4},\frac{1-\theta}{4},\frac{\theta}{4})$$
Estimate the posterior distribution of θ given the observed sample, using one of the methods in this chapter.

Answer

The posterior distribution of $\theta$ given the observed sample is therefore$$Pr(\theta|(x_{1},x_{2},x_{3},x_{4}))=\frac{197!}{x_{1}!x_{2}!x_{3}!x_{4}!}(\frac{1}{2}+\frac{\theta}{4})^{x_{1}}(\frac{1-\theta}{4})^{x_{2}}(\frac{1-\theta}{4})^{x_{3}}(\frac{\theta}{4})^{x_{4}}$$,where $(x_{1},x_{2},x_{3},x_{4})=(125,18,20,34)$.
The approach to estimating the posterior distribution is to generate a chain that converges to it. We use the random walk Metropolis sampler with a uniform proposal distribution to generate the posterior distribution of $\theta$. A candidate point Y is accepted with probability $\alpha(X_{t},Y)=\min\left(1,\frac{f(Y)}{f(X_{t})}\right)$.

m <- 5000 #length of the chain 
burn <- 1000 #burn-in time 
w <- .5 #width of the uniform support set 
nums <- 197 #the sum of group sizes
z<-c(125,18,20,34)
x <- numeric(m) #the chain
prob <- function(y, z) { 
  # computes (without the constant) the target density 
  if (y < 0 || y >= 1) 
    return (0)
  return((1/2+y/4)^z[1] * ((1-y)/4)^z[2] * ((1-y)/4)^z[3] * (y/4)^z[4])
  }
u <- runif(m) #for accept/reject step 
v <- runif(m,-w,w) #proposal distribution 
x[1] <- runif(1) 
for (i in 2:m) {
  y <- x[i-1] + v[i] #random walk
  if (u[i] <= prob(y, z) / prob(x[i-1], z)) x[i] <- y 
  else x[i] <- x[i-1] 
}
xb <- x[(burn+1):m] 
print(mean(xb)) #mean of theta

#plot hist of samples and density curve of theta after dicarding 
hist(xb, breaks="scott", main="", xlab="", freq=FALSE) 
lines(density(xb,kernel = "epanechnikov")) 

The result shows the estimation of $E\theta$. And the plot shows the posterior distribution of theta.

Question 9.6 (continued)

For exercise 9.6, use the Gelman-Rubin method to monitor convergence of the chain, and run the chain until the chain has converged approximately to the target distribution according to $\hat{R}<1.2$.

Answer

Gelman.Rubin <- function(psi) {
  # psi[i,j] is the statistic psi(X[i,1:j])
  # for chain in i-th row of X
  psi <- as.matrix(psi)
  n <- ncol(psi)
  k <- nrow(psi)
  psi.means <- rowMeans(psi) #row means
  B <- n * stats::var(psi.means) #between variance est.
  psi.w <- apply(psi, 1, "var") #within variances
  W <- mean(psi.w) #within est.
  v.hat <- W*(n-1)/n + (B/n) #upper variance est.
  r.hat <- v.hat / W #G-R statistic
  return(r.hat)
}
prob <- function(yp, zp) { 
  # computes (without the constant) the target density 
  if (yp < 0 || yp >= 1) 
    return (0)
  return((1/2+yp/4)^zp[1] * ((1-yp)/4)^zp[2] * ((1-yp)/4)^zp[3] * (yp/4)^zp[4])
}
x.chain<-function(ms,ws,x1,zs){
  #generates a Metropolis chain  
  #with Uniform proposal distribution 
  #and starting value X1 
  x <- numeric(ms) #the chain
  u <- runif(ms) #for accept/reject step 
  v <- runif(ms,-ws,ws) #proposal distribution 
  x[1] <- x1 
  for (i in 2:ms) {
    y <- x[i-1] + v[i]
    if (u[i] <= prob(y, zs) / prob(x[i-1], zs)) x[i] <- y 
    else x[i]<-x[i-1]
    }
  return(x)
}
m <- 10000 #length of the chain 
b <- 1000 #burn-in time 
w <- .5 #width of the uniform support set 
nums <- 197 #the sum of group sizes
z<-c(125,18,20,34)
k<-4 #number of chains to generate 
x0 <- c(0.2,0.4,0.6,0.8)#choose overdispersed initial values
#generate the chains 
X <- matrix(0, nrow=k, ncol=m) 
for (i in 1:k) 
  X[i, ] <- x.chain(m,w,x0[i],z)
#compute diagnostic statistics 
psi <- t(apply(X, 1, cumsum)) 
for (i in 1:nrow(psi)) psi[i,] <- psi[i,] / (1:ncol(psi)) 
print(Gelman.Rubin(psi))
#plot psi for the four chains 
for (i in 1:k) plot(psi[i, (b+1):m], type="l", xlab=i, ylab=bquote(psi)) 
#plot the sequence of R-hat statistics 
rhat <- rep(0, m) 
for (j in (b+1):m) rhat[j] <- Gelman.Rubin(psi[,1:j]) 
plot(rhat[(b+1):m], type="l", xlab="", ylab="R") 
abline(h=1.1, lty=2,col="red")

The result can be affected dramatically by the width w of the uniform proposal and the length m of the chains. Perhaps I chose a relatively large w (which plays the role of the proposal variance), so the rate of convergence is a bit slow.

Question 11.4

Find the intersection points $A(k)$ in $(0,\sqrt{k})$ of the curves $$S_{k-1}(a)=P\left(t(k-1)>\sqrt{\frac{a^{2}(k-1)}{k-a^{2}}}\right)$$
and $$S_{k}(a)=P\left(t(k)>\sqrt{\frac{a^{2}k}{k+1-a^{2}}}\right)$$
for $k = 4 : 25, 100, 500, 1000$, where $t(k)$ is a Student $t$ random variable with $k$ degrees of freedom. (These intersection points determine the critical values for a t-test for scale-mixture errors proposed by Szekely [260].)

Answer

findIntersection = function(k){

  s.k.minus.one = function(a){
    1-pt(sqrt(a^2*(k-1)/(k-a^2)),df=k-1)
  }
  s.k=function(a){
    1-pt(sqrt(a^2*k/(k+1-a^2)),df=k)
  }
  f=function(a){
    s.k(a)-s.k.minus.one(a)
  }
  eps=1e-2
  return(stats::uniroot(f,interval=c(eps,sqrt(k)-eps))$root)
}
rs=sapply(c(4:25,100,500,1000),function(k){
  findIntersection(k)
})
print(rs)

Question 11.6

Write a function to compute the cdf of the Cauchy distribution, which has density $$\frac{1}{\theta\pi(1+[(x-\eta)/\theta]^{2})},\quad -\infty<x<\infty,$$ where $\theta > 0$. Compare your results to the results from the R function pcauchy.
(Also see the source code in pcauchy.c.)

Answer

cdf.Cauchy=function(x,theta,eta){
  ##function to compute cdf of Cauchy distribution
  f<-function(y){
    1/(theta*pi*(1+((y-eta)/theta)^2))
  }
  return(integrate(f,lower = -Inf,upper=x))#integrate the density from -Inf to x
}

##Compare two functions
theta<-1
eta<-0
points<-seq(-50,50,10)
rc<-sapply(points,function(x){
  cdf.Cauchy(x,theta,eta)$value
})
pc<-sapply(points,function(x){
  pcauchy(x,eta,theta)
})
com<-cbind(rc,pc)
colnames(com)<-c('cdf.Cauchy',"pcauchy")
print(com)

The results show that the two functions agree.
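
As an additional analytic cross-check (my own addition), the Cauchy cdf has the closed form $F(x)=\frac{1}{2}+\frac{1}{\pi}\arctan\left(\frac{x-\eta}{\theta}\right)$, which can be compared with both columns above.

all.equal(0.5 + atan((points - eta)/theta)/pi, pc) #closed-form cdf versus pcauchy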

Question A-B-O blood type problem

(1) Use the EM algorithm to solve the MLE of p and q (consider the missing data nAA and nBB).
(2) Record the maximum likelihood values in the M-steps; are they increasing?

Answer

According to the Hardy-Weinberg equilibrium law, we can compute the value of r. I wrote two versions, one computing r directly and one not, and their optimization results differ slightly. The version that computes r directly is shown below.

## computing **r** directly
na.=28
nb.=24
nab=70
noo=41
n=na.+nb.+nab+noo
##According to hardy weinberg's equilibrium law, 
##we can compute the value of r
r=sqrt(noo/n)
##initialize p0,q0
p.start<-0.1
q.start<-0.1
##log-lik function and maximization
##note that this is negative log-lik
upfun<-function(p0,q0){
  loglik<-function(x){
    return(-(p0/(2-p0-2*q0)*na.*log(1/2/(r))+
             na.*log(2*x[1]*(r))+nb.*log(2*x[2]*(r))+
             q0/(q0+2*(1-p0-q0))*nb.*log(1/2/(r))+
             2*noo*log(r)+nab*log(2*x[1]*x[2])))
  }
  eval_g<-function(x){
    x[1]+x[2]-1+r
  }
  ##According to hardy weinberg's equilibrium law, 
  ##we can compute the upper and lower bounds of p,q
  p<-nloptr::nloptr(x0=c(p0,q0),eval_f = loglik,
            eval_g_eq = eval_g, 
            opts = list(algorithm = "NLOPT_GN_ISRES",
                        xtol_rel=1e-8),lb=c(0,0),
            ub=c(sqrt(na./n)-1e-5,sqrt(nb./n)-1e-5))
  loglik.val<-loglik(p$solution)
  return(c(p$solution,loglik.val))
}
p.new<-upfun(p.start,q.start)[1:2]
p.old<-c(0,0)
ml.value<-0
ml.value<-upfun(p.start,q.start)[3]
##iteration
for(i in 1:60){
  p.old<-p.new
  result<-upfun(p.new[1],p.new[2])
  p.new<-result[1:2]
  ml.value<-append(ml.value,result[3])
}
##output the result
print(p.new)
plot(ml.value,type="l",lwd=2,col='lightcoral')

From the figure above, the recorded values are roughly decreasing; since the code records the negative log-likelihood, this means the maximum likelihood values are increasing over the iterations.
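
For comparison, here is a compact sketch (my own addition, not the author's code) of the textbook closed-form EM updates for this problem: the E-step imputes the expected genotype counts nAA and nBB, the M-step is a gene-counting update, and the recorded observed-data log-likelihood should be non-decreasing. The starting values are my own choice.

nA <- 28; nB <- 24; nAB <- 70; nO <- 41; n <- nA + nB + nAB + nO
p <- q <- 0.3 #starting values
loglik <- numeric(20)
for (it in 1:20) {
  r <- 1 - p - q
  nAA <- nA*p^2/(p^2 + 2*p*r) #E-step: expected count of AA among type A
  nBB <- nB*q^2/(q^2 + 2*q*r) #E-step: expected count of BB among type B
  p <- (2*nAA + (nA - nAA) + nAB)/(2*n) #M-step: gene counting update for p
  q <- (2*nBB + (nB - nBB) + nAB)/(2*n) #M-step: gene counting update for q
  loglik[it] <- nA*log(p^2 + 2*p*(1 - p - q)) + nB*log(q^2 + 2*q*(1 - p - q)) +
    nAB*log(2*p*q) + nO*log((1 - p - q)^2) #observed-data log-likelihood at the new estimate
}
c(p = p, q = q)
plot(loglik, type = "b") #the recorded log-likelihood values are non-decreasing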

Question 3

Use both for loops and lapply() to fit linear models to the mtcars using the formulas stored in this list:
formulas <- list( mpg ~ disp, mpg ~ I(1 / disp), mpg ~ disp + wt, mpg ~ I(1 / disp) + wt )

Answer

attach(mtcars)
formulas <- list( mpg ~ disp, mpg ~ I(1 / disp), mpg ~ disp + wt, mpg ~ I(1 / disp) + wt )
mod<-list()
mod2<-list()
for (i in 1:length(formulas)) {
  mod[[i]]<-lm(formulas[[i]])
}
mod2<- lapply(formulas, lm)
mod
mod2
detach(mtcars)

Question 4

Fit the model mpg ~ disp to each of the bootstrap replicates of mtcars in the list below by using a for loop and lapply(). Can you do it without an anonymous function?
bootstraps <- lapply(1:10, function(i) { rows <- sample(1:nrow(mtcars), rep = TRUE) mtcars[rows, ] })

Answer

attach(mtcars)
modd<-list()
modd2<-list()
bootstraps <- lapply(1:10, function(i) { rows <- sample(1:nrow(mtcars), rep = TRUE);mtcars[rows, ] })
for(i in 1:length(bootstraps)){
  modd[[i]]<-lm(mpg~disp,data = bootstraps[[i]])
}
modd2<-lapply(bootstraps, function(d){lm(mpg~disp,data=d)})
modd
modd2
## get rid of anoymous function
modd3 <- lapply(bootstraps, lm, formula = mpg ~ disp)
modd3
detach(mtcars)

Question 5

For each model in the previous two exercises, extract $R^{2}$ using the function below.
rsq <- function(mod) summary(mod)$r.squared

Answer

rsq <- function(mod) summary(mod)$r.squared
r.sq1<-vapply(mod, rsq,0)
r.sq2<-vapply(mod2, rsq,0)
r.sq3<-vapply(modd, rsq,0)
r.sq4<-vapply(modd2, rsq,0)
r.sq1
r.sq2
r.sq3
r.sq4

Question 3 on page 214

The following code simulates the performance of a t-test for non-normal data. Use sapply() and an anonymous function to extract the p-value from every trial.
trials <- replicate( 100, t.test(rpois(10, 10), rpois(7, 10)), simplify = FALSE )
Extra challenge: get rid of the anonymous function by using [[ directly.

Answer

trials <- replicate(100, t.test(rpois(10, 10), rpois(7, 10)), simplify = FALSE )
p_value<-sapply(trials, function(obj){obj$p.value})
##Extra challenge
p_values_challenge<-sapply(trials, `[[`, 'p.value')
##Compare
p_value
p_values_challenge
all.equal(p_values_challenge,p_value)

Question 6 on page 214

Implement a combination of Map() and vapply() to create an lapply() variant that iterates in parallel over all of its inputs and stores its outputs in a vector (or a matrix). What arguments should the function take?

Answer

lapply.variant<-function(x,f,type,...){
  vapply(x, function(y){unlist(Map(f,y))},type)
}
##compare if the variant of lapply is equal to lapply
all.equal(unlist(lapply(1:5, function(x) x^2)), lapply.variant(1:5, function(x) x^2,0))
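
The variant above handles a single vector input. The following sketch (my own addition, with the hypothetical name lapply_variant2) iterates in parallel over several inputs, which is closer to what the exercise describes: Map() pairs up the inputs element-wise and vapply() coerces the results into a vector or matrix of the declared type.

lapply_variant2 <- function(f, FUN.VALUE, ...) {
  vapply(Map(f, ...), identity, FUN.VALUE) #Map iterates in parallel; vapply fixes the output type
}
lapply_variant2(function(a, b) a + b, numeric(1), 1:3, 4:6) #returns c(5, 7, 9)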

Question 4

Make a faster version of chisq.test() that only computes the chi-square test statistic when the input is two numeric vectors with no missing values. You can try simplifying chisq.test() or by coding from the mathematical definition ( http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test ).

Answer

expected <- function(colsum, rowsum, total) {
  ##the "theoretical frequency" for a cell
  ##given the hypothesis of independence
  (colsum / total) * (rowsum / total) * total
}
chi_stat <- function(observed, expected) {
  ##The function to compute test statistic
  ((observed - expected) ^ 2) / expected
}
# Implements https://en.wikipedia.org/wiki/Chi-squared_test
chisq_test2 <- function(x, y) {
  total <- sum(x) + sum(y) #compute total number of observations
  rowsum_x <- sum(x)
  rowsum_y <- sum(y)
  chistat <- 0
  for (i in seq_along(x)) {
    colsum <- x[i] + y[i]
    expected_x <- expected(colsum, rowsum_x, total)
    expected_y <- expected(colsum, rowsum_y, total)
    ##cumulative of test statistic say r rows
    ##and c columns
    chistat <- chistat + chi_stat(x[i], expected_x)
    chistat <- chistat + chi_stat(y[i], expected_y)
  }
  chistat
}
##example to compare my own version and chisq.test()
print(chisq_test2(seq(3, 8), seq(4, 9)))
print(chisq.test(seq(3, 8), seq(4, 9)))
print(microbenchmark::microbenchmark(
  chisq_test2(seq(3, 8), seq(4, 9)),
  chisq.test(seq(3, 8), seq(4, 9))
))

The result shows that the hand-coded version based on the Wikipedia definition is much faster than the R function chisq.test().
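
If a p-value is also wanted (my own addition; this just reuses chisq_test2 from above), the statistic can be referred to a chi-squared distribution. Since chisq_test2 treats x and y as the two rows of a 2 x length(x) contingency table, the degrees of freedom are (2 - 1) * (length(x) - 1).

stat <- chisq_test2(seq(3, 8), seq(4, 9)) #test statistic for the 2 x 6 table
pchisq(stat, df = (2 - 1)*(6 - 1), lower.tail = FALSE) #corresponding p-value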

Question 5

Can you make a faster version of table() for the case of an input of two integer vectors with no missing values? Can you use it to speed up your chi-square test?

Answer

table2 <- function(x, y) {
  #extract unique values in x and y
  x_val <- unique(x)
  y_val <- unique(y)
  mat <- matrix(0L, length(x_val), length(y_val))
  for (i in seq_along(x)) {
    ##Count the number of duplicates in x and y
    mat[which(x_val == x[[i]]), which(y_val ==y[[i]])] <- mat[which(x_val == x[[i]]), which(y_val == y[[i]])] + 1L
  }
  ##Modify names
  dimnames <- list(x_val, y_val)
  names(dimnames) <- as.character(as.list(match.call())[-1])
  tab <- array(mat, dim = dim(mat), dimnames = dimnames)
  class(tab) <- "table"
  tab
}
##compare my own version and original version
a <- c(1, 2, 3)
identical(table(a, a), table2(a, a))
microbenchmark::microbenchmark(table(a, a), table2(a, a))

j <- c(0, 10, 100, 1000, 10000)
identical(table(j, j), table2(j, j))
microbenchmark::microbenchmark(table(j, j), table2(j, j))

The result shows that my own version is much faster than the original version.


