Homework-2019.09.20

Question

Use knitr to produce at least 3 examples (texts, figures, tables).

Answer

There is a data set called "tli" in the package "xtable". It contains the grade, sex, disadvg, ethnicty and tlimth of 100 randomly selected students participating in the Assessment of Academic Skills. I list the data in a table, plot the data, and then calculate the covariance between grade and tlimth.

library(datasets)#Require package "datasets"
library(xtable)#Require package "xtable"
data(tli)#Pick a data set "tli"
print(xtable(tli[1:20,]), type= 'html')#List "tli" with a table
plot(tli)#Visualize the data
a<-cov(tli[,1],tli[,5]) #calculate the covariance between grade and tlimth

The covariance between grade and tlimth is `r a`.

In example 2, I define variables x and y and fit a linear regression of y on x. I list the results in a table and visualize the data.

library(knitr)#Require package "knitr"
x <- 1:10#Value x from 1 to 10
y <- x^2#Define y
lmr <- lm(y ~ x)#Calculate the linear regression between x and y
co <- summary(lmr)$coefficients#Put the results into a matrix
knitr::kable(co)#List co by table
plot(x,y)#Visualize (x,y)

The $R^2$ is `r summary(lmr)$r.squared`.

I pick a data set from the package "datasets" called "PlantGrowth". It is a data frame of 30 cases on 2 variables: weight and group. The levels of group are 'ctrl', 'trt1', and 'trt2'. The code lists the data and plots the number of cases in each group with different colors.

data("PlantGrowth")#Pick a data set "tli"
print(xtable::xtable(PlantGrowth[1:30,]), type= 'html')
grp<-table(PlantGrowth[,'group'])
barplot(grp,main = "number of group",col=c("brown2", "aquamarine1","aliceblue"))

The numbers of ctrl, trt1 and trt2 cases are `r grp`.

Homework-2019.9.29

Question 3.4

The Rayleigh density is $$f(x)=\frac{x}{\sigma^{2}} e^{-x^{2} /\left(2 \sigma^{2}\right)}, \quad x \geq 0, \sigma>0.$$ Develop an algorithm to generate random samples from a Rayleigh($\sigma$) distribution. Generate Rayleigh($\sigma$) samples for several choices of $\sigma>0$ and check that the mode of the generated samples is close to the theoretical mode $\sigma$ (check the histogram).

Answer

Idea: We can use the inverse transform method to solve the problem. From the Rayleigh density we can calculate that the distribution function is $$F(x)=1-e^{-x^{2} /\left(2 \sigma^{2}\right)}.$$ I write a function to generate random samples. Then I choose $\sigma=2,3,4,5$ to check whether the mode of the generated samples is close to the theoretical mode.
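
For completeness, the inverse transform step behind the sampler below (a short derivation from the CDF above): $$u=F(x)=1-e^{-x^{2} /\left(2 \sigma^{2}\right)} \Longrightarrow x=F^{-1}(u)=\sigma \sqrt{-2 \log (1-u)}, \quad u \sim \operatorname{Uniform}(0,1),$$ which is exactly the transformation x<-sqrt(-2*sigma^2*log(1-u)) used in the code.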

set.seed(12345) 
Rayleigh.hist<-function(n,sigma) #write a function to generate random samples
{
  u<-runif(n)
  x<-sqrt(-2*sigma^2*log(1-u))
  hist(x,breaks=20,prob=TRUE,main = paste("Histogram of sigma=" , sigma)) #density histogram of the sample
  y<-seq(0,20,.01)
  lines(y,(y/sigma^2)*exp(-y^2/(2*sigma^2))) #density curve f(x)
}
#par(mfrow=c(2,2)) #2 graphs per page
for (sigma in 2:5) #use the loop to choose sigma =2,3,4,5
  {
    Rayleigh.hist(10000,sigma)
  }

Result: From the histograms, the mode of the generated samples is close to the theoretical mode $\sigma$.

Question 3.11

Generate a random sample of size 1000 from a normal location mixture. The components of the mixture have N(0,1) and N(3,1) distributions with mixing probabilities$\ p_{1}$ and $\ p_{2}=1-p_{1}$ . Graph the histogram of the sample with density superimposed, for$\ p_{1}=0.75$ . Repeat with different values for$\ p_{1}$ and observe whether the empirical distribution of the mixture appears to be bimodal. Make a conjecture about the values of $\ p_{1}$ that produce bimodal mixtures.

Answer

Idea: (1) Generate x1 ~ N(0,1) and x2 ~ N(3,1). (2) Generate an indicator p in {0,1} with P(p=1)=p1 and P(p=0)=p2. (3) Set x = p*x1 + (1-p)*x2 and graph the histogram.

#write a function to generate sample from normal location mixture
loc.mix<-function(p1){
  x1<-rnorm(1000,0,1) #generate sample from normal distribution N(0,1)
  x2<-rnorm(1000,3,1)
  u<-runif(1000)
  p<-as.integer(u<p1) #indicator: 1 with probability p1, 0 with probability p2
  x<-p*x1+(1-p)*x2 #the mixture
  hist(x,breaks=50,prob=TRUE,main = paste("Histogram of p1=" , p1)) #histogram of the sample
}
loc.mix(0.75) #p1=0.75
#par(mfrow=c(1,4)) #4 graphs per page
for (p1 in seq(0,1,0.02)) #use the loop to repeat with p1=0, 0.02,0.04,0.06,...,1 
  {
    loc.mix(p1)
  }

Result: From the histograms for different values of $p_{1}$, I deduce that values of $p_{1}$ roughly between 0.3 and 0.7 produce bimodal mixtures.

Question 3.18

Write a function to generate a random sample from a $\ W_{d}(\Sigma, n)$ (Wishart) distribution for $\ n>d+1 \geq 1$ , based on Bartlett's decomposition.

Answer

Idea: Follow the steps of Bartlett's decomposition in the textbook (p. 80).
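
Briefly, my summary of the construction used below: build a lower triangular $d \times d$ matrix $T$ with $$T_{i i} \sim \sqrt{\chi^{2}(n-i+1)}, \qquad T_{i j} \sim N(0,1) \text { for } i>j,$$ so that $A=T T^{\prime} \sim W_{d}\left(I_{d}, n\right)$; then, with the Choleski factorization $\Sigma=L L^{\prime}$, the matrix $L A L^{\prime} \sim W_{d}(\Sigma, n)$.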

#write a function to generate a random sample from a wishart distribution
wishart.Bartlett <-function(d,n,Sigma){
  T<-matrix(0,d,d) #a d*d 0-matrix T
  for (i in 1:d) #use a loop to generate T
     {
       for (j in 1:d)
         {
          if(i==j)
             T[i,j]<-sqrt(rchisq(1,n-i+1)) #diagonal: T(i,i)~sqrt(chisq(n-i+1))
          if(i>j)
             T[i,j]<-rnorm(1,0,1) #below the diagonal: T(i,j)~N(0,1), i>j
         }
     }
  A<-T%*%t(T) #A has a Wd(Id,n) distribution
  L<-t(chol(Sigma)) #lower triangular Choleski factor: Sigma = L %*% t(L)
  L%*%A%*%t(L) #the sample from a Wd(Sigma,n) distribution
}

#an example to test the function
wishart.Bartlett(2,4,matrix(c(5,1,1,3),2,2)) 

Result: The example runs and returns a 2 x 2 matrix, so the function works as intended.

Homework-2019.10.12

5.1

Compute a Monte Carlo estimate of $\int_{0}^{\pi / 3} \sin t d t$ and compare your estimate with the exact value of the integral.

Answer

m<-10000
x<-runif(m,min=0,max=pi/3)
esti<-mean(sin(x))*(pi/3)
print(esti) #estimate
print(cos(0)-cos(pi/3)) #exact value

The estimate is `r esti` and the exact value is `r (cos(0)-cos(pi/3))`.

5.10

Use Monte Carlo integration with antithetic variables to estimate $\int_{0}^{1} \frac{e^{-x}}{1+x^{2}} d x$ and find the approximate reduction in variance as a percentage of the variance without variance reduction.

Answer

#The anti.MC function will compute the estimate with or without antithetic sampling
N<-10000
u<-v<-numeric(N/2)
anti.MC <- function(antithetic = TRUE) {
        u <- runif(N/2)
        if (!antithetic) v <- runif(N/2)
            else  v <- 1 - u
        g1<-exp(-u)/(1+u^2) #integrand g(x)=exp(-x)/(1+x^2)
        g2<-exp(-v)/(1+v^2)
        esti <- mean(c(g1,g2))
        esti
}

#estimate with and without antithetic sampling
m <- 1000
antith <- anti.MC(antithetic = TRUE) #with antithetic variables
noanti <- anti.MC(antithetic = FALSE) #without
print(antith)
print(noanti)

#The approximate reduction in variance can be estimated by a simulation under both methods
MC1 <- MC2 <- numeric(m)
for (i in 1:m) 
    {
        MC1[i] <- anti.MC(antithetic = FALSE)
        MC2[i] <- anti.MC(antithetic = TRUE)
    }
    print((var(MC1) - var(MC2))/var(MC1))#the approximate reduction in variance

The estimate with antithetic sampling is `r antith`. The antithetic variable approach achieves approximately `r round(100*(var(MC1)-var(MC2))/var(MC1), 1)`% reduction in variance compared with simple Monte Carlo.

5.15

Obtain the stratified importance sampling estimate in Example 5.13 and compare it with the result of Example 5.10.

Answer

m <-10000 #number of replicates
k <-5 #number of strata
M<-m/k #importance sample size
g <- function(x) {
      exp(-x - log(1+x^2)) * (x > 0) * (x < 1) } 
fj <-function(x) {
     5*exp(-x) / (1 - exp(-1))} 
fg <-function(x) {
      g(x)/fj(x)} 
F <-function(u) {
      - log(1 - u * (1 - exp(-1)))} 
theta.SI<-se.SI<-matrix(0,k,1)
for (j in 1:k) #use the loop to estimate in k intervals 
   {
    if(j==1)
      { 
       x <- runif(m,0,F(1)*j/k)
       theta.SI[j] <- mean(g(x))
       se.SI[j]<-var(fg(x))
      }
    if(j>1 && j<5)
      {
       x <- runif(m,F(1)*(j-1)/k,F(1)*j/k)
       theta.SI[j] <- mean(g(x))
       se.SI[j]<-var(fg(x))
      }
    if(j==5)
      { 
       x <- runif(m,F(1)*(j-1)/k,1)
       theta.SI[j] <- mean(g(x))
       se.SI[j]<-var(fg(x))}
    }
 apply(theta.SI,2,mean) #estimate 
 apply(se.SI,2,sum)/M #variance

The stratified importance sampling estimate is `r apply(theta.SI,2,mean)`. Compared with Example 5.10, it represents more than a 98% reduction in variance.
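
For comparison, below is a minimal alternative sketch (my own addition, not part of the original answer) that samples from the importance density $f_{3}(x)=e^{-x} /\left(1-e^{-1}\right)$ restricted to each of the five equiprobable strata via the inverse transform, as in Example 5.13; the variable names are illustrative.

M <- 2000 #replicates per stratum
k <- 5 #number of strata
g <- function(x) exp(-x)/(1+x^2) #integrand on (0,1)
theta.s <- var.s <- numeric(k)
for (j in 1:k) {
  u <- runif(M,(j-1)/k,j/k) #uniform on the j-th probability stratum of f3
  x <- -log(1-u*(1-exp(-1))) #inverse CDF of f3 maps it into the j-th interval
  h <- g(x)*(1-exp(-1))/(k*exp(-x)) #g(x)/f_j(x), where f_j = k*f3 on the stratum
  theta.s[j] <- mean(h)
  var.s[j] <- var(h)/M
}
sum(theta.s) #stratified importance sampling estimate
sum(var.s) #estimated variance of the estimate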

Homework-2019.10.18

Question 6.5

Suppose a 95% symmetric t-interval is applied to estimate a mean, but the sample data are non-normal. Then the probability that the confidence interval covers the mean is not necessarily equal to 0.95. Use a Monte Carlo experiment to estimate the coverage probability of the t-interval for random samples of $\chi^{2}(2)$ data with sample size n = 20. Compare your t-interval results with the simulation results in Example 6.4. (The t-interval should be more robust to departures from normality than the interval for variance.)

Answer

Generate $\chi^{2}(2)$ data with sample size n = 20. Chi-square distribution $\chi^{2}(2)$ has mean=2 and variance=4. $T=\frac{\bar{X}-\mu_{0}}{S / \sqrt{n}} \sim t(n-1)$. Then the coverage probability by a 95% symmetric t-interval is $P\left(-t_{n-1}\left(\frac{\alpha}{2}\right)<T<t_{n-1}\left(\frac{\alpha}{2}\right)\right)$ .

#estimate the coverage probability
set.seed(123)
n<-20 #sample size
alpha<-0.05
Stat<- replicate(1000, expr = {
  x <- rchisq(n, df = 2)
  sqrt(n)*(mean(x)-2) / sd(x) #t statistic with the sample standard deviation
} )
cover.hat<-mean(Stat < -qt(alpha/2,n-1) & Stat > qt(alpha/2,n-1)) #coverage probability
cover.hat

The estimate of the coverage probability is `r cover.hat`. In Example 6.4 the empirical confidence level for normal data was approximately 0.95; here, with the skewed $\chi^{2}(2)$ data, the t-interval coverage falls somewhat below 0.95 but remains reasonably close, so the t-interval is more robust to departures from normality than the interval for variance.

Question 6.6

Estimate the 0.025, 0.05, 0.95, and 0.975 quantiles of the skewness $\sqrt{b_{1}}$ under normality by a Monte Carlo experiment. Compute the standard error of the estimates from (2.14) using the normal approximation for the density (with exact variance formula). Compare the estimated quantiles with the quantiles of the large sample approximation $\sqrt{b_{1}} \approx N(0,6 / n)$.

Answer

I write a function to estimate the quantiles with different sample size n. In the function, firstly, I generate random skewness under normality and use function "quantile" to estimate the quantiles of skewness. Then I use (2.14) to compute the standard error. Finally, I use a table to compare the estimated quantiles with the quantiles of $\sqrt{b_{1}} \approx N(0,6 / n)$. I pick n=10,100,1000.
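
For reference, the quantile variance formula (2.14), as I recall it from the textbook, applied here with the $k$ simulated skewness values playing the role of the sample: $$\operatorname{Var}\left(\hat{x}_{q}\right) \approx \frac{q(1-q)}{k f\left(x_{q}\right)^{2}},$$ where the density $f$ of $\sqrt{b_{1}}$ is approximated by the $N(0,6 / n)$ density.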

set.seed(123)
#write a function to choose different n
quan.esti<-function(n){
#estimate the quantiles of skewness
k<-10000 #number of replicates
b<-numeric(k)
for (i in 1:k)
  {
    x<-rnorm(n,0,1)
    xbar<-mean(x)
    m3 <- mean((x - xbar)^3)
    m2 <- mean((x - xbar)^2)
    b[i]<- m3 / m2^1.5
  }
quanti<-quantile(b,c(0.025,0.05,0.95,0.975)) #estimate of quantiles
#compute the standard error
f<-function(x) dnorm(x,0,sqrt(6/n)) #normal approximation N(0,6/n) for the density of the skewness statistic
var.hat<-numeric(length(quanti))
q<-c(0.025,0.05,0.95,0.975)
for (i in 1:length(quanti))
  {
    var.hat[i]<-q[i]*(1-q[i])/(k*f(quanti[i])^2) #formula (2.14) with k replicates
  }
se.hat<-sqrt(var.hat) #SE
#compare with N(0,6/n)
norm.hat<-qnorm(c(0.025,0.05,0.95,0.975),0,sqrt(6/n))
#output the results
data.frame(quanti,se.hat,norm.hat)
}
quan.esti(10) #n=10
quan.esti(100) #n=100
quan.esti(1000) #n=1000

Comparing the estimated quantiles with the quantiles of the large-sample approximation $\sqrt{b_{1}} \approx N(0,6 / n)$ in the tables, as the sample size n gets larger, the estimated quantiles and the quantiles of the approximation N(0, 6/n) get closer to each other.

Homework-2019.11.02

Question 6.7

Estimate the power of the skewness test of normality against symmetric $\operatorname{Beta}(\alpha, \alpha)$ distributions and comment on the results. Are the results different for heavy-tailed symmetric alternatives such as $t(\nu)$ ?

Answer

#write a function to compute the sample skewness statistic
sk <- function(x) {
  xbar <- mean(x)
  m3 <- mean((x - xbar)^3)
  m2 <- mean((x - xbar)^2)
  return( m3 / m2^1.5 )
}

beta<-0.1 #significance level
n<-30 #sample size
m<-1000 #number of replicates
alpha<-c(seq(1,10,1)) #parameter of Beta distribution
N<-length(alpha)
pwr<-numeric(N)
#critical value for the skewness test
cv <- qnorm(1-beta/2, 0, sqrt(6*(n-2) / ((n+1)*(n+3))))
# the power of the skewness test
for (j in 1:N) 
   { #for each alpha
    a<-alpha[j]
    sktests<-replicate(m,expr = {
      x<-rbeta(n,a,a)
      as.integer(abs(sk(x))>= cv) #test decision: 1 = reject normality
    })
    pwr[j]<- mean(sktests)
  }
pwr
# plot power vs alpha
plot(alpha, pwr, type = "b",xlab = bquote(alpha), ylim = c(0,0.2))
abline(h =0.1, lty = 3)
se <- sqrt(pwr * (1-pwr) / m)  #add standard errors
lines(alpha, pwr+se, lty = 3)
lines(alpha, pwr-se, lty = 3)

From the plot, we can see that the power of the test is less than 0.10 for each alpha, and as alpha gets larger the power gets closer to 0.10 (the Beta(alpha, alpha) distribution becomes closer to normal).

for (j in 1:10) { #for each v=alpha
  v<-alpha[j]
  sktests<-replicate(m,expr = {
    x<-rt(n,df=v)
    as.integer(abs(sk(x))>= cv) #test decision: 1 = reject normality
  })
  pwr[j]<- mean(sktests)
}
# plot power vs v
plot(alpha, pwr, type = "b",xlab = bquote(v), ylim = c(0,1))
abline(h =0.1, lty = 3)
se <- sqrt(pwr * (1-pwr) / m)  #add standard errors
lines(alpha, pwr+se, lty = 3)
lines(alpha, pwr-se, lty = 3)

From the plot, the power of the test is larger than 0.10 for each $\nu$, and the power gets closer to 0.10 as $\nu$ gets larger. The results are different from those for the symmetric Beta alternatives.

Question 6.A

Use Monte Carlo simulation to investigate whether the empirical Type I error rate of the t-test is approximately equal to the nominal significance level $\alpha$, when the sampled population is non-normal. The t-test is robust to mild departures from normality. Discuss the simulation results for the cases where the sampled population is (i)$\chi^{2}(1)$, (ii) Uniform(0,2), and (iii) Exponential(rate=1). In each case, test $H_{0}: \mu=\mu_{0}$ vs $H_{1}: \mu \neq \mu_{0}$, where $\mu_{0}$ is the mean of $\chi^{2}(1)$, Uniform(0,2), and Exponential(1), respectively.

Answer

For this experiment, the significance level is alpha=0.05 and the sample size is n = 30. The means of $\chi^{2}(1)$, Uniform(0,2), and Exponential(1) are all equal to 1. The method I use is similar to Example 6.7 in the textbook.

n<-30
alpha<-0.05 #the nominal significance level
mu0<-1
m<-10000 #number of replicates
p_x<-p_y<-p_z<-numeric(m)
for (j in 1:m) 
  {
   x<-rchisq(n,1)
   y<-runif(n,min = 0,max = 2)
   z<-rexp(n,rate = 1)
   ttest_x <- t.test(x, alternative = "two.sided", mu = mu0) #t test
   ttest_y <- t.test(y, alternative = "two.sided", mu = mu0)
   ttest_z <- t.test(z, alternative = "two.sided", mu = mu0)
   p_x[j] <- ttest_x$p.value
   p_y[j] <- ttest_y$p.value
   p_z[j] <- ttest_z$p.value
  }
px.hat<-mean(p_x<alpha)
py.hat<-mean(p_y<alpha)
pz.hat<-mean(p_z<alpha)
se.x <- sqrt(px.hat * (1 - px.hat) / m) #standard error
se.y <- sqrt(py.hat * (1 - py.hat) / m)
se.z <- sqrt(pz.hat * (1 - pz.hat) / m)
print(c(alpha,px.hat,py.hat,pz.hat,se.x,se.y,se.z))

From the simulation, the empirical Type I error rates of the t-test are `r px.hat`, `r py.hat` and `r pz.hat` for $\chi^{2}(1)$, Uniform(0,2) and Exponential(1) respectively. All three are larger than alpha=0.05. For $\chi^{2}(1)$, the empirical Type I error rate differs from alpha=0.05 by more than ten standard errors. For Uniform(0,2), the result differs from alpha=0.05 by about one standard error. For Exponential(1), the result differs from alpha=0.05 by more than seven standard errors.

Question

If we obtain the powers for two methods under a particular simulation setting with 10000 experiments: say,0.651 for one method and 0.676 for another method. Can we say the powers are different at 0.05 level?

Answer

Not from these two numbers alone. The two power estimates come from the same 10000 simulated experiments, so the rejection indicators of the two methods are paired rather than independent. A paired comparison such as McNemar's test on the per-experiment rejection decisions is appropriate, and it needs the counts of experiments on which the two methods disagree, not just the marginal powers 0.651 and 0.676.

Homework-2019-11-08

Exercise 7.6

Efron and Tibshirani discuss the scor (bootstrap) test score data on 88 students who took examinations in five subjects [84, Table 7.1], [188, Table 1.2.1]. The first two tests (mechanics, vectors) were closed book and the last three tests (algebra, analysis, statistics) were open book. Each row of the data frame is a set of scores $\left(x_{i 1}, \ldots, x_{i 5}\right)$ for the $i^{th}$ student. Use a panel display to display the scatter plots for each pair of test scores. Compare the plot with the sample correlation matrix. Obtain bootstrap estimates of the standard errors for each of the following estimates: $\hat{\rho}_{12}=\hat{\rho}(\mathrm{mec}, \mathrm{vec})$, $\hat{\rho}_{34}=\hat{\rho}(\mathrm{alg}, \mathrm{ana})$, $\hat{\rho}_{35}=\hat{\rho}(\mathrm{alg}, \mathrm{sta})$, $\hat{\rho}_{45}=\hat{\rho}(\mathrm{ana}, \mathrm{sta})$.

Answer

I use function "pairs" to display the scatter plots for each pair of test scores. The method to estimate the standard errors for each correlation estimate is in the book.

set.seed(123)
library(bootstrap)
#the scatter plots for each pair of test scores
pairs(scor)
#sample correlation matrix
cor(scor)
#bootstrap estimate of SE of estimates of correlations
r<-500 #number of replicates
n<-nrow(scor)
R_12<-R_34<-R_35<-R_45<-numeric(r)
for(j in 1:r){
  i<-sample(1:n,size = n,replace=TRUE)
  mec<-scor$mec[i]
  vec<-scor$vec[i]
  alg<-scor$alg[i]
  ana<-scor$ana[i]
  sta<-scor$sta[i]
  R_12[j]<-cor(mec,vec)
  R_34[j]<-cor(alg,ana)
  R_35[j]<-cor(alg,sta)
  R_45[j]<-cor(ana,sta)
}
print(c(sd(R_12),sd(R_34),sd(R_35),sd(R_45)))
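
As a cross-check (my own addition, not part of the original answer), the same kind of standard error can be obtained with boot::boot; this minimal sketch does it for $\hat{\rho}_{12}$ only, with illustrative names.

library(boot)
rho12 <- function(d,i) cor(d$mec[i],d$vec[i]) #statistic evaluated on the resampled rows
b <- boot(scor,statistic=rho12,R=500)
sd(b$t[,1]) #bootstrap estimate of the standard error of rho(mec,vec)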

Project 7.B

Repeat Project 7.A for the sample skewness statistic. Compare the coverage rates for normal populations (skewness 0) and $\chi^{2}(5)$ distributions (positive skewness).

Answer

I use the Monte Carlo method in the slides. To compute confidence interval estimates for the skewness, I use the functions "boot" and "boot.ci" in the boot package. The code generates 95% confidence intervals for the skewness statistic. The skewness of the $\chi^{2}(5)$ distribution is sqrt(8/5).
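
As a quick check of the target value used below, the skewness of a $\chi^{2}(k)$ distribution is $$\sqrt{\beta_{1}}=\sqrt{8 / k},$$ so for $\chi^{2}(5)$ the true skewness is $\sqrt{8 / 5} \approx 1.265$.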

#the sample skewness statistic
skew.boot <- function(x,i) {
   x<-x[i]
   xbar <- mean(x)
   m3 <- mean((x - xbar)^3)
   m2 <- mean((x - xbar)^2)
   m3/m2^1.5
}

(i) The coverage rates for normal populations

library(boot)
n<-20 #the sample size
m<-1000 #number of replicates
norm.left<-norm.right<-basic.left<- basic.right<-perce.left<-perce.right<-numeric(m)
for(i in 1:m){
   x<-rnorm(n)
   x<-as.matrix(x)
   boot.obj<-boot(x,statistic=skew.boot,R=1000)
   a<-boot.ci(boot.obj, type = c("basic", "norm", "perc"))
   norm.left[i]<-as.integer(a$norm[2]>0)
   norm.right[i]<-as.integer(a$norm[3]<0)
   basic.left[i]<-as.integer(a$basic[4]>0)
   basic.right[i]<-as.integer(a$basic[5]<0)
   perce.left[i]<-as.integer(a$percent[4]>0)
   perce.right[i]<-as.integer(a$percent[5]<0)
}

#proportion of times that confidence intervals miss on the left and right
print(c(mean(norm.left),mean(norm.right),
        mean(basic.left),mean(basic.right),
        mean(perce.left),mean(perce.right)))

#the coverage probabilities  
cat('norm=', 1-mean(norm.left)-mean(norm.right),
     'basic=', 1-mean(basic.left)-mean(basic.right),
      'perc=', 1-mean(perce.left)-mean(perce.right))

(ii) The coverage rates for chi-square populations(df=5)

sk<-sqrt(8/5) 
for(i in 1:m){
  y<-rchisq(n,df=5)
  y<-as.matrix(y)
  boot.obj<-boot(y,statistic=skew.boot,R=1000)
  a<-boot.ci(boot.obj, type = c("basic", "norm", "perc"))
  norm.left[i]<-as.integer(a$norm[2]>sk)
  norm.right[i]<-as.integer(a$norm[3]<sk)
  basic.left[i]<-as.integer(a$basic[4]>sk)
  basic.right[i]<-as.integer(a$basic[5]<sk)
  perce.left[i]<-as.integer(a$percent[4]>sk)
  perce.right[i]<-as.integer(a$percent[5]<sk)
}

#proportion of times that confidence intervals miss on the left and right
print(c(mean(norm.left),mean(norm.right),
        mean(basic.left),mean(basic.right),
        mean(perce.left),mean(perce.right)))

#the coverage probabilities
cat('norm=', 1-mean(norm.left)-mean(norm.right),
     'basic=', 1-mean(basic.left)-mean(basic.right),
      'perc=', 1-mean(perce.left)-mean(perce.right))

Comparing the coverage rates for the two distributions, the intervals for the normal population should cover the true skewness at close to the nominal 95% rate, while the coverage for the positively skewed chi-square(5) population should be noticeably lower.

Homework-2019.11.15

Exercise 7.8

Refer to Exercise 7.7. Obtain the jackknife estimates of bias and standard error of $\hat{\theta}$.

Answer

Use the method in the textbook. The MLE of $\Sigma$ is (n-1)*cov(scor)/n.
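
For reference, the jackknife estimates computed below follow the usual formulas, with $\hat{\theta}_{(i)}$ the leave-one-out estimates and $\bar{\theta}_{(\cdot)}$ their mean: $$\widehat{\operatorname{bias}}_{jack}=(n-1)\left(\bar{\theta}_{(\cdot)}-\hat{\theta}\right), \qquad \widehat{\operatorname{se}}_{jack}=\sqrt{\frac{n-1}{n} \sum_{i=1}^{n}\left(\hat{\theta}_{(i)}-\bar{\theta}_{(\cdot)}\right)^{2}}.$$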

data(scor, package = "bootstrap")
n <- nrow(scor)
theta.jack <- numeric(n)
theta.hat <- max(eigen((n-1)*cov(scor)/n)$values)/sum(eigen((n-1)*cov(scor)/n)$values)
for (i in 1:n) 
    theta.jack[i] <- max(eigen((n-2)*cov(scor[-i,])/(n-1))$values) / sum(eigen((n-2)*cov(scor[-i,])/(n-1))$values)
bias.j <- (n - 1) * (mean(theta.jack) - theta.hat) #bias
se.j <- sqrt((n - 1) * mean((theta.jack - mean(theta.jack))^2)) #se
cat("bias =",bias.j,"se =",se.j)

The jackknife estimates of the bias and standard error of $\hat{\theta}$ are `r bias.j` and `r se.j`.

Exercise 7.10

In Example 7.18, leave-one-out (n-fold) cross validation was used to select the best fitting model. Repeat the analysis replacing the Log-Log model with a cubic polynomial model. Which of the four models is selected by the cross validation procedure? Which model is selected according to maximum adjusted $R^2$?

Answer

Use the method in Example 7.18.

library(DAAG)
attach(ironslag)
n <- length(magnetic) 
e1<-e2<-e3<-e4<-numeric(n)
r1<-r2<-r3<-r4<-numeric(n)

for (k in 1:n) {
y <- magnetic[-k]
x <- chemical[-k]

S1 <- lm(y ~ x) #Linear model
yhat1 <- S1$coef[1] + S1$coef[2] * chemical[k]
e1[k] <- magnetic[k] - yhat1
r1[k] <- summary(S1)$adj.r.squared

S2 <- lm(y ~ x + I(x^2)) #Quadratic model
yhat2 <- S2$coef[1] + S2$coef[2] * chemical[k] +S2$coef[3] * chemical[k]^2
e2[k] <- magnetic[k] - yhat2
r2[k] <- summary(S2)$adj.r.squared

S3 <- lm(log(y) ~ x) #Exponential model
logyhat3 <- S3$coef[1] + S3$coef[2] * chemical[k]
yhat3 <- exp(logyhat3)
e3[k] <- magnetic[k] - yhat3
r3[k] <- summary(S3)$adj.r.squared

S4 <- lm(y ~ x +I(x^2)+ I(x^3)) #Cubic polynomial model
yhat4 <- S4$coef[1] + S4$coef[2] * chemical[k]+S4$coef[3] * chemical[k]^2+S4$coef[4] * chemical[k]^3
e4[k] <- magnetic[k] - yhat4
r4[k] <- summary(S4)$adj.r.squared
}

#the mean of the squared prediction errors
c(mean(e1^2), mean(e2^2), mean(e3^2), mean(e4^2))

#the mean of adjusted R^2 in cross validation procedure
c(mean(r1), mean(r2), mean(r3), mean(r4))

#the adjusted R^2 without cross validation procedure
R1<-summary(lm(magnetic ~ chemical))$adj.r.squared
R2<-summary(lm(magnetic ~ chemical + I(chemical^2)))$adj.r.squared
R3<-summary(lm(log(magnetic) ~ chemical))$adj.r.squared
R4<-summary(lm(magnetic ~ chemical + I(chemical^2) + I(chemical^3)))$adj.r.squared
c(R1,R2,R3,R4)

According to the prediction error criterion, Model 2, the quadratic model, is selected by the cross-validation procedure. The quadratic model is also the one selected by maximum adjusted $R^2$, both with and without the cross-validation procedure.

Homework-2019-11-22

Exercise 8.3

The Count 5 test for equal variances in Section 6.4 is based on the maximum number of extreme points. Example 6.15 shows that the Count 5 criterion is not applicable for unequal sample sizes. Implement a permutation test for equal variance based on the maximum number of extreme points that applies when sample sizes are not necessarily equal.

Answer

Generate X ~ N(1, 2^2) with sample size 30 and Y ~ N(0, 4^2) with sample size 50. Write a function "maxout" to count the maximum number of extreme points. $H_{0}:\sigma_x=\sigma_y$ vs $H_{1}:\sigma_x\not=\sigma_y$, with significance level alpha = 0.05. Use the method in the textbook: plot the permutation distribution of the maximum number of extreme points and compute the p-value.
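
The p-value computed below is the usual permutation p-value, which counts the observed statistic itself among the replicates: $$\hat{p}=\frac{1+\#\left\{t^{*} \geq t_{0}\right\}}{R+1},$$ which is exactly what the line p<-mean(c(t0, rep) >= t0) evaluates to.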

set.seed(123)
#The function maxout below counts the maximum number of extreme points
maxout <- function(x, y) {
   X <- x - mean(x)
   Y <- y - mean(y)
   outx <- sum(X > max(Y)) + sum(X < min(Y))
   outy <- sum(Y > max(X)) + sum(Y < min(X))
   return(max(c(outx, outy)))
}
R <- 9999 #number of replicates
x <- rnorm(30, 1, 2)
y <- rnorm(50, 0, 4)
z <- c(x,y)
K <- 1:80
t0 <- maxout(x,y)
rep <- replicate(R,expr={
   #permutations of the sample
   k <- sample(K, size = 30, replace = FALSE)
   x1 <- z[k]
   y1 <- z[-k]
  maxout(x1,y1)
})

#Permutation distribution of the maximum number of extreme points
hist(rep,main = "distribution of maxout", freq=FALSE, xlab="Replicates of z",xlim = c(0,15))
abline(v = t0,col ="red")

#p-value
p<-mean(c(t0, rep) >= t0)
p

The p-value for testing equal variances is `r p`, so the null hypothesis of equal variances is rejected at the alpha = 0.05 level.

Question

Power comparison (distance correlation test versus ball covariance test)

Answer

Use the function "dCov" and "ndCov2" to compute the test statistic "dcovtest" for distance correlation test. Choose significance level alpha=0.05, compute the powers of two test methods when sample size n=30,45,...,150 for the two models respectively. Then plot the powers vs sample size n.

n<-seq(30,150,15) #the sample size
k<-length(n)
alpha<-0.05
m<-200 #number of replicates
set.seed(12345)
library(boot)
dCov <- function(x, y) {
   x <- as.matrix(x)
   y <- as.matrix(y)
   n <- nrow(x)
   m <- nrow(y)
   if (n != m || n < 2) stop("Sample sizes must agree")
   if (! (all(is.finite(c(x, y)))))
       stop("Data contains missing or infinite values")
   Akl <- function(x) {
     d <- as.matrix(dist(x))
     m <- rowMeans(d)
     M <- mean(d)
     a <- sweep(d, 1, m)
     b <- sweep(a, 2, m)
     return(b + M)
   }
   A <- Akl(x)
   B <- Akl(y)
   dCov <- sqrt(mean(A * B))
   dCov
}
ndCov2 <- function(z, ix, dims) {
   #dims contains dimensions of x and y
   p <- dims[1]
   q1 <- dims[2] + 1
   d <- p + dims[2]
   x <- z[ , 1:p]
   y <- z[ix, q1:d] #permute rows of y
   return(nrow(z) * dCov(x, y)^2)
}
dcovtest<-function(z,dims){
   boot.obj <- boot(data = z, statistic = ndCov2, R = 199, sim = "permutation",dims = c(2, 2))
   tb<-c(boot.obj$t0,boot.obj$t)
   p.value<-mean(tb >= tb[1])
   list(statistic=tb[1],p.value=p.value)
}

library(Ball)
p1<-p2<-numeric(m)
pcov1<-pball1<-numeric(k)
for (j in 1:k) {
   for (i in 1:m) {
       x <- matrix(rnorm(n[j] * 2), nrow = n[j], ncol = 2)
       e <- matrix(rnorm(n[j] * 2), nrow = n[j], ncol = 2)
       y<-x/4+e
       z <- as.data.frame(cbind(x, y))
       p1[i]<-dcovtest(z,dims = c(2,2))$p.value
       p2[i]<-bcov.test(x,y,R=199,seed=i*12345)$p.value
   }
   pcov1[j]<-mean(p1<alpha)
   pball1[j]<-mean(p2<alpha)
}

#plot the powers vs sample size n
plot(n,pcov1,type="o",xlab="n",ylab="powers",ylim=c(0,1),main="Model1",col=1)
lines(n,pball1, type="o", col=2)
legend('bottomright',legend=c('dcov','ball'),col=1:2,lty=1)
pcov2<-pball2<-numeric(k)
for (j in 1:k) {
   for (i in 1:m) {
       x <- matrix(rnorm(n[j] * 2), nrow = n[j], ncol = 2)
       e <- matrix(rnorm(n[j] * 2), nrow = n[j], ncol = 2)
       y<-x*e/4
       z <- as.data.frame(cbind(x, y))
       p1[i]<-dcovtest(z,dims = c(2,2))$p.value
       p2[i]<-bcov.test(x,y,R=199,seed=i*12345)$p.value
   }
   pcov2[j]<-mean(p1<alpha)
   pball2[j]<-mean(p2<alpha)
}
#plot the powers vs sample size n
plot(n,pcov2,type="o",xlab="n",ylab="powers",ylim=c(0,1),main="Model2",col=1)
lines(n,pball2, type="o", col=2)
legend('bottomright',legend=c('dcov','ball'),col=1:2,lty=1)

From a power comparison with 200 test decisions for each of the sample sizes we have obtained the results shown in the Figures.

Homework-2019-11-29

Exercise 9.4

Implement a random walk Metropolis sampler for generating the standard Laplace distribution (see Exercise 3.2). For the increment, simulate from a normal distribution. Compare the chains generated when different variances are used for the proposal distribution. Also, compute the acceptance rates of each chain.

Answer

Use the method in the textbook. The standard Laplace distribution has density f(x) = exp{−|x|}/2, x ∈ R. Choose proposal standard deviations sigma = 0.05, 0.5, 3, 10.
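
For the random walk Metropolis sampler with a symmetric proposal, the candidate $y$ is accepted with probability $$\alpha\left(x_{t}, y\right)=\min \left\{1, \frac{f(y)}{f\left(x_{t}\right)}\right\}=\min \left\{1, e^{\left|x_{t}\right|-|y|}\right\},$$ which is the ratio compared against a Uniform(0,1) draw in the code below.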

set.seed(1234)
f.lap <- function(x) exp(-abs(x))/2 #density of laplace distribution
#Use a function to generate samples from the Laplace distribution
rw.MH <- function(sigma, x0, N){
  # sigma: standard deviation of the proposal normal distribution
  # x0: initial value
  # N: size of random numbers required
  x <- numeric(N)
  x[1] <- x0
  u <- runif(N)
  k<-0
  for (i in 2:N) {
    y <- rnorm(1,x[i-1],sigma)
    if(u[i] <= (f.lap(y)) / f.lap(x[i-1])) #accept y 
      {
        x[i] <-y
        k <- k+1 #compute acceptance rate
      }
       else {x[i]<-x[i-1]} #reject y
  }
  return(list(x=x, k=k))
}

N <- 2000
sigma <- c(0.05,0.5,3,10)
x0 <- 20
rw1<-rw.MH(sigma[1],x0,N)
rw2<-rw.MH(sigma[2],x0,N)
rw3<-rw.MH(sigma[3],x0,N)
rw4<-rw.MH(sigma[4],x0,N)

#acceptance rates
print(c(rw1$k, rw2$k, rw3$k, rw4$k)/N)

#graphs of chains
#par(mfrow=c(2,2))
rw<-cbind(rw1$x, rw2$x, rw3$x, rw4$x)
for (j in 1:4) 
   plot(rw[,j],type = "l",
        xlab = bquote(sigma== .(round(sigma[j],3))),
        ylab = "x",ylim = range(rw[,j]))

Homework-2019-12-06

Exercise 11.1

The natural logarithm and exponential functions are inverses of each other, so that mathematically log(exp(x)) = exp(log(x)) = x. Show by example that this property does not hold exactly in computer arithmetic. Does the identity hold with near equality? (See all.equal.)

Answer

x<-2:8
#in computer arithmetic
log(exp(x))==exp(log(x))
exp(log(x))==x
log(exp(x))==x
#nearly equality
all.equal(log(exp(x)),exp(log(x)))
all.equal(log(exp(x)),x)
all.equal(x,exp(log(x)))

From the results, the identity holds with near equality for the chosen x, even though it need not hold exactly in computer arithmetic.

Exercise 11.5

Write a function to solve the equation $$\begin{aligned} \frac{2 \Gamma\left(\frac{k}{2}\right)}{\sqrt{\pi(k-1)} \Gamma\left(\frac{k-1}{2}\right)} \int_{0}^{c_{k-1}}\left(1+\frac{u^{2}}{k-1}\right)^{-k / 2} d u &= \frac{2 \Gamma\left(\frac{k+1}{2}\right)}{\sqrt{\pi k} \Gamma\left(\frac{k}{2}\right)} \int_{0}^{c_{k}}\left(1+\frac{u^{2}}{k}\right)^{-(k+1) / 2} d u \end{aligned}$$ for $a$, where $$c_{k}=\sqrt{\frac{a^{2} k}{k+1-a^{2}}}.$$ Compare the solutions with the points A(k) in Exercise 11.4.

Answer

Solve the equation by bisection.
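
For context, my summary of Exercise 11.4: A(k) is the point $a \in(0, \sqrt{k})$ where the curves $$S_{k-1}(a)=P\left(t(k-1)>\sqrt{\frac{a^{2}(k-1)}{k-a^{2}}}\right) \quad \text { and } \quad S_{k}(a)=P\left(t(k)>\sqrt{\frac{a^{2} k}{k+1-a^{2}}}\right)$$ intersect, so the bisection below searches for a sign change of the difference of the two tail probabilities computed with pt().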

#Exercise 11.4

k<-c(4:25,100)
Ak<-numeric(length(k))

#write a function to compute the roots
bisec<-function(k){
    f<-function(a) {sqrt((k-1)*a^2/(k-a^2))}
    g<-function(a) {sqrt(k*a^2/(k+1-a^2))}
    it <- 0
    eps <- .Machine$double.eps^0.25
    r <- seq(0,sqrt(k)-1e-5, length=3)
    y <- c(pt(f(r[1]),k-1)-pt(g(r[1]),k), pt(f(r[2]),k-1)-pt(g(r[2]),k), 
                                       pt(f(r[3]),k-1)-pt(g(r[3]),k))
    if (y[1] * y[3] > 0)
        stop("f does not have opposite sign at endpoints")
    while(it < 1000 && abs(y[2]) > eps) {
       it <- it + 1
       if (y[1]*y[2] < 0) {
          r[3] <- r[2]
          y[3] <- y[2]
       } 
        else {
          r[1] <- r[2]
          y[1] <- y[2]
       }
       r[2] <- (r[1] + r[3]) / 2
       y[2] <- pt(f(r[2]),k-1)-pt(g(r[2]),k)
    }
    return(r[2])
}

#compute the roots
for(i in 1:length(k))
     Ak[i]<-bisec(k[i])
#Exercise 11.5

k<-c(4:25,100)
ck<-function(a,k){sqrt((a^2)*k/(k+1-a^2))}

#write a function to compute the roots
bisec<-function(k){
    it <- 0
    r <- seq(1e-5,sqrt(k+1)/2, length=3)
    y<-numeric(3)
    while(it < 100) {
         it <- it + 1
         ck11<-ck(r[1],k-1)
         f<-function(u){ck11*(1+(ck11*u)^2/(k-1))^(-k/2)}
         int1<-integrate(f,lower=0,upper =1)$value
         h11<-exp(lgamma(k/2))*int1/(exp(lgamma((k-1)/2))*sqrt(k-1))
         ck12<-ck(r[1],k)
         g<-function(u){ck12*(1+(ck12*u)^2/k)^(-(k+1)/2)}
         int2<-integrate(g,lower=0,upper =1)$value
         h12<-exp(lgamma((k+1)/2))*int2/(exp(lgamma(k/2))*sqrt(k))
         y[1]<-h11-h12
         ck21<-ck(r[2],k-1)
         f<-function(u){ck21*(1+(ck21*u)^2/(k-1))^(-k/2)}
         int1<-integrate(f,lower=0,upper =1)$value
         h21<-exp(lgamma(k/2))*int1/(exp(lgamma((k-1)/2))*sqrt(k-1))
         ck22<-ck(r[2],k)
         g<-function(u){ck22*(1+(ck22*u)^2/k)^(-(k+1)/2)}
         int2<-integrate(g,lower=0,upper =1)$value
         h22<-exp(lgamma((k+1)/2))*int2/(exp(lgamma(k/2))*sqrt(k))
         y[2]<-h21-h22
         ck31<-ck(r[3],k-1)
         f<-function(u){ck31*(1+(ck31*u)^2/(k-1))^(-k/2)}
         int1<-integrate(f,lower=0,upper =1)$value
         h31<-exp(lgamma(k/2))*int1/(exp(lgamma((k-1)/2))*sqrt(k-1))
         ck32<-ck(r[3],k)
         g<-function(u){ck32*(1+(ck32*u)^2/k)^(-(k+1)/2)}
         int2<-integrate(g,lower=0,upper =1)$value
         h32<-exp(lgamma((k+1)/2))*int2/(exp(lgamma(k/2))*sqrt(k))
         y[3]<-h31-h32
         if (y[1]*y[2] < 0) {
                r[3] <- r[2]
                y[3] <- y[2]
         } else {
                r[1] <- r[2]
                y[1] <- y[2]
         }
         r[2] <- (r[1] + r[3]) / 2
   }
   return(r[2])
}
#output the roots
a<-numeric(length(k))
for(i in 1:length(k))
   {a[i]<-bisec(k[i])}

#output ex11.4
solution1 = matrix(0,2,length(k))
rownames(solution1)<-c("k","A(k)")
solution1[1,]=k
solution1[2,]=Ak
solution1

#output ex11.5
solution2 = matrix(0,2,length(k))
rownames(solution2)<-c("k","a")
solution2[1,]<-k
solution2[2,]<-a
solution2

Comparing the results of 11.4 and 11.5, the roots from the two formulations are slightly different but very close.

A-B-O blood type problem

Answer

Use the method in the slides.
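
Briefly, my reading of the EM scheme implemented below: with allele frequencies $p=P(A)$, $q=P(B)$ and $r=1-p-q=P(O)$, the E-step replaces the unobserved genotype counts by their conditional expectations, e.g. $$E\left[n_{A A} \mid n_{A}\right]=n_{A} \frac{p^{2}}{p^{2}+2 p r}, \qquad E\left[n_{B B} \mid n_{B}\right]=n_{B} \frac{q^{2}}{q^{2}+2 q r},$$ and the M-step updates the frequencies by gene counting, $p \leftarrow\left(2 n_{A A}+n_{A O}+n_{A B}\right) /(2 n)$ and similarly for $q$ and $r$.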

#the log-maximum likelihood function
LML <- function(nA,nAB,nB,nO,pA,pB,pO) {
  nAB*log(2*pA*pB)+nA*log(2*pA*pO+pA^2)+nB*log(2*pB*pO+pB^2)+nO*log(pO^2)
}
#write a funtion to solve
pro<-function(p,q,n.obs)
{
  n<-sum(n.obs)
  nA<-n.obs[1]
  nB<-n.obs[2]
  nO<-n.obs[3]
  nAB<-n.obs[4]
  nOO<-nO

  lml<-rep(0,10) #log-likelihood value at each of the 10 EM iterations
  pA<-pB<-pO<-rep(0,11) #allele frequency estimates, including the initial values
  pA[1]<-p
  pB[1]<-q
  pO[1]<-1-p-q

  for(i in 2:11){
    pA.old<-pA[i-1]
    pB.old<-pB[i-1]
    pO.old<-pO[i-1]

    #compute nAA,nAO,nBB,nBO
    nAA<-nA*pA.old^2/(pA.old^2+2*pA.old*pO.old)
    nAO<-nA-nAA
    nBB<-nB*pB.old^2/(pB.old^2+2*pB.old*pO.old)
    nBO<-nB-nBB

    #M-step
    pA[i]<-(2*nAA+nAO+nAB)/(2*n)
    pB[i]<-(2*nBB+nBO+nAB)/(2*n)
    pO[i]<-(2*nOO+nAO+nBO)/(2*n)
    lml[i-1]<-LML(nA,nAB,nB,nO,pA[i],pB[i],pO[i])
  }
  return(list(pA=pA,pB=pB,pO=pO,lml=lml))
}

n.obs<-c(28,24,41,70) # observed data,n_A,n_B,n_O,n_AB
P<-0.02;Q<-0.01 #initial values of p and q
a<-pro(P,Q,n.obs) #compute p,q
p<-a$pA
q<-a$pB
lml<-a$lml

#the outputs of p and q
print(cbind(p,q))

#the log-maximum likelihood values in M-steps
plot(lml,type = "o")

We can see that p=0.3273442, q=0.3104267.

Homework-2019-12-13

Exercise 3 (Page 204)

Use both for loops and lapply() to fit linear models to the mtcars using the formulas stored in this list:

formulas <- list( mpg ~ disp, mpg ~ I(1 / disp), mpg ~ disp + wt, mpg ~ I(1 / disp) + wt )

Answer

formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt,
  mpg ~ I(1 / disp) + wt
)
fit1<-fit2<-vector("list",4)

#for loops
for (i in 1:4) {
  fit1[[i]]<-lm(formulas[[i]],data = mtcars)
}
fit1

#lapply()
fit2<-lapply(formulas, function(x) lm(x,data = mtcars))
fit2

Exercise 4

Fit the model mpg ~ disp to each of the bootstrap replicates of mtcars in the list below by using a for loop and lapply(). Can you do it without an anonymous function?

bootstraps <- lapply(1:10, function(i) { rows <- sample(1:nrow(mtcars), rep = TRUE); mtcars[rows, ] })

Answer

bootstraps <- lapply(1:10, function(i) {
  rows <- sample(1:nrow(mtcars), rep = TRUE)
  mtcars[rows, ]
})
boot.fit1<-boot.fit2<-boot.fit3<-vector("list",10)

#for loop
for (i in 1:10) {
  boot.fit1[[i]]<-lm(mpg~disp,data = bootstraps[[i]])
}
boot.fit1

#lapply()
boot.fit2<-lapply(1:10,function(i) lm(mpg~disp,data = bootstraps[[i]]))
boot.fit2

#without an anonymous function
boot.fit3<-lapply(lapply(bootstraps,"[",c(1,3)),lm)
boot.fit3

Exercise 5

For each model in the previous two exercises, extract R2 using the function below.

rsq <- function(mod) summary(mod)$r.squared

Answer

rsq <- function(mod) summary(mod)$r.squared
sapply(fit1,rsq)
sapply(fit2,rsq)
sapply(boot.fit1,rsq)
sapply(boot.fit2,rsq)
sapply(boot.fit3,rsq)

Exercise 3 (Page 214)

The following code simulates the performance of a t-test for non-normal data. Use sapply() and an anonymous function to extract the p-value from every trial.

trials <- replicate( 100, t.test(rpois(10, 10), rpois(7, 10)), simplify = FALSE )

Extra challenge: get rid of the anonymous function by using [[ directly.

Answer

trials <- replicate(
100,
t.test(rpois(10, 10), rpois(7, 10)),
simplify = FALSE
)

#use an anonymous function
sapply(1:100,function(i) trials[[i]]$p.value)

#extra challenge
sapply(trials,"[[",3)

Exercise 7

Implement mcsapply(), a multicore version of sapply(). Can you implement mcvapply(), a parallel version of vapply()? Why or why not?

Answer

# mcsapply() is based on parLapply(),the multicore version of lapply()
# arguments cl,X,FUN etc. have the same use as parLapply()
# arguments simplify and USE.NAMES have the same use as sapply()
mcsapply<-function (cl = NULL, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE, chunk.size = NULL) 
{
    FUN <- match.fun(FUN)
    answer <- parLapply(cl = cl, X = as.list(X), fun = FUN, ..., chunk.size = chunk.size)
    if (USE.NAMES && is.character(X) && is.null(names(answer))) 
        names(answer) <- X
    if (!isFALSE(simplify) && length(answer)) 
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}

# an example to use mcsapply()
library(parallel)
no_cores<-detectCores()-1
cl<-makeCluster(no_cores)
mcsapply(cl,1:8,function(x) 2*x)
system.time(sapply(1:500000,function(x) 2*x))
system.time(mcsapply(cl,1:500000,function(x) 2*x))

# shut down the workers 
stopCluster(cl)

We can implement mcvapply(). It is already done by the function "future_vapply()" in the R package "future.apply".
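
A minimal usage sketch of that approach (assuming the future.apply package, which attaches future, is installed; names as in its documentation):

library(future.apply)
plan(multisession, workers = 2) #start two parallel R sessions
future_vapply(1:8, function(x) 2*x, FUN.VALUE = numeric(1)) #parallel analogue of vapply()
plan(sequential) #return to sequential evaluation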

Homework-2019-12-20

Question

Write an Rcpp function for Exercise 9.4 (the random walk Metropolis sampler for the standard Laplace distribution). Compare the random numbers it generates with those of the corresponding R function using qqplot, compare the computation time of the two functions with microbenchmark, and comment on the results.

Answer

# the function I wrote for Ex.9.4
rwmhR <- function(sigma, x0, N){
  # sigma: standard deviation of the proposal normal distribution
  # x0: initial value
  # N: size of random numbers required
  x <- numeric(N)
  x[1] <- x0
  u <- runif(N)
  k<-0
  for (i in 2:N) {
    y <- rnorm(1,x[i-1],sigma)
    if(u[i] <= exp(-abs(y))/exp(-abs(x[i-1]))) #accept y (the 1/2 factors of the density cancel)
      {
        x[i] <-y
        k <- k+1 #compute acceptance rate
      }
       else {x[i]<-x[i-1]} #reject y
  }
  return(list(x=x, k=k))
}

# the Rcpp function
library(Rcpp)
sourceCpp(code='
#include<Rcpp.h>
using namespace Rcpp;

//[[Rcpp::export]]
List rwmhC(double sigma, double x0, int N) {
  NumericVector x(N);
  x(0) = x0;
  NumericVector u = as<NumericVector>(runif(N));
  int k=0;
  double y=0;
  for (int i=1;i<N; i++) {
    y = as<double>(rnorm(1,x(i-1),sigma));
    if(u(i) < exp(-std::abs(y))/exp(-std::abs(x(i-1)))) /** accept y (the 1/2 factors cancel) */
      {
        x(i)= y;
        k = k+1;  /** compute acceptance rate*/
      }
       else {x(i)=x(i-1);}  /** reject y*/
  }
  return(List::create(Named("x")=x, Named("k")=k));
}
')


N<-5000
sigma<-c(0.3,3)
x0<-3
rwC1<-rwmhC(sigma[1],x0,N)
rwC2<-rwmhC(sigma[2],x0,N)
#acceptance rates
print(c(rwC1$k,rwC2$k)/N)
#graphs of chains
par(mfrow=c(1,2))
rw<-cbind(rwC1$x, rwC2$x)
for (j in 1:2) 
   plot(rw[,j],type = "l",
        xlab = bquote(sigma== .(round(sigma[j],3))),
        ylab = "x",ylim = range(rw[,j]))
# compare the generated random numbers by qqplot
rwR1<-rwmhR(sigma[1],x0,N)$x
rwR2<-rwmhR(sigma[2],x0,N)$x
qqplot(rwC1$x,rwR1)
qqplot(rwC2$x,rwR2)

# Campare the computation time
library(microbenchmark)
ts1<-microbenchmark(mhR=rwmhR(sigma[1],x0,N),mhC=rwmhC(sigma[1],x0,N))
ts2<-microbenchmark(mhR=rwmhR(sigma[2],x0,N),mhC=rwmhC(sigma[2],x0,N))
summary(ts1)
summary(ts2)

From the qqplots, the random numbers generated by the R and Rcpp functions have nearly the same distribution (the points lie close to a straight line). Comparing the computation times, the Rcpp function is much faster than the R function.


