knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(Intro2MLR)

Introduction

Power is defined as

$$1-\beta$$

where $\beta$ is the probability of a Type II error.

One sample known variance

For the case of

$$H_0:\mu = \mu_0$$ and

$$H_1:\mu\ne \mu_0$$

We have developed the formula

$$Z_\beta + Z_{\alpha/2} \approx \frac{\delta\sqrt{n}}{\sigma}$$ where the alternate is specifically $H_1: \mu = \mu_0 +\delta$
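As a minimal sketch of how the formula is used (the helper name power.z and the numeric inputs below are illustrative, not taken from the text), we can solve for $Z_\beta$ and hence the power for a given $n$:

power.z <- function(n, delta, sigma, alpha = 0.05){
  zb <- sqrt(n)*delta/sigma - qnorm(1 - alpha/2)   # Z_beta from the formula
  pnorm(zb)                                        # power = 1 - beta
}
power.z(n = 16, delta = 5, sigma = 10)             # approximately 0.516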

Example

Solving problems related to power requires some form of graphical presentation. We will look at some basic and not-so-basic plots that help picture the various probabilities related to power analysis and hypothesis testing.

Plots

The plot created below uses ${\tt persp3D()}$ from the GA package, an enhanced version of the base ${\tt persp()}$ function.

The assumptions used for the determination of the sample size include $\delta = \sigma/4$.

a <- seq(0.1, 0.9, length = 100)   # alpha values
b <- seq(0.1, 0.9, length = 100)   # beta values

n <- function(x, y){
  zha <- qnorm(1 - x/2)    # Z_{alpha/2}
  zb  <- qnorm(1 - y)      # Z_{beta}
  (4*(zha + zb))^2         # n, using delta = sigma/4
}

z <- outer(a, b, n)

GA::persp3D(a, b, z,
            theta = 30, phi = 30, expand = 1,
            main = "n versus alpha and beta")

Example

Binomial

Often we will want to find the power of a sign test. This reduces to finding the power associated with a binomial experiment.

When dealing with such problems you will need to make appropriate plots.

Make sure you can add informative annotation that includes calculated statistics. See the use of ${\tt substitute()}$ below.

The p-value

x <- 0:10
p <- dbinom(x, size = 10, prob = 1/2)   # NULL distribution: Binomial(10, 1/2)
names(p) <- x
coll <- rep(c(2, 1), c(7, 4))           # colour the values 0-6 and the upper tail 7-10 differently
b <- barplot(p, col = coll)
pv <- 1 - pbinom(6, 10, 1/2)            # p-value: P(Y >= 7)
pv
text(b[8], 0.2, substitute(P(Y >= y) == pv, list(y = 7, pv = pv)))

Binomial testing, rejection and acceptance regions

When comparing tests (parametric and non-parametric) it is essential to create acceptance and rejection intervals. These are used for testing purposes and they are created with respect to the NULL distribution.

When performing t tests the above regions are straightforward and are made using quantiles determined by the alpha level.

When making such intervals for a NULL binomial distribution we cannot attain a given alpha level exactly, since the distribution is discrete.

The plot of the NULL distribution below demonstrates the discreteness of the distribution and the consequent discrete tail sums.

# n = 10 Bernoulli trials, H0: p = 1/2

barplot(p)
round(cumsum(p), 4)       # lower-tail probabilities P(Y <= y)
round(1 - cumsum(p), 4)   # upper-tail probabilities P(Y > y)

When comparing the power of the binomial (sign) test with the t or a similar test, it is best to derive the alpha level from the binomial distribution and then use that same level for the parametric test.

The other issue is the calculation of the probability of a Type II error, which is defined as $P(Accepting\; the\; NULL\; hypothesis \mid NULL\; is\; FALSE)$.

This will require the construction of the RAR (rejection-acceptance-rejection) regions and then finding the probability over the acceptance region $A$ under the assumption that $H_1$ is true.
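In symbols, if $A$ is the acceptance region and $n$ the number of trials, then for the sign test

$$\beta = P(Y \in A \mid p = p_a) = \sum_{y \in A}\binom{n}{y}p_a^{\,y}(1-p_a)^{n-y}$$

and the power is $1-\beta$.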

Another issue is the restatement of the NULL and Alternate hypotheses within the non-parametric model. $H_0: \mu = \mu_0$ corresponds to

$$H_0: p = 1/2$$

and $H_1: \mu = \mu_0 + \delta$ corresponds to

$$H_1: p = p_a$$

To calculate $p_a$ we will need to look at the original scale and distribution of the primary random variable $Y_i$ under $H_0$ and $H_1$.
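For the upper-tailed alternative used here, $p_a$ is simply the probability that a single observation exceeds $\mu_0$ when $H_1$ is true:

$$p_a = P(Y_i > \mu_0 \mid \mu = \mu_0 + \delta) = 1-\Phi\!\left(\frac{\mu_0-(\mu_0+\delta)}{\sigma}\right) = \Phi\!\left(\frac{\delta}{\sigma}\right)$$

With $\mu_0 = 40$, $\sigma = 10$ and $\delta = 10/4$ this is $\Phi(0.25) \approx 0.5987$, the shaded area in the plot below.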

curve(dnorm(x, 40, 10), xlim = c(40 - 3*10, 40 + 3*10), ylim = c(0, 0.05), ylab = "Density")   # NULL density
curve(dnorm(x, 40 + 10/4, 10), col = "Blue", add = TRUE)                                       # alternate density
segments(20, 0, 60, 0, lwd = 2)
segments(40, 0.04, 40, 0, lwd = 4)                 # mu_0 = 40
segments(40 + 10/4, 0.04, 40 + 10/4, 0, lwd = 4)   # mu_0 + delta = 42.5
xcurve <- seq(40, 70, length = 1000)
ycurve <- dnorm(xcurve, 40 + 10/4, 10)
polygon(c(40, xcurve, 70), c(0, ycurve, 0), col = rgb(0, 0, 0.8, 0.3))   # p_a = P(Y > mu_0 | H1)
barea10 <- 1 - pnorm(40, 50, 10)         # p_a when delta = 10
barea10 <- round(barea10, 4)
barea <- 1 - pnorm(40, 40 + 10/4, 10)    # p_a when delta = 10/4
barea <- round(barea, 4)
text(50, 0.02, paste0("Upper tail=", barea))
text(40 + 10/4, 0.050, expression(H[1]:mu == mu[0] + delta), cex = 0.7)
text(40, 0.045, expression(H[0]:mu == mu[0]), cex = 0.7)
axis(1, 40 + 10/4, "42.5", cex.axis = 0.5)

If $\delta = 10$ then this corresponds to $p_a =$ `r barea10`

In our case we have $\delta = 10/4$ and therefore $p_a =$ `r round(barea, 4)`

Calculation of power

Sign test (non-parametric)

Suppose we make some artificial data:

set.seed(20)
y <- rnorm(36, 40, 10)   # n = 36 observations
y

First we need the RAR regions.

p <- dbinom(0:36, size = 36, prob = 1/2)   # NULL distribution: Binomial(36, 1/2)
names(p) <- 0:36
coll <- rep(c(1, 2, 1), c(12, 13, 12))     # highlight the acceptance region 12-24
barplot(p, col = coll)
round(cumsum(p), 4)                        # lower-tail probabilities
round(1 - cumsum(p), 4)                    # upper-tail probabilities
alpha <- sum(p[1:12])*2                    # two-sided alpha: 2 * P(Y <= 11)
alpha

The acceptance region is $[12,24]$

With $\delta = 10/4$ as above we have $p_a =$ `r barea`.

So $P({\tt{Type\; II\; error}}) = \beta$, and the power is calculated as follows:

pa <- barea                                        # alternate success probability
beta <- pbinom(24, 36, pa) - pbinom(11, 36, pa)    # P(12 <= Y <= 24 | p = pa)
spower <- 1 - beta                                 # sign-test power
spower

Notice that we did not need the actual data!!
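As a quick check (not needed for the power calculation itself), we can still apply the decision rule to the artificial data: count the observations above $\mu_0 = 40$ and see whether the count lands in the acceptance region $[12, 24]$.

s <- sum(y > 40)    # number of positive signs
s
s >= 12 & s <= 24   # TRUE means we fail to reject H0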

Parametric

Now, with $\alpha =$ `r alpha` and $\delta = 10/4$ the parametric test gives:

$$Z_{\alpha/2} + Z_{\beta} = \frac{\sqrt{n}\,\delta}{\sigma} = 3/2$$

$$Z_{\beta} = 3/2 - Z_{\alpha/2}$$

Therefore the power is $1-\beta = {\tt pnorm}(Z_{\beta})$:

ppower <- pnorm(3/2 - qnorm(1 - alpha/2))   # parametric power
ppower

The formula did not require the actual data.
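As an added sanity check (a minimal Monte Carlo sketch, not part of the original derivation, assuming a two-sided z test for the parametric case since $\sigma$ is known and reusing the binomial-based alpha and $p_a$ computed above), both power values can be approximated by simulation:

nsim  <- 10000
mu0   <- 40
sigma <- 10
delta <- 10/4
m     <- 36                                 # sample size (named m to avoid clobbering the n() function above)
zc    <- qnorm(1 - alpha/2)                 # critical value from the binomial-based alpha
set.seed(1)
rej.sign <- rej.z <- logical(nsim)
for (i in 1:nsim) {
  y1 <- rnorm(m, mu0 + delta, sigma)        # data generated under H1
  s  <- sum(y1 > mu0)                       # sign-test statistic
  rej.sign[i] <- s <= 11 | s >= 25          # outside the acceptance region [12, 24]
  z  <- sqrt(m)*(mean(y1) - mu0)/sigma      # z statistic (sigma known)
  rej.z[i] <- abs(z) > zc
}
mean(rej.sign)   # should be close to spower
mean(rej.z)      # should be close to ppower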

Conclusion

The power for the sign test is `r spower` and that of the parametric test is `r ppower`. There are some obvious questions that need to be asked.

1. Why are both power probabilities low?
2. Why is the parametric test more powerful than the non-parametric?

The first question can be answered by examining how close the NULL and alternate distributions are. With $\delta = \sigma/4$ a large sample size is required to differentiate the two distributions.
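To make this concrete (a small illustrative calculation; the choices $\alpha = 0.05$ and target power $0.80$ are added here for the example), the sample-size formula with $\delta = \sigma/4$ indicates that roughly 126 observations are needed, far more than the $n = 36$ used above:

zha <- qnorm(1 - 0.05/2)      # Z_{alpha/2}
zb  <- qnorm(0.80)            # Z_beta for 80% power
ceiling((4*(zha + zb))^2)     # about 126 observations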

The second question is answered by looking at the quantity of information that is discarded by the sign test. As long as $Y_i > \tau$, the sign test ascribes the same value (sign) to $Y_i$ as it would to a $Y_j > \tau + 100$; the relative size of the response has been lost.


