knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(Intro2MLR)
Power is defined as
$$1-\beta$$
Where $\beta$ is the probability of a type 2 error.
For the case of
$$H_0:\mu = \mu_0$$ and
$$H_1:\mu\ne \mu_0$$
We have developed the formula
$$Z_\beta + Z_{\alpha/2} \approx \frac{\delta\sqrt{n}}{\sigma}$$ where the alternate is specifically $H_1: \mu = \mu_0 +\delta$
Solving problems related to power require some form of graphical presentation. We will look at some basic and not so basic plots which will help picture the various probabilities related to power analysis and hypothesis testing.
The plot created below uses the enhanced $\tt{persp()}$ function and is found in the GA package.
The assumptions used for the determination of the sample size include $\delta = \sigma/4$.
a <- seq(0.1,0.9, length = 100) b <- seq(0.1,0.9, length = 100) n <- function(x,y){ zha <- qnorm(1-x/2) zb <- qnorm(1-y) (4*(zha + zb))^2 } z <-outer(a,b,n) GA::persp3D(a,b,z, theta = 30, phi= 30, expand = 1, main = "n versus alpha and beta")
Often we will want to find the power of a sign test. This reduces to finding the power associated with a binomial experiment.
When dealing with such problems you will need to make appropriate plots.
Make sure you can use informative annotation where calculated statistics are included. See the use of ${\tt substitute() }$
x <- 0:10 p <- dbinom(x, size = 10, prob = 1/2) names(p) <- x coll <- rep(c(2,1), c(7,4)) b <- barplot(p, col = coll) pv <- 1-pbinom(6,10,1/2) pv text(b[8], 0.2, substitute(P(Y >= y)== pv, list(pv = pv)))
When comparing tests (parametric and non-parametric) it is essential to create acceptance and rejection intervals. These are used for testing purposes and they are created with respect to the NULL distribution.
When performing t tests the above regions are straight forward and are made using quantiles determined by the alpha level.
When making such intervals for a NULL binomial distribution we cannot formulate an interval based solely on a given alpha level since the distribution is discrete.
The plot of the NULL distribution below demonstrates the discreteness of the distribution and the consequent discrete tail sums.
#n=10 Bernoulli trials, H0: p =1/2 barplot(p) round(cumsum(p),4) round(1-cumsum(p),4)
When comparing the power of the Binomial (sign) test with the t or similar test it will be best to create the alpha level from the binomial distribution and then use it for the parametric test.
The other issue will be the calculation of the probability of a type 2 error which is defined as $P(Accepting\; the\; NULL\; hypothesis | NULL\; is\; FALSE)$
This will require the construction of RAR regions and then finding the area above $A$ under the assumption that $H_1$ is true.
Another issue is the remaking of the NULL and Alternate hypotheses within the non parametric model. $H_0: \mu = \mu_0$ corresponds to
$$H_0: p =1/2$$ $H_1: \mu=\mu_0 +\delta$ corresponds to
$$H_1: p = p_a$$
To calculate $p_a$ we will need to look at the original scale and distribution of the primary random variable $Y_i$ under $H_0$ and $H_1$.
curve(dnorm(x, 40, 10), xlim = c(40 - 3*10, 40 + 3*10),ylim = c(0, 0.05), ylab = "Density") curve(dnorm(x,40+10/4,10), col = "Blue", add = TRUE) segments(20, 0, 60, 0, lwd = 2) segments(40,0.04, 40,0, lwd =4) segments(40+10/4, 0.04, 40+10/4, 0, lwd = 4) xcurve = seq(40, 70, length =1000) ycurve = dnorm(xcurve, 40+10/4, 10) polygon(c(40,xcurve,70), c(0,ycurve,0), col = rgb(0,0,0.8, 0.3)) barea10<- 1-pnorm(40,50,10) barea10 <- round(barea10,4) barea <- 1-pnorm(40,40+10/4,10) barea <- round(barea,4) text(50, 0.02, paste0("Upper tail=", barea)) text(40+10/4, 0.050, expression(H[1]:mu == mu[0]+ delta),cex = 0.7) text(40, 0.045, expression(H[0]:mu == mu[0]),cex = 0.7) axis(1,40+10/4, "42.5", cex.axis = 0.5)
If $\delta = 10$ then this corresponds to $p_a=r barea10
$
In our case we have $\delta = 10/4$ and therefore $p_a = r round(barea,4)
$
Suppose we make some artificial data:
set.seed(20);y<-rnorm(36,40,10) y
We need RAR regions
p <- dbinom(0:36, size = 36, prob =1/2) names(p) <- 0:36 coll <- rep(c(1,2,1), c(12,13,12)) barplot(p, col = coll) round(cumsum(p),4) round(1-cumsum(p),4) alpha <- sum(p[1:12])*2 alpha
The acceptance region is $[12,24]$
With $\delta = 10/4$ as above we have $p_a=r barea
$.
So the $P({\tt{Type\; 2 \; error}})=\beta$ and therefore the power is calculated thus:
pa <- barea beta = pbinom(24,36,pa)-pbinom(11,36,pa) spower = 1-beta #sign power spower
Notice that we did not need the actual data!!
Now, with $\alpha = r alpha
$ and $\delta = 10/4$ the parametric test gives:
$$Z_{\alpha/2} + Z_{\beta} = \frac{\sqrt{n}\delta}{\sigma}=3/2$$ $$Z_{\beta}= 3/2 - Z_{\alpha/2}$$
Therefore Power = $1-\beta = \tt{pnorm}(Z_{\beta})$
ppower = pnorm(3/2-qnorm(1-alpha/2)) #parametric power ppower
The formula did not require the actual data.
The power for the sign test is r spower
and that of the parametric is r ppower
. There are some obvious questions that need to be asked.
1. Why are both power probabilities low? 2. Why is the parametric test more powerful than the non-parametric?
The first question can be answered by examining how close the NULL and alternate distributions are, With $\delta = \sigma/4$ it will require a large sample size to differentiate the two distributions.
The second question is answered by looking at the quantity of information that is discarded with the sign test. As long as $Y_i> \tau$ the sign test will ascribe the same value (sign) as $Y_j > \tau + 100$. The relative size of the response has been lost.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.