knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "ws#>",
  fig.align = "center"
)

# directory to Lab
dirdl <- system.file("Lab12",package = "Intro2R")

# create rmd link

library(Intro2R)

Introduction

This lab is related to Lab 11 and will extend the ideas of confidence intervals and re-implement some of the same functions. We will need to understand the ideas behind Null Hypothesis Statistical Testing (NHST).

Interpretation will become very important so please take care to understand the methodology and be aware of some of the short comings of the procedures we teach in class and encounter in our scientific readings.

NHST -- a beginning

To understand the basic ideas related to classical testing we will look at a Binomial experiment.

Suppose we throw a coin $n=10$ times and observe $x=4$ successes. From this data we could ask and answer a number of questions like:

  1. What is the point estimate of $p$, the probability of a success $p=P(Head)$?
  2. What is the interval estimate of $p$?
  3. Make a conjecture for the value of $p$ and then decide whether that value is plausible in light of the data.

Q1. Point estimate

To answer questions 1 and 2 we could employ a number of methods already discussed. For example we could create point estimates for $p$ by using MLE or Moment estimators and for interval estimates we could use analytical ci formulae or bootstrap intervals.

If we used MLE or moment estimation we would obtain

$$\hat{p}=\frac{x}{n}$$

Q2. Confidence interval

You might re-phrase q2 another way -- what are the plausible values of $p$ -- the answer is the confidence interval for $p$.

n <- 10
x <- 4
alpha = 0.05
data <- rep(c(1,0),c(x,n-x))
myp <- function(data, indices){
  d<-data[indices]
  phat<-mean(d) # number of successes/n
}
out<-boot::boot(data, myp,R = 10000)
boot::boot.ci(out,conf = 1-alpha,type = "perc")

Q3. NHST

To answer qu. 3 We will look at the problem from a different perspective. Suppose we assume $p=1-p$, then this means $p=1/2$. We call this the NULL value, it comes about by assuming that the probability of a success is the same as the probability of a failure. This is the skeptics' view.

The skeptic would hold the NULL hypothesis

$$H_0:p=1/2$$

We then can check to see if the data are consitent with this hypothesis

p = dbinom(0:10, size = 10, prob = 1/2)

names(p) = 0:10

coll <- rep(3,11)
barplot(p, col = coll, main = "Distribution of x with p=1/2")

Rejection Acceptance Rejection (RAR) regions

Suppose we make a rejection region $R$ such that if $X\in R$ then we would reject the NULL hypothesis as implausible.

That is, while it is possible that when $p=0.5$ we obtain $X=0,1$ or $X=9,10$ it is NOT likely. We reject the NULL hypothesis understanding that our conclusion could be wrong.

p = dbinom(0:10, size = 10, prob = 1/2)

names(p) = 0:10

index = c(1,2,10,11)

coll = rep(3,11)
coll[index]<-1
coll[5] <- 2
barplot(p, col = coll, 
        main = "Distribution of x with p=1/2\n Rejection regions on x axis, bars in Black",
        xlab = "Number of successes",
        ylab = "Probability")

That is if we reject the NULL when $X\in R$ it is possible we make an error. The probability of rejecting the NULL when it is in fact true ias an error and can be calculated as follows:

$$\alpha=P(X\in { 0,1,9,10}|p=0.5) = r sum(dbinom(c(0,1,9,10),10,0.5))$$

This probability is known as the alpha level or the probability of a type 1 error.

So when $X=4$ we note that the NULL is plausible: $X=4$ lies in the acceptance region (GREEN).

This means that $p=0.5$ is plausible -- we don't know its true value but $X=4$ is consistent with $p=0.5$.

The P value

For this discussion we will continue with the Binomial example we began with above.

Notice that the Rejection region was made through looking at the extreme values of a statistic -- in this case X the number of successes in n=10 trials. In general we may have some other statistic -- like perhaps $T(X)$ which has a sampling distribution.

In the case of our Binomial we are wanting to create a statistic that measures how unlikely the data is in relation to the NULL. A statistic that is commonly used is called the P-Value.

The P-value in our case is the probability of obtaining X=4 or a more extreme value of X given that the NULL is true.

What does it mean to be more extreme? Those values of X which are less likely that X=4. Notice the probabilities:

round(p[p<p[5]],4) # more extreme
round(p[p<=p[5]],4) # as or more extreme

So the P-value would be:

$$P-Value =P(X\le 4| p=0.5) + P(X\ge 6 | p=0.5) =r 2*pbinom(4,10,0.5)$$ We could also calculate this in the following ways:

sum(p[p<=p[5]])
1-p[6]

Notice that if we obtained an X value such that it lay directly on the boundary just prior to the acceptance region then the P-Value would equal $\alpha$.

This gives us the needed comparison.

If X lies in R then the $P-Value \le \alpha$ if X lies in A (the acceptance region) then the $P-Value >\alpha$.

Relooking at the confidence interval for p

The RAR regions allow us to decide whether the NULL is plausible or not. Or, stating it another way, we can use RAR to decide if X=4 is consistent with a given $p$, in this case $p=0.5$.

What if we ask what values of $p$ are consistent with $X=4$. The classical paradigm answers this with a confidence interval

We need to set the alpha level from the RAR (reproduced from above)

$$\alpha=P(X\in { 0,1,9,10}|p=0.5) = r sum(dbinom(c(0,1,9,10),10,0.5))$$

Now we can make a confidence interval for $p$ with the same alpha

n <- 10
x <- 4
alpha = 0.0214844
data <- rep(c(1,0),c(x,n-x))
myp <- function(data, indices){
  d<-data[indices]
  phat<-mean(d) # number of successes/n
}
out<-boot::boot(data, myp,R = 10000)
boot::boot.ci(out,conf = 1-alpha,type = "norm")

Using a Normal method for bootstrap we get a ci (0.0376, 0.7561). This means that $H_0:p=1/2$ is plausible because the NULL value (1/2) is contained withing the plausible interval (the ci).

Testing Rules

We can summarize what we have learnt in the following way.

A Null hypothesis test can be done in the following three ways.

a) RAR, Tcalc (X=4 in the above case) b) Pvalue, alpha (0.0214844 in the above case) c) ci, NULL value (1/2 in the above case)

Power

To reject the NULL hypothesis when it is TRUE is a TYPE 1 error.

$$P(Type\; 1 \; ERROR) = \alpha$$

To accept the NULL hypothesis when it is FALSE is an error called a TYPE 2 error

$$P(Type\; 2\; ERROR) = \beta$$

To REJECT the NULL when it is FALSE is NOT and error and its probability is called $POWER$

$$POWER=1-\beta$$

Having more power means that the test is more sensitive to departures from the NULL.

$$H_0: p = 1/2$$

$$ H_a: p \ne 1/2 $$

Suppose a value of $p$ under $H_a$ is $p=3/4$. Then we can determine the errors and power.

We will use A= accceptance values (2:8) and R = rejection interval (0,1,9,10)

Example

We will now take the Binomial example and this time explore the errors and power discussed above.

p <- dbinom(0:10,size = 10, prob=1/2)
names(p) <- 0:10

pa <- dbinom(0:10,size = 10, prob = 3/4)
names(pa) <- 0:10

coll <- rep(3,11)
colla <- rep(4,11)

h<-barplot(p, col = coll , main = "p=1/2, 3/4", ylim =c(0, 1.2*max(p,pa)))
barplot(pa, col = rgb(1,0,1,0.2), add=TRUE)
lines(h,p)
lines(h,pa)

$\alpha$

$\alpha$ is the probability that we reject the NULL when it is TRUE. Since we know the Rejection interval is $R=(0,1,9,10)$.

So

$$\alpha = P(R|p=1/2)=r sum(dbinom(x=c(0,1,9,10), size = 10, prob = 1/2))$$

$\beta$

$\beta$ is the probability that we accept the NULL given it (the NULL hypothesis) is FALSE.

so

$$\beta = P(A|p=3/4) = r sum(dbinom(x=2:8, size = 10, prob = 3/4))$$

Power = $1-\beta$

Power is defined so that

$$1-\beta= r 1-sum(dbinom(x=2:8, size = 10, prob = 3/4))$$

This is the calculation for the case when the alternate hypothesis has $H_a:p=3/4$.

If we now take $H_a: p = p_a$ where $p_a\in (0,1)$ we get

$$power = 1- P(X\in A|p=p_a)$$

mypow = function(p,A=2:8){
  1-sum(dbinom(A,10,p))
}

pa = as.list(x=seq(0,1, length = 1000))
pow <- purrr::map(pa, ~mypow(.x))
plot(x=pa,
     y=pow, 
     xlim=c(0,1), 
     xlab = "Alternate Probabiity",
     ylab = expression(1-beta),
     main = "Power over values of pa",
     type = "l",
     lwd = 2
     )
abline(v=1/2, lwd =2, col = "Blue")

Summarizing the plot

The plot of power versus $p_a$ shows that as the alternate probability differs from the NULL $H_0: p =1/2$ so the power goes up. The shape will change as the acceptance set changes.

Different A intervals

layout(matrix(1:6,nrow= 2, ncol = 3, byrow = TRUE))
plotp = function(Astar){
mypow = function(p,A=Astar){
  1-sum(dbinom(A,10,p))
}

pa = as.list(x=seq(0,1, length = 1000))
pow <- purrr::map(pa, ~mypow(.x))
plot(x=pa,
     y=pow, 
     xlim=c(0,1), 
     xlab = "Alternate Probabiity",
     ylab = expression(1-beta),
     main = "Power over values of pa",
     type = "l",
     lwd = 2
     )
text(0.5,0.8, paste0("A=",Astar[1], ",...,", Astar[length(Astar)]))
}

plotp(2:8)
plotp(3:7)
plotp(4:6)
plotp(3:8)
plotp(4:8)
plotp(5:8)

The Non Central T distribution

The definition of the central T statistic is given by

$$T = \frac{Z}{\sqrt{\chi^2/\nu}}$$ where $Z\sim N(0,1)$ and $\chi^2\sim Chisq(\nu=n-1)$ and further $Z\perp \chi^2$.

The non-central T statistic is defined as

$$T_{ncp}=\frac{Z + \mu_{ncp}}{\sqrt{\chi^2/\nu}}$$ where $\mu_{ncp}$ is the so called non centrality parameter.

In R we can plot the non central t distribution

curve(
  dt(x,
     ncp=4,
     df = 10,
     ), 
    xlim = c(4-3,4+5),
    main = "The non-central t distribution"
  )

Suppose that we take a simple one sample problem where the population has a mean of $\mu$. We will test the Null hypothesis

$$H_0:\mu=0$$ Versus

$$H_a:\mu\ne 0$$

We will now make the $T$ statistic for the case when $H_0:\mu = 0$

$$T=\frac{\frac{\bar{X}-0}{\sigma/\sqrt{n}}}{\sqrt{\frac{(n-1)s^2}{(n-1)\sigma^2}}}=\frac{\bar{X}}{s/\sqrt{n}}$$

This can be re-written as

$$T = \frac{\frac{(\bar{X}-\mu)}{\sigma/\sqrt{n}}+\frac{\mu}{\sigma/\sqrt{n}}}{\sqrt{\frac{(n-1)s^2}{(n-1)\sigma^2}}} $$ Since the rejection region corresponds to the interval $|T|>t_{\alpha/2}$ we can find the power by calculating the probability of rejecting the NULL given it is false. This is the probability of rejecting the NULL when $H_a:\mu\ne 0$ is true. This will mean that $T=T_{ncp}$. If $H_a:\mu = \mu_a$ then the ncp is $\mu_{ncp}=\frac{\mu_a\sqrt{n}}{\sigma}$ and finally:

$$Power = F_{(n-1, ncp = \sqrt{n}\mu_a/\sigma)}(-t_{\alpha/2}) +1- F_{(n-1, ncp = \sqrt{n}\mu_a/\sigma)}(t_{1-\alpha/2})$$

where $F(x)=P(X\le x)$ known as the cumulative distribution function. We can calculate this probability in R using pt

$$F_{(n-1,ncp)}(x) = \tt{pt(x,ncp,df=n-1)}$$

Example

Suppose

$H_0:\mu = 0$, $H_a:\mu =4$, $\sigma=10$ and $n=11$.

alpha = 0.05
n  = 11
mua = 4
sigma = 10
t = qt(1-alpha/2, n-1)
power = pt(-t, df = n-1, ncp = sqrt(n)*mua/sigma) +
  1 - pt(t, df = n-1, ncp = sqrt(n)*mua/sigma)
power

# The function estimates sigma by sd -- we will do the reverse
out <- power.t.test(n=n,delta = mua,sd = sigma,sig.level = 0.05,type = "one.sample" )
out$power

curve(power.t.test(n=n,delta = x,sd = sigma,sig.level = 0.05,type = "one.sample" )$power,xlim = c(-10,10))

Built in functions to calculate power

These are in the base packages

Compute the power of the two-sample test for proportions, or determine parameters to obtain a target power.

power.prop.test()
power.prop.test(n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05,
                power = NULL,
                alternative = c("two.sided", "one.sided"),
                strict = FALSE, tol = .Machine$double.eps^0.25)

Compute the power of the one-or two-sample t test, or determine parameters to obtain a target power.

power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05,
             power = NULL,
             type = c("two.sample", "one.sample", "paired"),
             alternative = c("two.sided", "one.sided"),
             strict = FALSE, tol = .Machine$double.eps^0.25)

pwr package has more

A good description of the package is given here https://www.statmethods.net/stats/power.html

install.packages("pwr")

All functions for power and sample size analysis in the pwr package begin with pwr. Functions are available for the following statistical tests:

  1. pwr.p.test: one-sample proportion test
  2. pwr.2p.test: two-sample proportion test
  3. pwr.2p2n.test: two-sample proportion test (unequal sample sizes)
  4. pwr.t.test: two-sample, one-sample and paired t-tests
  5. pwr.t2n.test: two-sample t-tests (unequal sample sizes)
  6. pwr.anova.test: one-way balanced ANOVA
  7. pwr.r.test: correlation test
  8. pwr.chisq.test: chi-squared test (goodness of fit and association)
  9. pwr.f2.test: test for the general linear model

Finally

With this brief inroduction you may now start on the lab

r rmdfiles("Lab12","Intro2R")



MATHSTATSOU/Intro2R documentation built on Feb. 20, 2025, 6:18 a.m.