knitr::opts_chunk$set( collapse = TRUE, comment = "ws#>", fig.align = "center" ) # directory to Lab dirdl <- system.file("Lab11",package = "Intro2R") # create rmd link library(Intro2R)
The creation of analytical confidence intervals and their interpretation will become increasingly important as you move further into statistics.
We must be able to derive the basic intervals from sampling distributions of pivotal statistics. Other interval formulae we will look up in appropriate text books and apply them to our various and particular problems.
There will be essentially two ways in which we will calculate intervals in R. The first way I will loosely call using R as a calculator
and the second using built in specialist functions
.
This technique will help you in making your own functions to carry out the task.
For the case of two independent samples (assumed equal population variances) we can estimate the difference in two population means by using the following formulae
$$S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
$(1-\alpha)100$% ci for $\mu_1-\mu_2$ is
$$\bar{y}1-\bar{y}_2 \pm t{\alpha/2}S_p\left ( \frac{1}{n_1} + \frac{1}{n_2} \right )^{1/2}$$
where $t_{\alpha/2}=$qt(1-alpha/2, n1+n2-2)
Assume two samples y1
and y2
y1 <- rnorm(30,mean=35, sd =10) y2 <- rnorm(40, mean=25, sd =10) alpha <- 0.05 s1sq <- var(y1) s2sq <- var(y2) n1 <- length(y1) n2 <- length(y2) spsq <- ((n1-1)*s1sq + (n2-1)*s2sq)/(n1+n2-2) mp <- c(-1,1) # make a vector to subtract and then add t <- qt(1-alpha/2, n1+n2-2) ci <- mean(y1)-mean(y2)+ mp*t*sqrt(spsq)*(1/n1 + 1/n2)^(1/2) ci
We will use a side effect of t.test()
t.test(y1,y2, var.equal = TRUE, conf.level = 0.95)$conf.int
We can now look at the case when as before in Example 1 we have two samples but in this case NOT assume that the population variances are the same (use var.test()
to test).
In such cases we use the Welch
form of the two sample test and from it calculate the ci.
The formula for the degrees of freedom is quite different to that of Example 1.
$$ \nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
The $100(1-\alpha)$% ci for $\mu_1-\mu_2$ where population variances are not equal (assuming random sampling from each pop and the two samples are independent (same as in Example 1)) is given by:
$$\bar{Y}1-\bar{Y}_2 \pm t{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t_{\alpha/2}=$qt(1-alpha/2, nu)
nu = ((s1sq)/n1 +s2sq/n2)^2/((s1sq/n1)^2/(n1-1)+(s2sq/n2)^2/(n2-1)) tnu <- qt(1-alpha/2, nu) ci <- mean(y1)-mean(y2) + mp*tnu*(s1sq/n1 + s2sq/n2)^(1/2) ci
t.test(y1,y2, var.equal = FALSE, conf.level = 0.95)$conf.int
There are a number of functions useful in making confidence intervals for different models. We will not get into many new models but it will be good for you to get the idea of how to calculate intervals and understand the difference in types of intervals.
- Confidence Intervals
- Prediction Intervals
If the model is SLR
$$y_i=\beta_0 + \beta_1 x_i + \epsilon_i$$
$$\epsilon_i \sim N(0,\sigma^2) $$
The following code will make confidence intervals for $\beta_0$ and $\beta_1$.
In fact any linear model that is used to create objects through lm()
can be used in confint()
to construct ci's
ylm <- lm(LENGTH ~ WEIGHT, data = ddt) stats::confint(ylm, c("(Intercept)", "WEIGHT"), level=0.95)
The same generic function confint()
have methods for glm
and nls
as well.
We may want to find confidence intervals for E(Y|x_p)
this can be done using the function predict()
.
Supose xp = c(1005,1020)
and we wish to find the confidence intervals for $E(LENGTH|WEIGHT=xp)$. That is, the confidence interval for the expected length when the weight takes the values xp
predict(ylm, data.frame(WEIGHT = c(1005,1020)), interval = "confidence")
We might ask for a prediction
interval, that is $LENGTH|WEIGHT = x_p$. In this case we are interested in future responses at the given values of $WEIGHT$
predict(ylm, data.frame(WEIGHT = c(1005,1020)), interval = "prediction")
As you perfect your knowledge and skills the final product can be quite impressive.
ylm2 <- lm(LENGTH ~ WEIGHT + I(WEIGHT^2), data = ddt) # a. combine predictions pred <- predict(ylm2, interval = "prediction") dmore <- cbind(ddt, pred)#column bind head(dmore) # b. Quadratic curve with points added library("ggplot2") p <- ggplot(dmore, aes(x= WEIGHT, y = LENGTH)) + geom_point() + stat_smooth(method = lm, formula = y~x+I(x^2)) # c. lower and upper lines added p <- p + geom_line(aes(y = lwr), color = "orange", linetype = "dashed")+ geom_line(aes(y = upr), color = "orange", linetype = "dashed") p + ggtitle("Quadratic curve with prediction intervals")
With this brief inroduction you may now start on the lab
r rmdfiles("Lab11","Intro2R")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.