library(knitr)
library(learnr)
knitr::opts_chunk$set(echo = TRUE, exercise = FALSE)
We import the data
t1 <- moxier::barcellona
t2 <- moxier::milano
p <- dim(t1)[2]   # number of variables
n1 <- dim(t1)[1]  # sample size of the first group
n2 <- dim(t2)[1]  # sample size of the second group
and plot them.
plot(rbind(t1, t2), col = c(rep("red", n1), rep("black", n2)))
We select the level of significance
alpha <- 0.01
and the null hypothesis.
delta.mean.H0 <- 0
We then compute the estimate of the difference between the means
(delta.mean.est <- mean(t1[, "Temperatura"]) - mean(t2[, "Temperatura"]))
and estimate the standard deviation of this estimator (if it is not known)
(sd.est <- sqrt(var(t1[, "Temperatura"]) / n1 + var(t2[, "Temperatura"]) / n2))
We compute the Z-test statistic
(zstat <- (delta.mean.est - delta.mean.H0) / sd.est)
and compute the boundaries of the rejection region.
(cfr.z <- qnorm(1 - alpha / 2))
We check if this leads to the rejection of $H_{0}$ or not
abs(zstat) > cfr.z
and compute the p-value.
(pvalue <- ifelse(zstat >= 0, (1 - pnorm(zstat)) * 2, pnorm(zstat) * 2))
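Since the test is two-sided, the same p-value can be written more compactly; the following one-liner (a quick equivalence check, not part of the original code) should return the same value.

2 * pnorm(-abs(zstat))  # equivalent two-sided p-value via the absolute value of the statistic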
We finally compute the two-sample Z intervals for comparing the means (unpaired)
CI <- c(
  inf = delta.mean.est - sd.est * qnorm(1 - alpha / 2),
  center = delta.mean.est,
  sup = delta.mean.est + sd.est * qnorm(1 - alpha / 2)
)
kable(CI)
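Base R has no dedicated two-sample Z test, but as a rough sanity check (a sketch assuming large samples, not part of the original workflow) the Welch two-sample t test on the same column should give a statistic and p-value close to the Z test above.

# Welch two-sample t test; for large n1 and n2 it approximates the Z test computed above
t.test(t1[, "Temperatura"], t2[, "Temperatura"], mu = delta.mean.H0)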
We consider a component-wise testing approach: an adjusted two-sample Z test for comparing the means (unpaired).
We select the level of significance,
alpha <- 0.01
the null hypothesis,
delta.mean.H0 <- c(0, 0, 0)
and compute the estimates of the three differences of the means.
(delta.mean.est <- colMeans(t1) - colMeans(t2))
After that, we estimate the standard deviations of the three estimates (if unknown),
(sd.est <- sqrt(apply(t1, 2, var) / n1 + apply(t2, 2, var) / n2))
we compute the Z-test statistics,
(zstat <- (delta.mean.est - delta.mean.H0) / sd.est)
adjust the level of significance (e.g. Bonferroni adjustment),
(alpha.adj <- alpha / p)
we compute the boundaries of the rejection regions,
(cfr.z <- qnorm(1 - alpha.adj / 2))
check if the boundaries lead to the rejection of $H_0$ or not,
abs(zstat) > cfr.z
compute the p-value,
(pvalue <- ifelse(zstat >= 0, (1 - pnorm(zstat)) * 2, pnorm(zstat) * 2))
the adjusted p-value,
(pvalue.adj <- pmin(pvalue * p, 1))
and finally show the adjusted two-sample Z intervals for comparing the means (unpaired), e.g. Bonferroni intervals.
CI <- cbind(
  inf = delta.mean.est - sd.est * qnorm(1 - alpha.adj / 2),
  center = delta.mean.est,
  sup = delta.mean.est + sd.est * qnorm(1 - alpha.adj / 2)
)
kable(CI)
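As a cross-check (a sketch using base R rather than the manual formula above), the Bonferroni-adjusted p-values can also be obtained with p.adjust, which applies the same $\min(p \cdot m, 1)$ rule with $m = 3$ comparisons here.

# Bonferroni adjustment via p.adjust; should match pvalue.adj computed above
p.adjust(pvalue, method = "bonferroni")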
We now approach the issue by using a global testing approach, namely Hotelling's test (here in its large-sample $\chi^2$ form).
We select the level of significance,
alpha <- 0.01
choose the null hypothesis (a vector in $\mathbb{R}^3$),
delta.mean.H0 <- c(0, 0, 0)
compute the estimates of the three differences of the means (again, a vector in $\mathbb{R}^3$),
(delta.mean.est <- colMeans(t1) - colMeans(t2))
and estimate the covariance matrix of the estimated difference (a matrix in $\mathbb{R}^{3 \times 3}$).
(cov.est <- cov(t1) / n1 + cov(t2) / n2)
We then compute the $\chi^{2}$-test statistic,
(chi2stat <- (delta.mean.est - delta.mean.H0) %*% solve(cov.est) %*% (delta.mean.est - delta.mean.H0))
compute the boundary of the elliptical rejection region,
(cfr.chi2 <- qchisq(1 - alpha, p))
check if this leads to the rejection of $H_{0}$ or not
chi2stat > cfr.chi2
and compute the p-value.
(pvalue <- 1 - pchisq(chi2stat, p))
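In formulas, writing $\hat{\delta}$ for the estimated difference, $\delta_0$ for the null value and $\hat{\Sigma}$ for the estimated covariance, the statistic above is the quadratic form $(\hat{\delta} - \delta_0)^\top \hat{\Sigma}^{-1} (\hat{\delta} - \delta_0)$, i.e. a squared Mahalanobis distance. As a cross-check (not part of the original code), the mahalanobis function should therefore return the same value as chi2stat.

# same quadratic form (delta - delta0)' S^{-1} (delta - delta0), via mahalanobis()
mahalanobis(delta.mean.est, center = delta.mean.H0, cov = cov.est)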
Moreover, we show the $\chi^2$ confidence region for comparing the means (unpaired). It cannot be visualised whenever $p > 3$, so we look at its shadows (projections) along the three axes, i.e. the simultaneous confidence intervals.
CI.sim <- cbind(
  inf = delta.mean.est - sd.est * sqrt(qchisq(1 - alpha, p)),
  center = delta.mean.est,
  sup = delta.mean.est + sd.est * sqrt(qchisq(1 - alpha, p))
)
kable(CI.sim)
We load the data. We have $p = 1$ and $g = 6$.
chickwts
summary(chickwts)
We build the vector of responses (i.e. weight) and the vector of treatment levels (i.e. feed type).
response <- chickwts[, "weight"]
treatment <- chickwts[, "feed"]
We compute the sample size,
n <- length(response)
the treatment group sample sizes
ng <- table(treatment)
and the number of treatments.
treatment.levels <- levels(treatment)
g <- length(treatment.levels)
We also plot the treatment groups.
boxplot(response ~ treatment)
We wish to test for the equality of the $g$ means of the treatment groups, i.e. $H_0: \mu_1 = \mu_2 = \dots = \mu_g$ against $H_1:$ at least two means differ.
We compute the estimates of the $g$ means under the null hypothesis,
rep(mean(response), g)
and we compute the estimates of the six means under the alternative hypothesis. Note that these are valid (albeit sub-optimal) estimates even under the null hypothesis.
tapply(response, treatment, mean)
We perform the ANOVA decomposition.
fit <- aov(response ~ treatment)
summary(fit)
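To make the decomposition explicit, here is a hand-computed sketch (not part of the original chunk) of the split of the total sum of squares into a between-treatment part and a residual part; the two pieces should match the treatment and Residuals rows of the ANOVA table.

# between-treatment and residual sums of squares computed by hand
SS.tot <- sum((response - mean(response))^2)
SS.treat <- sum(ng * (tapply(response, treatment, mean) - mean(response))^2)
SS.res <- SS.tot - SS.treat
c(treatment = SS.treat, residuals = SS.res, total = SS.tot)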
We then perform pairwise comparisons (e.g. with a Bonferroni correction). We need to make $\frac{g(g - 1)}{2}$ comparisons.
mean.est.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
zstat.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
pvalue.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
pvalue.adj.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
for (i in 1:(g - 1)) {
  for (j in (i + 1):g) {
    test.ij <- t.test(response[treatment == treatment.levels[i]], response[treatment == treatment.levels[j]])
    mean.est.matrix[i, j] <- diff(test.ij$estimate)
    zstat.matrix[i, j] <- test.ij$statistic
    pvalue.matrix[i, j] <- test.ij$p.value
    pvalue.adj.matrix[i, j] <- min(test.ij$p.value * (g * (g - 1) / 2), 1)
  }
}
We now display the pairwise estimated differences of the means,
(mean.est.matrix)
the pairwise estimated test statistics,
(zstat.matrix)
the pairwise p-values
(pvalue.matrix)
and the pairwise adjusted p-values, flagging which comparisons are significant at the 5% level.
(pvalue.adj.matrix)
pvalue.adj.matrix < 0.05
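The same set of Bonferroni-corrected pairwise comparisons can be obtained in a single call with pairwise.t.test (a sketch; with pool.sd = FALSE it runs a Welch-type test for each pair, like the loop above).

# all g * (g - 1) / 2 pairwise comparisons with Bonferroni correction in one call
pairwise.t.test(response, treatment, p.adjust.method = "bonferroni", pool.sd = FALSE)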
We compute a two-way ANOVA for the following problem: distance travelled (km) as a function of gasoline supplier (Esso, Shell) and gasoline type (95, 98).
We set the variables.
km <- c(18.7, 16.8, 20.1, 22.4, 14.0, 15.2, 22.0, 23.3)
gaz.supplier <- factor(c("Esso", "Esso", "Esso", "Esso", "Shell", "Shell", "Shell", "Shell"))
gaz.type <- factor(c("95", "95", "98", "98", "95", "95", "98", "98"))
gaz.supplier.type <- factor(c("Esso95", "Esso95", "Esso98", "Esso98", "Shell95", "Shell95", "Shell98", "Shell98"))
We estimate the mean under the hypothesis of no effect of either supplier or type (the grand mean).
(M <- mean(km))
We estimate the means under the hypothesis of an effect of supplier only (no effect of type).
(M.supplier <- tapply(km, gaz.supplier, mean))
We estimate the means under the hypothesis of an effect of type only (no effect of supplier).
(M.type <- tapply(km, gaz.type, mean))
We estimate the means under the hypothesis of an effect of both supplier and type.
(M.supplier.type <- tapply(km, gaz.supplier.type, mean))
And perform the analysis.
summary.aov(aov(km ~ gaz.supplier + gaz.type + gaz.supplier:gaz.type))
summary.aov(aov(km ~ gaz.supplier + gaz.type))
summary.aov(aov(km ~ gaz.type))
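To visualise the supplier-by-type interaction tested by the first model, the interaction.plot function from base R can be used (a sketch, not part of the original code): it draws the mean km for each gasoline type, with one profile per supplier, and roughly parallel profiles suggest a negligible interaction.

# mean km per gasoline type, one profile per supplier; parallel lines suggest no interaction
interaction.plot(x.factor = gaz.type, trace.factor = gaz.supplier, response = km)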