library(knitr) library(learnr) knitr::opts_chunk$set(echo = TRUE, exercise = FALSE)
We begin by importing the data set, building two datasets (female and male), containing both the biometrical and the social variables.
mydata <- moxier::cleaneddata mydata.gender <- mydata[, 1] mydata.biosocial <- mydata[, 3:8] t1 <- mydata.biosocial[mydata.gender == "Female", ] t2 <- mydata.biosocial[mydata.gender == "Male", ]
We assess the dimension of the datasets.
(p <- dim(t1)[2]) (n1 <- dim(t1)[1]) (n2 <- dim(t2)[1])
We also make a plot.
plot(rbind(t1, t2), col = c(rep("red", n1), rep("black", n2)))
We set the level of significance,
alpha <- 0.05
we choose the null hypothesis,
delta.mean.H0 <- c(0, 0, 0, 0, 0, 0)
compute the three estimates of the differences between means,
(delta.mean.est <- colMeans(t1) - colMeans(t2))
estimate the standard deviations (if unknown).
(sd.est <- sqrt(apply(t1, 2, var) / n1 + apply(t2, 2, var) / n2))
We then proceed to compute the Z test statistics,
(zstat <- (delta.mean.est - delta.mean.H0) / sd.est)
adjust the level of significance (e.g. Bonferroni adjustment),
(alpha.adj <- alpha / p)
compute the boundaries of the rejection region,
(cfr.z <- qnorm(1 - alpha.adj / 2))
check if this leads to the rejection of $H_0$ or not,
(abs(zstat) > cfr.z)
and finally compute the p-value
(pvalue <- ifelse(zstat >= 0, (1 - pnorm(zstat)) * 2, pnorm(zstat) * 2))
and the adjusted p-value.
(pvalue.adj <- pmin(pvalue * p, 1))
We also inspect the adjust two-sample Z intervals for comparing unpaired means.
CI <- cbind( inf = delta.mean.est - sd.est * qnorm(1 - alpha.adj / 2), center = delta.mean.est, sup = delta.mean.est + sd.est * qnorm(1 - alpha.adj / 2) ) kable(CI)
We now proceed by applying a global testing approach: Hotelling's test.
We select the level of significance,
(alpha <- 0.05)
choose the null hypothesis (a vector in $\mathbb{R}^3$),
(delta.mean.H0 <- c(0, 0, 0, 0, 0, 0))
compute the three estimate of the three differences between means (a vector in $\mathbb{R}^3$),
(delta.mean.est <- colMeans(t1) - colMeans(t2))
estimate the covariance matrix (a matrx in $\mathbb{R}^{3 \times 3}$),
(cov.est <- cov(t1) / n1 + cov(t2) / n2)
compute the $\chi^2$ test statistic,
(chi2stat <- (delta.mean.est - delta.mean.H0) %*% solve(cov.est) %*% (delta.mean.est - delta.mean.H0))
compute the boundary of the elliptical rejection region,
(cfr.chi2 <- qchisq(1 - alpha, p))
check if this leads to the rejection of $H_0$ or not,
(chi2stat > cfr.chi2)
we then compute the p-value.
(pvalue <- 1 - pchisq(chi2stat, p))
We inspect $\chi^2$ confidence regions for comparing the unpaired means (which can't be visualised whenever $p > 3$)-
CI.sim <- cbind( inf = delta.mean.est - sd.est * sqrt(qchisq(1 - alpha, p)), center = delta.mean.est, sup = delta.mean.est + sd.est * sqrt(qchisq(1 - alpha, p)) ) kable(CI.sim)
We now turn our attention to the ANOVA.
We load the data.
iris
summary(iris)
We build the four vector of responses:
Sepal.Length
Sepal.Width
Petal.Length
Petal.Width
and the vector of treatment levels:
Species
response1 <- iris[, 1] response2 <- iris[, 2] response3 <- iris[, 1] response4 <- iris[, 4] treatment <- iris[, 5]
We then check the sample size,
(n <- length(response1))
the group sample size,
(ng <- table(treatment))
the number of treatments,
(treatment.levels <- levels(treatment)) (g <- length(treatment.levels))
the response to explore.
(response <- response4)
We plot the treatment groups.
(boxplot(response ~ treatment))
We ought to test the equality of the six means of the treatment groups:
We compute the estimates of the $g$ means under the null hypothesis.
rep(mean(response), g)
We compute the estimates of the $g$ means under the alternative hypothesis. Note that these are valide (albeit sub-optimal) estimates even under the null hypothesis.
tapply(response, treatment, mean)
We perform the ANOVA decomposition.
fit <- aov(response ~ treatment) summary(fit)
We now perform pair-wise comparisons (e.g. Bonferroni). We need to make $\frac{g \left(g - 1\right)}{2}$ comparisons.
mean.est.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels)) zstat.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels)) pvalue.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels)) pvalue.adj.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels)) for (i in 1:(g - 1)) { for (j in (i + 1):g) { test.ij <- t.test(response[treatment == treatment.levels[i]], response[treatment == treatment.levels[j]]) mean.est.matrix[i, j] <- diff(test.ij$estimate) zstat.matrix[i, j] <- test.ij$statistic pvalue.matrix[i, j] <- test.ij$p.value pvalue.adj.matrix[i, j] <- min(test.ij$p.value * (g * (g - 1) / 2), 1) } }
We then plot the pairwise estimated differences betwen the means,
mean.est.matrix
the pairwise estimated test statistics,
zstat.matrix
the pairwise p-values
pvalue.matrix
and the pairwise adjusted p-values.
pvalue.adj.matrix * 4 pvalue.adj.matrix * 4 < 0.05
We load the data.
npk
summary(npk)
We have:
We create factor-specific boxplots of yields.
boxplot(npk[, "yield"] ~ npk[, "block"]) boxplot(npk[, "yield"] ~ npk[, "N"]) boxplot(npk[, "yield"] ~ npk[, "P"]) boxplot(npk[, "yield"] ~ npk[, "K"])
We fit an ANOVA model which is additive with respect to blocks, while it admits pairwise interactions between $N$, $P$ and $K$.
fit <- aov(yield ~ block + N + P + K + N:P + N:K + P:K, npk)
We analyse the results:
summary.aov(fit)
We select a simplified model.
fit1 <- aov(yield ~ block + N + K, npk) summary(fit1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.