library(knitr)
library(learnr)
knitr::opts_chunk$set(echo = TRUE, exercise = FALSE)

Hypothesis test and ANOVA

Introduction

We begin by importing the data set, building two datasets (female and male), containing both the biometrical and the social variables.

mydata <- moxier::cleaneddata

mydata.gender <- mydata[, 1]
mydata.biosocial <- mydata[, 3:8]

t1 <- mydata.biosocial[mydata.gender == "Female", ]
t2 <- mydata.biosocial[mydata.gender == "Male", ]

We assess the dimension of the datasets.

(p <- dim(t1)[2])
(n1 <- dim(t1)[1])
(n2 <- dim(t2)[1])

We also make a plot.

plot(rbind(t1, t2), col = c(rep("red", n1), rep("black", n2)))

Multivariate two-sample tests

Adjusted two-sample Z tests for comparing the means (unpaired)

We set the level of significance,

alpha <- 0.05

we choose the null hypothesis,

delta.mean.H0 <- c(0, 0, 0, 0, 0, 0)

compute the three estimates of the differences between means,

(delta.mean.est <- colMeans(t1) - colMeans(t2))

estimate the standard deviations (if unknown).

(sd.est <- sqrt(apply(t1, 2, var) / n1 + apply(t2, 2, var) / n2))

We then proceed to compute the Z test statistics,

(zstat <- (delta.mean.est - delta.mean.H0) / sd.est)

adjust the level of significance (e.g. Bonferroni adjustment),

(alpha.adj <- alpha / p)

compute the boundaries of the rejection region,

(cfr.z <- qnorm(1 - alpha.adj / 2))

check if this leads to the rejection of $H_0$ or not,

(abs(zstat) > cfr.z)

and finally compute the p-value

(pvalue <- ifelse(zstat >= 0, (1 - pnorm(zstat)) * 2, pnorm(zstat) * 2))

and the adjusted p-value.

(pvalue.adj <- pmin(pvalue * p, 1))

We also inspect the adjust two-sample Z intervals for comparing unpaired means.

CI <- cbind(
  inf = delta.mean.est - sd.est * qnorm(1 - alpha.adj / 2),
  center = delta.mean.est,
  sup = delta.mean.est + sd.est * qnorm(1 - alpha.adj / 2)
)
kable(CI)

Global Testing

We now proceed by applying a global testing approach: Hotelling's test.

We select the level of significance,

(alpha <- 0.05)

choose the null hypothesis (a vector in $\mathbb{R}^3$),

(delta.mean.H0 <- c(0, 0, 0, 0, 0, 0))

compute the three estimate of the three differences between means (a vector in $\mathbb{R}^3$),

(delta.mean.est <- colMeans(t1) - colMeans(t2))

estimate the covariance matrix (a matrx in $\mathbb{R}^{3 \times 3}$),

(cov.est <- cov(t1) / n1 + cov(t2) / n2)

compute the $\chi^2$ test statistic,

(chi2stat <- (delta.mean.est - delta.mean.H0) %*% solve(cov.est) %*% (delta.mean.est - delta.mean.H0))

compute the boundary of the elliptical rejection region,

(cfr.chi2 <- qchisq(1 - alpha, p))

check if this leads to the rejection of $H_0$ or not,

(chi2stat > cfr.chi2)

we then compute the p-value.

(pvalue <- 1 - pchisq(chi2stat, p))

We inspect $\chi^2$ confidence regions for comparing the unpaired means (which can't be visualised whenever $p > 3$)-

CI.sim <- cbind(
  inf = delta.mean.est - sd.est * sqrt(qchisq(1 - alpha, p)),
  center = delta.mean.est,
  sup = delta.mean.est + sd.est * sqrt(qchisq(1 - alpha, p))
)
kable(CI.sim)

ANOVA

We now turn our attention to the ANOVA.

We load the data.

iris
summary(iris)

We build the four vector of responses:

and the vector of treatment levels:

response1 <- iris[, 1]
response2 <- iris[, 2]
response3 <- iris[, 1]
response4 <- iris[, 4]

treatment <- iris[, 5]

Dimensions

We then check the sample size,

(n <- length(response1))

the group sample size,

(ng <- table(treatment))

the number of treatments,

(treatment.levels <- levels(treatment))
(g <- length(treatment.levels))

the response to explore.

(response <- response4)

We plot the treatment groups.

(boxplot(response ~ treatment))

Test

We ought to test the equality of the six means of the treatment groups:

We compute the estimates of the $g$ means under the null hypothesis.

rep(mean(response), g)

We compute the estimates of the $g$ means under the alternative hypothesis. Note that these are valide (albeit sub-optimal) estimates even under the null hypothesis.

tapply(response, treatment, mean)

ANOVA

We perform the ANOVA decomposition.

fit <- aov(response ~ treatment)
summary(fit)

We now perform pair-wise comparisons (e.g. Bonferroni). We need to make $\frac{g \left(g - 1\right)}{2}$ comparisons.

mean.est.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
zstat.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
pvalue.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
pvalue.adj.matrix <- matrix(NA, nrow = g, ncol = g, dimnames = list(treatment.levels, treatment.levels))
for (i in 1:(g - 1)) {
  for (j in (i + 1):g) {
    test.ij <- t.test(response[treatment == treatment.levels[i]], response[treatment == treatment.levels[j]])
    mean.est.matrix[i, j] <- diff(test.ij$estimate)
    zstat.matrix[i, j] <- test.ij$statistic
    pvalue.matrix[i, j] <- test.ij$p.value
    pvalue.adj.matrix[i, j] <- min(test.ij$p.value * (g * (g - 1) / 2), 1)
  }
}

We then plot the pairwise estimated differences betwen the means,

mean.est.matrix

the pairwise estimated test statistics,

zstat.matrix

the pairwise p-values

pvalue.matrix

and the pairwise adjusted p-values.

pvalue.adj.matrix * 4
pvalue.adj.matrix * 4 < 0.05

Two-way ANOVA

We load the data.

npk
summary(npk)

We have:

We create factor-specific boxplots of yields.

boxplot(npk[, "yield"] ~ npk[, "block"])
boxplot(npk[, "yield"] ~ npk[, "N"])
boxplot(npk[, "yield"] ~ npk[, "P"])
boxplot(npk[, "yield"] ~ npk[, "K"])

We fit an ANOVA model which is additive with respect to blocks, while it admits pairwise interactions between $N$, $P$ and $K$.

fit <- aov(yield ~ block + N + P + K + N:P + N:K + P:K, npk)

We analyse the results:

summary.aov(fit)

We select a simplified model.

fit1 <- aov(yield ~ block + N + K, npk)
summary(fit1)


mascaretti/moxier documentation built on March 17, 2020, 3:57 p.m.