library(LM2GLMM)
knitr::opts_chunk$set(cache = FALSE,
                      fig.align = "center",
                      fig.width = 7,
                      fig.height = 5,
                      cache.path = "./cache_knitr/Exo_LM_intro_solution/",
                      fig.path = "./fig_knitr/Exo_LM_intro_solution/")
options(width = 90)
set.seed(1L)

Design matrix

Alternative representations of design matrices

Instruction

If the average IQ for boys were 90 and 110 for girls, what would the parameter values be, for each of the following alternative coding for a boy and a girl (using treatment contrasts):

Solution

d <- data.frame(x = c("boy", "girl"))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of males and thus should equate 90 and xgirl represents what needs to be added to the intercept to yield the value for females and thus should equate 20.


d <- data.frame(x = c("male", "female"))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of females and thus should equate 110 and xmale represents what needs to be added to the intercept to yield the value for males and thus should equate -20.


d <- data.frame(x = c(0, 1))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of males and thus should equate 90 and x represents a slope which multiplied by one and added to the intercept should yield the value for males. That slope should thus equate 20.


d <- data.frame(x = c(1, 2))
model.matrix(~ x, data = d)

So the intercept represents nothing interesting, but when it is added to 1 times the slope x the total should equate the IQ of males. Similarly when the intercept is added to 2 times the slope x the total should equate the IQ of females.

We thus have:

$$\begin{cases} i + 1\times x = 90 \ i + 2\times x = 110 \end{cases} $$

which you can solve as:

$$90 - 1\times x + 2\times x = 110 \ x = 20 $$

and thus

$$i + 1\times 20 = 90 \ i = 70$$

So here the intercept equates 70 and the slope x equates 20.


Remember, in R TRUE and FALSE are represented by 1 and 0 respectively. If these are used in a model they will be treated as numeric values (1 and 0).

as.numeric(c(TRUE, FALSE))

Therefore, this is essentially the opposite to the question with 0 vs 1.

d <- data.frame(x = c(TRUE, FALSE))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of females and thus should equate 110 and xTRUE represents what needs to be added to the intercept to yield the value for males and thus should equate -20.

Alternative representations of design matrices

Instruction

Are these different representations of ages equivalent?

Solution

Let's build all the design matrices:

model.matrix(~ x, data = data.frame(x = c("baby", "child", "adult")))
model.matrix(~ x, data = data.frame(x = c(0, 1, 2)))
model.matrix(~ x, data = data.frame(x = c(1, 2, 3)))

No the representations are not equivalent and are actually all different.

Simulating data

Turning script into function

Instruction

Turn the following code into a function that takes the number of aliens to generate as an argument:

Tip: Remember to check ?sample() to understand what this new function does.

Alien <- data.frame(humans_eaten = sample(1:3, size = 6, replace = TRUE), intercept = 50, slope = 1.5)
Alien$error <- rnorm(n = 6, mean = 0, sd = sqrt(5))
Alien$size <- Alien$intercept + Alien$slope*Alien$humans_eaten + Alien$error

Solution

simulate_myAliens <- function(N) {
  Alien <- data.frame(humans_eaten = sample(1:3, size = N, replace = TRUE),
                      intercept = 50, slope = 1.5)
  Alien$error <- rnorm(n = N, mean = 0, sd = sqrt(5))
  Alien$size <- Alien$intercept + Alien$slope*Alien$humans_eaten + Alien$error
  Alien
}
simulate_myAliens(N = 10)

Simulating interactions

Instruction

Adapt the following code to include an interaction between sex and bh_trips.

set.seed(123)
Alien2 <- data.frame(planet = factor(c(rep("Chambr", 2), rep("Riplyx", 2), rep("Wickor", 2))),
                     sex = factor(rep(c("Z", "ZZ"), times = 3)),
                     bh_trips = 6:1)
Alien2$age <- rep(0, nrow(Alien2))
Alien2$age[Alien2$planet == "Chambr"] <- Alien2$age[Alien2$planet == "Chambr"] + 100
Alien2$age[Alien2$planet == "Riplyx"] <- Alien2$age[Alien2$planet == "Riplyx"] + 1000
Alien2$age[Alien2$planet == "Wickor"] <- Alien2$age[Alien2$planet == "Wickor"] + 10000
Alien2$age[Alien2$sex == "Z"] <- Alien2$age[Alien2$sex == "Z"] + -20
Alien2$age[Alien2$sex == "ZZ"] <- Alien2$age[Alien2$sex == "ZZ"] + 20
Alien2$age <- Alien2$age + 0.5*Alien2$bh_trips
Alien2$age <- Alien2$age + rnorm(nrow(Alien2), mean = 0, sd = 4)
Alien2

Solution

set.seed(123)
Alien3 <- data.frame(planet = factor(c(rep("Chambr", 2), rep("Riplyx", 2), rep("Wickor", 2))),
                     sex = factor(rep(c("Z", "ZZ"), times = 3)),
                     bh_trips = 6:1)
Alien3$age <- rep(0, nrow(Alien2))
Alien3$age[Alien2$planet == "Chambr"] <- Alien2$age[Alien2$planet == "Chambr"] + 100
Alien3$age[Alien2$planet == "Riplyx"] <- Alien2$age[Alien2$planet == "Riplyx"] + 1000
Alien3$age[Alien2$planet == "Wickor"] <- Alien2$age[Alien2$planet == "Wickor"] + 10000
Alien3$age[Alien2$sex == "Z"] <- Alien2$age[Alien2$sex == "Z"] + -20 + 5*Alien2$bh_trips[Alien2$sex == "Z"] 
Alien3$age[Alien2$sex == "ZZ"] <- Alien2$age[Alien2$sex == "ZZ"] + 20 + 10*Alien2$bh_trips[Alien2$sex == "ZZ"]
Alien3$age <- Alien2$age + 0.5*Alien2$bh_trips
Alien3$age <- Alien2$age + rnorm(nrow(Alien2), mean = 0, sd = 4)
Alien3

Model parameter estimates

The UK dataset

We start by fitting this model:

fit_UK1 <- lm(height ~ drink + sex*weight, data = UK)

Predictions with fit_UK1

Instruction

Compute the following predictions by (1) creating a design matrix by hand & (2) using the function predict():

Solution

names(coef(fit_UK1))

newX <- matrix(
  c(1, 1, 0, 0, 0, 30, 0,
    1, 1, 0, 0, 1, 30, 30,
    1, 0, 0, 0, 0, 35, 0), nrow = 3, byrow = TRUE)

colnames(newX) <- names(coef(fit_UK1))

newX

newX %*% coef(fit_UK1)
newdata1 <- data.frame(
  drink  = c("Most days", "Most days", "2 to 3 times a week"),
  sex    = c("Boy", "Girl", "Boy"),
  weight = c(30, 30, 35))
predict(fit_UK1, newdata = newdata1)

Model parameter estimates of fit_UK1

Instruction

data.frame(coef(fit_UK1))

What do each of these estimates really mean?

(If you struggle interpreting parameter estimates, look at the fitted values)

Solution

More predictions with fit_UK1

Instruction

Create a plot with predictions showing the influence of the weight of girls on their height!

Tip: Look back at slides about predictions to work out how to deal with the effect of 'drink'.

Solution

For predictions we will use here the modal value of non-focal qualitative variables within the set of all sampled observations.

#The most frequent value of 'drink' is 'Not at all'
table(UK$drink)

Since weight influences predictions linearly, we only need predictions for the lightest and heaviest girl in our drink category.

UK_girl <- UK[which(UK$sex == "Girl" & UK$drink == "Not at all"), ]

tab_weight <- data.frame(sex = "Girl",
                         drink = "Not at all",
                         weight = range(UK_girl$weight, na.rm = TRUE))

tab_weight

Now we can add the predicted values:

tab_weight$height_pred <- predict(fit_UK1,
                                  newdata = tab_weight)
tab_weight

Create a plot to show the data.

#Plot the points
plot(x = tab_weight$height_pred, y = tab_weight$weight)

#Or we can plot the points and lines
lines(x = tab_weight$height_pred, y = tab_weight$weight)

BONUS Solution r .emo("nerd")

How would we do this for each category of drink?

So let's start with identifying the relevant weights we need to use:

UK_girl <- UK[which(UK$sex == "Girl"), ]

tapply(UK_girl$weight, UK_girl$drink, range, na.rm = TRUE)

tab_weight <- data.frame(sex = "Girl",
                         drink = rep(levels(UK_girl$drink), each = 2),
                         weight = unlist(tapply(UK_girl$weight, UK_girl$drink, range, na.rm = TRUE),
                                         use.names = FALSE))

tab_weight

Now we can add the predicted values:

tab_weight$height_pred <- predict(fit_UK1, newdata = tab_weight)
tab_weight

And we can plot it (here using {ggplot2} for simplicity):

library(ggplot2)
ggplot(tab_weight) +
  aes(x = weight, y = height_pred, colour = drink) +
  geom_line() + 
  geom_point() +
  theme_bw()


courtiol/LM2GLMM documentation built on July 3, 2022, 7:42 a.m.