In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

library(LM2GLMM)
knitr::opts_chunk$set(cache = FALSE,
                      fig.align = "center",
                      fig.width = 7,
                      fig.height = 5,
                      cache.path = "./cache_knitr/Exo_LM_intro_solution/",
                      fig.path = "./fig_knitr/Exo_LM_intro_solution/")
options(width = 90)
set.seed(1L)

Design matrix

Alternative representations of design matrices

Instruction

If the average IQ for boys were 90 and 110 for girls, what would the parameter values be, for each of the following alternative coding for a boy and a girl (using treatment contrasts):

"boy" vs "girl"
"male" vs "female"
0 vs 1
1 vs 2
TRUE vs FALSE

Solution

"boy" vs "girl"

d <- data.frame(x = c("boy", "girl"))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of males and thus should equate 90 and xgirl represents what needs to be added to the intercept to yield the value for females and thus should equate 20.

"male" vs "female"

d <- data.frame(x = c("male", "female"))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of females and thus should equate 110 and xmale represents what needs to be added to the intercept to yield the value for males and thus should equate -20.

0 vs 1

d <- data.frame(x = c(0, 1))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of males and thus should equate 90 and x represents a slope which multiplied by one and added to the intercept should yield the value for males. That slope should thus equate 20.

1 vs 2

d <- data.frame(x = c(1, 2))
model.matrix(~ x, data = d)

So the intercept represents nothing interesting, but when it is added to 1 times the slope x the total should equate the IQ of males. Similarly when the intercept is added to 2 times the slope x the total should equate the IQ of females.

We thus have:

$$\begin{cases} i + 1\times x = 90 \ i + 2\times x = 110 \end{cases} $$

which you can solve as:

$$90 - 1\times x + 2\times x = 110 \ x = 20 $$

and thus

$$i + 1\times 20 = 90 \ i = 70$$

So here the intercept equates 70 and the slope x equates 20.

TRUE vs FALSE

Remember, in R TRUE and FALSE are represented by 1 and 0 respectively. If these are used in a model they will be treated as numeric values (1 and 0).

as.numeric(c(TRUE, FALSE))

Therefore, this is essentially the opposite to the question with 0 vs 1.

d <- data.frame(x = c(TRUE, FALSE))
model.matrix(~ x, data = d)

So the intercept represents here the IQ of females and thus should equate 110 and xTRUE represents what needs to be added to the intercept to yield the value for males and thus should equate -20.

Alternative representations of design matrices

Instruction

Are these different representations of ages equivalent?

"baby" vs "child" vs "adult"
0 vs 1 vs 2
1 vs 2 vs 3

Solution

Let's build all the design matrices:

model.matrix(~ x, data = data.frame(x = c("baby", "child", "adult")))
model.matrix(~ x, data = data.frame(x = c(0, 1, 2)))
model.matrix(~ x, data = data.frame(x = c(1, 2, 3)))

No the representations are not equivalent and are actually all different.

Simulating data

Turning script into function

Instruction

Turn the following code into a function that takes the number of aliens to generate as an argument:

Tip: Remember to check ?sample() to understand what this new function does.

Alien <- data.frame(humans_eaten = sample(1:3, size = 6, replace = TRUE), intercept = 50, slope = 1.5)
Alien$error <- rnorm(n = 6, mean = 0, sd = sqrt(5))
Alien$size <- Alien$intercept + Alien$slope*Alien$humans_eaten + Alien$error

Solution

simulate_myAliens <- function(N) {
  Alien <- data.frame(humans_eaten = sample(1:3, size = N, replace = TRUE),
                      intercept = 50, slope = 1.5)
  Alien$error <- rnorm(n = N, mean = 0, sd = sqrt(5))
  Alien$size <- Alien$intercept + Alien$slope*Alien$humans_eaten + Alien$error
  Alien
}
simulate_myAliens(N = 10)

Simulating interactions

Instruction

Adapt the following code to include an interaction between sex and bh_trips.

set.seed(123)
Alien2 <- data.frame(planet = factor(c(rep("Chambr", 2), rep("Riplyx", 2), rep("Wickor", 2))),
                     sex = factor(rep(c("Z", "ZZ"), times = 3)),
                     bh_trips = 6:1)
Alien2$age <- rep(0, nrow(Alien2))
Alien2$age[Alien2$planet == "Chambr"] <- Alien2$age[Alien2$planet == "Chambr"] + 100
Alien2$age[Alien2$planet == "Riplyx"] <- Alien2$age[Alien2$planet == "Riplyx"] + 1000
Alien2$age[Alien2$planet == "Wickor"] <- Alien2$age[Alien2$planet == "Wickor"] + 10000
Alien2$age[Alien2$sex == "Z"] <- Alien2$age[Alien2$sex == "Z"] + -20
Alien2$age[Alien2$sex == "ZZ"] <- Alien2$age[Alien2$sex == "ZZ"] + 20
Alien2$age <- Alien2$age + 0.5*Alien2$bh_trips
Alien2$age <- Alien2$age + rnorm(nrow(Alien2), mean = 0, sd = 4)
Alien2

Solution

set.seed(123)
Alien3 <- data.frame(planet = factor(c(rep("Chambr", 2), rep("Riplyx", 2), rep("Wickor", 2))),
                     sex = factor(rep(c("Z", "ZZ"), times = 3)),
                     bh_trips = 6:1)
Alien3$age <- rep(0, nrow(Alien2))
Alien3$age[Alien2$planet == "Chambr"] <- Alien2$age[Alien2$planet == "Chambr"] + 100
Alien3$age[Alien2$planet == "Riplyx"] <- Alien2$age[Alien2$planet == "Riplyx"] + 1000
Alien3$age[Alien2$planet == "Wickor"] <- Alien2$age[Alien2$planet == "Wickor"] + 10000
Alien3$age[Alien2$sex == "Z"] <- Alien2$age[Alien2$sex == "Z"] + -20 + 5*Alien2$bh_trips[Alien2$sex == "Z"] 
Alien3$age[Alien2$sex == "ZZ"] <- Alien2$age[Alien2$sex == "ZZ"] + 20 + 10*Alien2$bh_trips[Alien2$sex == "ZZ"]
Alien3$age <- Alien2$age + 0.5*Alien2$bh_trips
Alien3$age <- Alien2$age + rnorm(nrow(Alien2), mean = 0, sd = 4)
Alien3

Model parameter estimates

The UK dataset

We start by fitting this model:

fit_UK1 <- lm(height ~ drink + sex*weight, data = UK)

Predictions with `fit_UK1`

Instruction

Compute the following predictions by (1) creating a design matrix by hand & (2) using the function predict():

height of a 30 Kg boy whose mum drank most days?
height of a 30 Kg girl whose mum drank most days?
height of a 35 Kg boy whose mum drank 2 to 3 times a week?

Solution

names(coef(fit_UK1))

newX <- matrix(
  c(1, 1, 0, 0, 0, 30, 0,
    1, 1, 0, 0, 1, 30, 30,
    1, 0, 0, 0, 0, 35, 0), nrow = 3, byrow = TRUE)

colnames(newX) <- names(coef(fit_UK1))

newX

newX %*% coef(fit_UK1)

newdata1 <- data.frame(
  drink  = c("Most days", "Most days", "2 to 3 times a week"),
  sex    = c("Boy", "Girl", "Boy"),
  weight = c(30, 30, 35))
predict(fit_UK1, newdata = newdata1)

Model parameter estimates of `fit_UK1`

Instruction

data.frame(coef(fit_UK1))

What do each of these estimates really mean?

(If you struggle interpreting parameter estimates, look at the fitted values)

Solution

(Intercept) gives the predicted height for an average boy weighing 0 Kg (silly but needed) and whose parents drink 2 to 3 times a week.
drinkMost_days gives by how much the height of an average child increases if their parents drink most days, as compared to the average height of children whose parents drink 2 to 3 times a week.
drinkNat_at_all gives by how much the height of an average child increases if their parents do not drink, as compared to the average height of children whose parents drink 2 to 3 times a week.
drinkOce_a_week_or_less gives by how much the height of an average child increases if their parents drink once a week or less, as compared to the average height of children whose parents drink 2 to 3 times a week.
sexGirl gives how much taller are, on average, girls of 0 Kg compared to boys of 0 Kg.
weight gives by how much the height of an average boy increases when his weight increases by 1 Kg.
sexGirl:weight gives by how much more the height of an average girl increases, compared to the increase in an average boy caused by an increase in weight of 1 Kg.

More predictions with `fit_UK1`

Instruction

Create a plot with predictions showing the influence of the weight of girls on their height!

Tip: Look back at slides about predictions to work out how to deal with the effect of 'drink'.

Solution

For predictions we will use here the modal value of non-focal qualitative variables within the set of all sampled observations.

#The most frequent value of 'drink' is 'Not at all'
table(UK$drink)

Since weight influences predictions linearly, we only need predictions for the lightest and heaviest girl in our drink category.

UK_girl <- UK[which(UK$sex == "Girl" & UK$drink == "Not at all"), ]

tab_weight <- data.frame(sex = "Girl",
                         drink = "Not at all",
                         weight = range(UK_girl$weight, na.rm = TRUE))

tab_weight

Now we can add the predicted values:

tab_weight$height_pred <- predict(fit_UK1,
                                  newdata = tab_weight)
tab_weight

Create a plot to show the data.

#Plot the points
plot(x = tab_weight$height_pred, y = tab_weight$weight)

#Or we can plot the points and lines
lines(x = tab_weight$height_pred, y = tab_weight$weight)

BONUS Solution `r .emo("nerd")`

How would we do this for each category of drink?

So let's start with identifying the relevant weights we need to use:

UK_girl <- UK[which(UK$sex == "Girl"), ]

tapply(UK_girl$weight, UK_girl$drink, range, na.rm = TRUE)

tab_weight <- data.frame(sex = "Girl",
                         drink = rep(levels(UK_girl$drink), each = 2),
                         weight = unlist(tapply(UK_girl$weight, UK_girl$drink, range, na.rm = TRUE),
                                         use.names = FALSE))

tab_weight

Now we can add the predicted values:

tab_weight$height_pred <- predict(fit_UK1, newdata = tab_weight)
tab_weight

And we can plot it (here using {ggplot2} for simplicity):

library(ggplot2)
ggplot(tab_weight) +
  aes(x = weight, y = height_pred, colour = drink) +
  geom_line() + 
  geom_point() +
  theme_bw()

courtiol/LM2GLMM documentation built on July 3, 2022, 7:42 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

courtiol/LM2GLMM
Data science for biologists: Generalized linear modelling with R

In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

Design matrix

Alternative representations of design matrices

Instruction

Solution

Alternative representations of design matrices

Instruction

Solution

Simulating data

Turning script into function

Instruction

Solution

Simulating interactions

Instruction

Solution

Model parameter estimates

The UK dataset

Predictions with `fit_UK1`

Instruction

Solution

Model parameter estimates of `fit_UK1`

Instruction

Solution

More predictions with `fit_UK1`

Instruction

Solution

BONUS Solution `r .emo("nerd")`

R Package Documentation

Browse R Packages

We want your feedback!

courtiol/LM2GLMM Data science for biologists: Generalized linear modelling with R

In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

Design matrix

Alternative representations of design matrices

Instruction

Solution

Alternative representations of design matrices

Instruction

Solution

Simulating data

Turning script into function

Instruction

Solution

Simulating interactions

Instruction

Solution

Model parameter estimates

The UK dataset

Predictions with fit_UK1

Instruction

Solution

Model parameter estimates of fit_UK1

Instruction

Solution

More predictions with fit_UK1

Instruction

Solution

BONUS Solution r .emo("nerd")

R Package Documentation

Browse R Packages

We want your feedback!

courtiol/LM2GLMM
Data science for biologists: Generalized linear modelling with R

Predictions with `fit_UK1`

Model parameter estimates of `fit_UK1`

More predictions with `fit_UK1`

BONUS Solution `r .emo("nerd")`