library(LM2GLMM) knitr::opts_chunk$set(cache = FALSE, fig.align = "center", fig.width = 7, fig.height = 5, cache.path = "./cache_knitr/Exo_LM_intro_solution/", fig.path = "./fig_knitr/Exo_LM_intro_solution/") options(width = 90) set.seed(1L)
If the average IQ for boys were 90 and 110 for girls, what would the parameter values be, for each of the following alternative coding for a boy and a girl (using treatment contrasts):
"boy"
vs "girl"
"male"
vs "female"
0
vs 1
1
vs 2
TRUE
vs FALSE
"boy"
vs "girl"
d <- data.frame(x = c("boy", "girl")) model.matrix(~ x, data = d)
So the intercept represents here the IQ of males and thus should equate 90 and xgirl
represents what needs to be added to the intercept to yield the value for females and thus should equate 20.
"male"
vs "female"
d <- data.frame(x = c("male", "female")) model.matrix(~ x, data = d)
So the intercept represents here the IQ of females and thus should equate 110 and xmale
represents what needs to be added to the intercept to yield the value for males and thus should equate -20.
0
vs 1
d <- data.frame(x = c(0, 1)) model.matrix(~ x, data = d)
So the intercept represents here the IQ of males and thus should equate 90 and x
represents a slope which multiplied by one and added to the intercept should yield the value for males.
That slope should thus equate 20.
1
vs 2
d <- data.frame(x = c(1, 2)) model.matrix(~ x, data = d)
So the intercept represents nothing interesting, but when it is added to 1 times the slope x
the total should equate the IQ of males.
Similarly when the intercept is added to 2 times the slope x
the total should equate the IQ of females.
We thus have:
which you can solve as:
and thus
So here the intercept equates 70 and the slope x
equates 20.
TRUE
vs FALSE
Remember, in R TRUE
and FALSE
are represented by 1 and 0 respectively. If these are used in a model they will be treated as numeric values (1 and 0).
as.numeric(c(TRUE, FALSE))
Therefore, this is essentially the opposite to the question with 0 vs 1
.
d <- data.frame(x = c(TRUE, FALSE)) model.matrix(~ x, data = d)
So the intercept represents here the IQ of females and thus should equate 110 and xTRUE
represents what needs to be added to the intercept to yield the value for males and thus should equate -20.
Are these different representations of ages equivalent?
"baby"
vs "child"
vs "adult"
0
vs 1
vs 2
1
vs 2
vs 3
Let's build all the design matrices:
model.matrix(~ x, data = data.frame(x = c("baby", "child", "adult"))) model.matrix(~ x, data = data.frame(x = c(0, 1, 2))) model.matrix(~ x, data = data.frame(x = c(1, 2, 3)))
No the representations are not equivalent and are actually all different.
Turn the following code into a function that takes the number of aliens to generate as an argument:
Tip: Remember to check ?sample()
to understand what this new function does.
Alien <- data.frame(humans_eaten = sample(1:3, size = 6, replace = TRUE), intercept = 50, slope = 1.5) Alien$error <- rnorm(n = 6, mean = 0, sd = sqrt(5)) Alien$size <- Alien$intercept + Alien$slope*Alien$humans_eaten + Alien$error
simulate_myAliens <- function(N) { Alien <- data.frame(humans_eaten = sample(1:3, size = N, replace = TRUE), intercept = 50, slope = 1.5) Alien$error <- rnorm(n = N, mean = 0, sd = sqrt(5)) Alien$size <- Alien$intercept + Alien$slope*Alien$humans_eaten + Alien$error Alien } simulate_myAliens(N = 10)
Adapt the following code to include an interaction between sex
and bh_trips
.
set.seed(123) Alien2 <- data.frame(planet = factor(c(rep("Chambr", 2), rep("Riplyx", 2), rep("Wickor", 2))), sex = factor(rep(c("Z", "ZZ"), times = 3)), bh_trips = 6:1) Alien2$age <- rep(0, nrow(Alien2)) Alien2$age[Alien2$planet == "Chambr"] <- Alien2$age[Alien2$planet == "Chambr"] + 100 Alien2$age[Alien2$planet == "Riplyx"] <- Alien2$age[Alien2$planet == "Riplyx"] + 1000 Alien2$age[Alien2$planet == "Wickor"] <- Alien2$age[Alien2$planet == "Wickor"] + 10000 Alien2$age[Alien2$sex == "Z"] <- Alien2$age[Alien2$sex == "Z"] + -20 Alien2$age[Alien2$sex == "ZZ"] <- Alien2$age[Alien2$sex == "ZZ"] + 20 Alien2$age <- Alien2$age + 0.5*Alien2$bh_trips Alien2$age <- Alien2$age + rnorm(nrow(Alien2), mean = 0, sd = 4) Alien2
set.seed(123) Alien3 <- data.frame(planet = factor(c(rep("Chambr", 2), rep("Riplyx", 2), rep("Wickor", 2))), sex = factor(rep(c("Z", "ZZ"), times = 3)), bh_trips = 6:1) Alien3$age <- rep(0, nrow(Alien2)) Alien3$age[Alien2$planet == "Chambr"] <- Alien2$age[Alien2$planet == "Chambr"] + 100 Alien3$age[Alien2$planet == "Riplyx"] <- Alien2$age[Alien2$planet == "Riplyx"] + 1000 Alien3$age[Alien2$planet == "Wickor"] <- Alien2$age[Alien2$planet == "Wickor"] + 10000 Alien3$age[Alien2$sex == "Z"] <- Alien2$age[Alien2$sex == "Z"] + -20 + 5*Alien2$bh_trips[Alien2$sex == "Z"] Alien3$age[Alien2$sex == "ZZ"] <- Alien2$age[Alien2$sex == "ZZ"] + 20 + 10*Alien2$bh_trips[Alien2$sex == "ZZ"] Alien3$age <- Alien2$age + 0.5*Alien2$bh_trips Alien3$age <- Alien2$age + rnorm(nrow(Alien2), mean = 0, sd = 4) Alien3
We start by fitting this model:
fit_UK1 <- lm(height ~ drink + sex*weight, data = UK)
fit_UK1
Compute the following predictions by (1) creating a design matrix by hand & (2) using the function predict()
:
names(coef(fit_UK1)) newX <- matrix( c(1, 1, 0, 0, 0, 30, 0, 1, 1, 0, 0, 1, 30, 30, 1, 0, 0, 0, 0, 35, 0), nrow = 3, byrow = TRUE) colnames(newX) <- names(coef(fit_UK1)) newX newX %*% coef(fit_UK1)
newdata1 <- data.frame( drink = c("Most days", "Most days", "2 to 3 times a week"), sex = c("Boy", "Girl", "Boy"), weight = c(30, 30, 35)) predict(fit_UK1, newdata = newdata1)
fit_UK1
data.frame(coef(fit_UK1))
What do each of these estimates really mean?
(If you struggle interpreting parameter estimates, look at the fitted values)
(Intercept)
gives the predicted height for an average boy weighing 0 Kg (silly but needed) and whose parents drink 2 to 3 times a week.drinkMost_days
gives by how much the height of an average child increases if their parents drink most days, as compared to the average height of children whose parents drink 2 to 3 times a week.drinkNat_at_all
gives by how much the height of an average child increases if their parents do not drink, as compared to the average height of children whose parents drink 2 to 3 times a week.drinkOce_a_week_or_less
gives by how much the height of an average child increases if their parents drink once a week or less, as compared to the average height of children whose parents drink 2 to 3 times a week.sexGirl
gives how much taller are, on average, girls of 0 Kg compared to boys of 0 Kg.weight
gives by how much the height of an average boy increases when his weight increases by 1 Kg.sexGirl:weight
gives by how much more the height of an average girl increases, compared to the increase in an average boy caused by an increase in weight of 1 Kg.fit_UK1
Create a plot with predictions showing the influence of the weight of girls on their height!
Tip: Look back at slides about predictions to work out how to deal with the effect of 'drink'.
For predictions we will use here the modal value of non-focal qualitative variables within the set of all sampled observations.
#The most frequent value of 'drink' is 'Not at all' table(UK$drink)
Since weight influences predictions linearly, we only need predictions for the lightest and heaviest girl in our drink category.
UK_girl <- UK[which(UK$sex == "Girl" & UK$drink == "Not at all"), ] tab_weight <- data.frame(sex = "Girl", drink = "Not at all", weight = range(UK_girl$weight, na.rm = TRUE)) tab_weight
Now we can add the predicted values:
tab_weight$height_pred <- predict(fit_UK1, newdata = tab_weight) tab_weight
Create a plot to show the data.
#Plot the points plot(x = tab_weight$height_pred, y = tab_weight$weight) #Or we can plot the points and lines lines(x = tab_weight$height_pred, y = tab_weight$weight)
r .emo("nerd")
How would we do this for each category of drink?
So let's start with identifying the relevant weights we need to use:
UK_girl <- UK[which(UK$sex == "Girl"), ] tapply(UK_girl$weight, UK_girl$drink, range, na.rm = TRUE) tab_weight <- data.frame(sex = "Girl", drink = rep(levels(UK_girl$drink), each = 2), weight = unlist(tapply(UK_girl$weight, UK_girl$drink, range, na.rm = TRUE), use.names = FALSE)) tab_weight
Now we can add the predicted values:
tab_weight$height_pred <- predict(fit_UK1, newdata = tab_weight) tab_weight
And we can plot it (here using {ggplot2} for simplicity):
library(ggplot2) ggplot(tab_weight) + aes(x = weight, y = height_pred, colour = drink) + geom_line() + geom_point() + theme_bw()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.