In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

library(LM2GLMM)
library(lmtest)
knitr::opts_chunk$set(cache = TRUE, cache.path = "./cache_knitr/LM_assumptions/",
                      fig.path = "./fig_knitr/LM_assumptions/", fig.align = "center", fig.width = 4, fig.asp = 1)
options(width = 200)
set.seed(1L)

The Linear Model: LM

[Back to main menu](./Title.html#2)

You will learn in this session `r .emo("goal")`

the six main assumptions behind LM
how to check if these assumptions are met (by means of tests and using your eyes)
how to do the Box-Cox transformation
that you can manipulate the design matrix to solve problems
how to carefully study outliers
that you will have to learn GLM and (G)LMM to solve some usual problems

Introduction

The main assumptions `r .emo("info")`

Model structure

linearity
lack of perfect multicollinearity (design matrix of full rank)
predictor variables have fixed values

Errors

independence (no serial autocorrelation)
constant variance (homoscedasticity)
normality

Assumptions about the model structure

Linearity

Linearity in brief `r .emo("info")`

Assumption

The mean of the response variable must be a linear combination of the parameters and the predictor variables, with no systematic dependence on any omitted terms.

Causes and consequences of violation

Departure from linearity can originate from a multitude of reasons and can create all kinds of problems.

Solutions

transform one or several predictors (e.g. polynomials)
transform the response (e.g. log and power transformation)

Alternatives

Generalized Linear Models (GLM)
(other) non-linear models

Quiz `r .emo("practice")`

Can you express the following models as LM?

$y_i = \alpha + \epsilon_i$
$y_i = x_i^\beta + \epsilon_i$
$y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \epsilon_i$
$y_i = \frac{\beta x_i}{\alpha + x_i} + \epsilon_i$ (Michaelis-Menten : V = Vmax[S]/(Km+[S]))

Lineweaver Burk method for Michaelis-Menten `r .emo("nerd")`

But using this method is not advised as results can be unreliable...

A simple example of non-linearity `r .emo("info")`

set.seed(1L)
Alien0 <- data.frame(humans_eaten = sample(1:100)) 
Alien0$size <- 10 + 50 * Alien0$humans_eaten - 0.02 * (Alien0$humans_eaten^2) + rnorm(100, sd = 5)
mod0a <- lm(size ~ humans_eaten, data = Alien0)
coef(mod0a)

A simple example of non-linearity `r .emo("info")`

plot(size ~ humans_eaten, data = Alien0, pch = 3, col = "red")
abline(mod0a, col = "blue", lwd = 2)

A simple example of non-linearity `r .emo("practice")`

Detection: plot the residuals against each quantitative variable

plot(residuals(mod0a) ~ model.matrix(mod0a)[, "humans_eaten"])
abline(h = 0, col = "red", lty = 2)

A simple example of non-linearity `r .emo("info")`

mod0b <- lm(size ~ poly(humans_eaten, 2, raw = TRUE), data = Alien0)
summary(mod0b)$coef

Note: now that the model formula is correct we find estimates that are sensible.

Another example of non-linearity `r .emo("info")`

poison$treat <- factor(poison$Treatment)
poison$poison <- factor(poison$Poison)
fit_poison <- lm(Time ~ poison + treat, data = poison)
plot(residuals(fit_poison) ~ fitted(fit_poison), xlab = "fitted values", ylab = "residuals")
abline(h = 0, col = "red", lty = 2)

Note: plotting the residuals against the fitted values is a also a good way to find out about the issue.

The Box-Cox transformation (Box & Cox 1964) `r .emo("info")`

it encompasses several classic transformations:

no transformation ($\lambda = 1$)
log transformation ($\lambda = 0$)
inverse transformation ($\lambda = -1$)
square root transformation ($\lambda = 1/2$)
square transformation ($\lambda = 2$)

it can be used irrespective of what your predictors are

E.g. here no quantitative predictors.

but it changes intercept and rescale the $\beta$

The Box-Cox transformation (Box & Cox 1964) `r .emo("practice")`

car::boxCox(fit_poison)  ## makes profile of logLik as a function of lambda

The Box-Cox transformation (Box & Cox 1964) `r .emo("practice")`

summary(bc <- car::powerTransform(fit_poison))
car::testTransform(bc, lambda = -1) ## specifically compare estimated lambda to -1 (inverse transformation)

Note: we will consider -1 instead of r round(bc$lambda[[1]], 2) as it is close enough and easier to interpret!

Poison example linearised `r .emo("practice")`

fit_poison_bc <- update(fit_poison, car::bcPower(Time, lambda = -1) ~ .)
plot(residuals(fit_poison_bc) ~ fitted(fit_poison_bc), xlab = "fitted values", ylab = "residuals")
abline(h = 0, col = "red", lty = 2)

Note: that looks better r .emo("party")

Poison example linearised `r .emo("practice")`

Predictions

data.for.pred <- expand.grid(treat = levels(poison$treat), poison = levels(poison$poison))
(pred <- cbind(data.for.pred, predict(fit_poison_bc, newdata = data.for.pred, interval = "confidence")))

These predicted mean response values are expressed in the Box-Cox scale. In this case, it represents the survival rate (see next slide), but you can always get back to the original scale if you need to:

lambda <- -1; (pred$fit * lambda + 1)^(1/lambda)

Poison example linearised `r .emo("info")`

Relationship between the Box-Cox with $\lambda = -1$ and survival (in discrete time):

With $t$ the time to death and $s$ the survival rate, we have:

$$ \begin{array} \texttt{new.y} & = & \frac{t^{-1} - 1}{-1}\ & = & 1- t^{-1}\ & = & 1- \frac{1}{t}\ & = & s \end{array} $$ Proof r .emo("proof"): if the probability of survival is 0.75, how long do individuals live?

$$t = \frac{1}{1-s} = \frac{1}{1 - 0.75} = \frac{1}{0.25} = 4$$ Or by simulation:

set.seed(123)
mean(replicate(10000, which(rbinom(n = 100, size = 1, prob = 0.25) == 1)[1]))

Comparison of model fits `r .emo("info")`

summary(fit_poison) ## original (mispecified) model

Comparison of model fits `r .emo("info")`

summary(fit_poison_bc) ## BoxCoxed model

Limits of the Box-Cox transformation `r .emo("info")`

it does not always solve the issues (e.g. if count data, GLM should be best)
it does not work in the presence of null or negative values
- a simple option is to transform the response variable following the motif: r data$y <- abs(min(data$y)) + 1
- better transformations have been proposed for that case (see e.g. car::bcnPower())

Lack of perfect multicollinearity

Lack of perfect multicollinearity in brief `r .emo("info")`

Assumption

The design matrix must have full rank. That means that the number of parameters to be estimated must be equal to the number of independent columns (i.e. rank) of the design matrix.

Causes and consequences of violation

Caused by having less data than parameters or by linear dependence between the column vectors of the design matrix. In such case, some parameters cannot be computed.

Solutions

change design matrix (change parameterization or drop redundant effects)
change the experimental design
collect more data

Alternatives

none

Degenerated design matrix: `n < p` `r .emo("info")`

set.seed(1L)
n <- 3
Alien <- data.frame(humans_eaten = 1:n,
                    flowers_eaten = round(runif(n, min = 1, max = 15)),
                    cactus_eaten =  round(runif(n, min = 1, max = 10)))

Alien$size <- rnorm(n = nrow(Alien),
  mean = 50 + 0.2 * Alien$humans_eaten + 0.9 * Alien$flowers_eaten + 0.1 * Alien$cactus_eaten,
  sd = sqrt(25))

fit_alien1a <- lm(size ~  cactus_eaten + humans_eaten + flowers_eaten, data = Alien)
coef(fit_alien1a)
fit_alien1b <- lm(size ~  humans_eaten + flowers_eaten + cactus_eaten, data = Alien)
coef(fit_alien1b)

Degenerated design matrix: trivial redundancy `r .emo("practice")`

set.seed(1L)
Alien2 <- simulate_Aliens()
Alien2$half_humans_eaten <-  0.5 * Alien2$humans_eaten
fit_alien2 <- lm(size ~ humans_eaten + half_humans_eaten, data = Alien2)
coef(fit_alien2)
det(crossprod(model.matrix(fit_alien2)))  ## when det(XTX) <= 0, XTX has no inverse!
fit_alien2$rank  == ncol(model.matrix(fit_alien2))

Degenerated design matrix: miscellaneous `r .emo("practice")`

set.seed(1L)
Alien3 <- data.frame(humans_eaten = 1:12,
                     flowers_eaten = round(runif(12, min = 1, max = 15)),
                     cactus_eaten = 0)
Alien3$food_units <- 1.2*Alien3$humans_eaten + 0.6*Alien3$flowers_eaten
Alien3$size <- rnorm(n = 12, mean = 50 + 1*Alien3$food_units, sd = sqrt(25))
fit_alien3 <- lm(size ~ food_units + humans_eaten + flowers_eaten + cactus_eaten, data = Alien3)
coef(fit_alien3)
caret::findLinearCombos(model.matrix(fit_alien3))  ## Tip: help to see what creates the issue

Non-perfect multicollinearity `r .emo("info")`

Sometimes assumptions are met, but problems can still occur

summary(fit_US  <- lm(Rape ~ Assault + Murder, data = USArrests))$coef
summary(fit_US2 <- lm(Rape ~ Murder, data = USArrests))$coef

Note: it is a problem to infer causal inference, but not so much for predictions

Non-perfect multicollinearity: why? `r .emo("info")`

Because of strong correlation(s) between the regressors:

pairs(USArrests)

Non-perfect multicollinearity: diagnostic `r .emo("practice")`

cor(model.matrix(fit_US))  ## direct measure of correlation in the design matrix
cov2cor(vcov(fit_US))  ## direct measure of correlation between parameter estimates

Note: in more complex models, the numbers do not necessarily match, so it is good practice to check both matrices.

Non-perfect multicollinearity: diagnostic `r .emo("practice")`

Often diagnosed using the Variance Inflation Factor

car::vif(fit_US)
R2 <- summary(lm(Assault ~ Murder, data = USArrests))$r.squared  ## works too if more variables
1/(1 - R2)

Notes:

the model here considers the relationship between the predictors (the response variable of fit_US was rape)
the VIF tells you by how much the variance in the uncertainty in parameter estimates increases due to multicollinearity

Non-perfect multicollinearity: solutions `r .emo("practice")`

drop one variable
merge them

pca <- prcomp(~ Assault + Murder, data = USArrests, scale. = TRUE)
USArrests$PC1 <- pca$x[, 1]
summary(fit_US3 <- lm(Rape ~ PC1, data = USArrests))

Predictor variables have fixed values

Predictor variables have fixed values (in brief) `r .emo("info")`

Assumption

The dependent variable are represented by fixed values.

Causes and consequences of violation

The presence of measurement errors is the main cause of violation. Violation can trigger both estimates and tests to be biased.

Solutions

often ignored in practice
better measurements

Alternatives

multipurpose numerical approaches
errors-in-variables models
reduced major axis regression

Example `r .emo("info")`

set.seed(1L)
Alien4 <- simulate_Aliens(100)
summary(lm(size ~ humans_eaten, data = Alien4))$coef
Alien4$humans_eaten_err <- Alien4$humans_eaten + rnorm(nrow(Alien4), sd = 10)
summary(lm(size ~ humans_eaten_err, data = Alien4))$coef

Accounting for errors-in-variables using `sem` `r .emo("practice")`

Structural equation modelling

Alien5 <- Alien4
Alien5$human_eaten <- NULL ## we remove the original variable
eqns <- sem::specifyEquations(text = "
                        size = alpha*Intercept + slope*humans_eaten
                        humans_eaten = 1*humans_eaten_err
                        V(size) = sigma
                        V(humans_eaten) = 1
                        V(humans_eaten_err) = 1
                        ")
fit_sem <- sem::sem(eqns, data = Alien5, raw = TRUE, fixed.x = "Intercept")
summary(fit_sem, analytic.se = FALSE)$coef  ## use analytic.se = FALSE otherwise considers variance known

Note: other solutions are possible using mixed models.

Assumptions about the errors

Independence

Independence in brief `r .emo("info")`

Assumption

The errors (not the residuals) are uncorrelated: $\text{cov}(\epsilon_i, \epsilon_j) = 0$, with $i \neq j$.

Causes and consequences of violation

A lack of independence (serial autocorrelation) in the errors can appear if there is a departure from linearity, if data have been sampled non-randomly (e.g. spatial or temporal series), or if there is an overarching structure (e.g. repeated measures within individuals, families, species, ...). The lack of independence increases the risk of false positive (sometimes a lot).

Solutions

transformation (see linearity)
aggregation (be careful)
sub-sampling

Alternatives

mixed models (LMM and GLMM)

Testing for independence `r .emo("practice")`

We can use the Durbin-Watson test: D-W [0; 4]

D-W = 2 no-autocorrelation
D-W <<2 positive autocorrelation
D-W >>2 negative autocorrelation

set.seed(1L)
car::durbinWatsonTest(modConv <- lm(fconvict ~ tfr + partic + degrees + mconvict, data = Hartnagel), max.lag = 3)

Testing for independence `r .emo("practice")`

We can also compute the partial autocorrelations for the residuals series,

pacf(residuals(modConv))

Note: mind that the CI plotted here is very approximative.

Testing for independence induced by a specific variable `r .emo("practice")`

lmtest::dwtest(modConv, order.by = modConv$model$degrees)

Note: as for investigating linearity it is good here to try to order by:

each quantitative predictor
the fitted values
nothing (as the data come) -- since it often reflects something about the design that may not be explicitly coded in any variable

Testing for independence by eye `r .emo("practice")`

It is difficult when the problem is not extreme

plot(residuals(modConv) ~ fitted(modConv))
abline(h = 0, lty = 2, col = "red")

Testing for independence by eye `r .emo("practice")`

The origin of the problem here is the time!

plot(residuals(modConv) ~ Hartnagel$year, type = "o")
abline(h = 0, lty = 2, col = "red")

Constant variance (homoscedasticity)

Homos(c/k)edasticity in brief `r .emo("info")`

Assumption

The variance of the error (not residuals) is constant: $\text{var}(\epsilon_j) = \sigma^2$ for all $j$. With matrix notation: if $\epsilon^\text{T}$ is the vector of all $\epsilon_j$, then we assume $\text{cov}(\epsilon, \epsilon) = \sigma^2I_n$, where $I_n$ is the $n \times n$ identity matrix.

Causes and consequences of violation

Heteros(c/k)edasticity can emerge when there is a mean - variance relationship, when there is non independence between observations, when reaction norm changes acording to the treatement. It can create both false positives and false negative.

Solutions

transformation (see linearity)
post-hoc correction of the SE (not so great)

Alternatives

Generalized Linear Models (GLM)
heteroscedastic models

Example of heteroscedasticity `r .emo("practice")`

set.seed(1L)
Alien5 <- simulate_Aliens(N = 100)
Alien5$eggs <- rpois(100, lambda = 2 + 1 * Alien5$humans_eaten) ## data generation is NOT gaussian!
fit_alien5 <- lm(eggs ~ humans_eaten, data = Alien5)
bptest(fit_alien5)

Notes:

the Breusch-Pagan tests test the null hypothesis of homoscedasticity (no matter its cause)
the Breusch-Pagan test follows a $\chi^2$ distribution (thus test statistic = df under H0).

Testing for heteroscedasticity by eye `r .emo("practice")`

Residuals must be standardized as raw residuals always have some (minor) heteroscedasticity.

plot(abs(rstandard(fit_alien5)) ~ fitted(fit_alien5))

Note: here again it is good to plot against all quantitative predictors, fitted values, and as the data come.

Post-hoc correction (not optimal) `r .emo("nerd")`

vcov(fit_alien5)
car::hccm(fit_alien5)  ## correct the covariance matrix of parameter estimates
estimates <- coef(fit_alien5)
std.errors <- sqrt(diag(car::hccm(fit_alien5)))
t.values <- estimates/std.errors
p.values <- 2*pt(abs(t.values), df = fit_alien5$df.residual, lower.tail = FALSE)
cbind(estimates, std.errors, t.values, p.values)

Post-hoc correction (not optimal) `r .emo("nerd")`

Same using Anova:

car::Anova(fit_alien5, white.adjust = TRUE)  ## vcov = hccm
35.2490945224257^2  ## t^2 from previous slide

Normality

Normality in brief `r .emo("info")`

Assumption

The errors (not the residuals) should be normally distributed.

Causes and consequences of violation

The distribution of residuals can be skewed, this is often caused by the presence of outliers, and/or when the process generating the data is very different from normal (e.g. Poisson, Binomial...).

Solutions

transformation (see linearity)
taking outliers out (mindfully!)

Alternatives

robust regressions
GLM

Testing normality `r .emo("practice")`

There are many tests for normality out there...

Example: the Lilliefors (Kolmogorov-Smirnov) test for normality, Shapiro-Wilk Normality Test...

nortest::lillie.test(residuals(fit_poison))   ## stat = 0 when normal
shapiro.test(residuals(fit_poison))  ## stat = 1 when normal

Testing normality by eye `r .emo("practice")`

wzxhzdk:34 wzxhzdk:35

Testing all assumptions on the errors at once `r .emo("party")`

par(mfrow = c(2, 2))
plot(fit_poison)

Outliers

Outliers in brief `r .emo("info")`

What are they?

They are observations that seem not to belong to the others.

Why do they matter?

A few very deviant points can strongly influence all your estimations.

What should you do with them?

It depends... but never trash them blindly.

r .emo("warn") If you have very good reasons to take them out, do mention it in the paper!

Example of outlier `r .emo("practice")`

wzxhzdk:37 wzxhzdk:38

Note: to identify a given point on the plot, call with(Davis, identify(height, weight, row.names(Davis))), then click close to the point and press escape.

There are many ways to identify an outlier `r .emo("info")`

influence.measures(fit_davis)  ## In the next slides we will see all that in details!

Leverage vs Influence `r .emo("info")`

A regression outlier is an observation that has an unusual value of the dependent variable Y, conditional on X.

Regression outliers may not look like an outlier on any Y or X variables.

Leverage

The leverage quantifies how unusual one row of the design matrix is.

A high leverage is not necessarily a bad thing.

Influence

An observation is influential if it strongly influences the predicted values.

A high influence is not necessarily a bad thing.

We do not want high influence caused by high leverage!

Conclusion: we need to look at both!

Measuring the leverage `r .emo("practice")`

This is done by extracting the diagonal element of the hat matrix: the hat values.

Recall: $\widehat{Y} = HY$

range: [1/n; 1]
mean : (n-p)/n

head(sort(hatvalues(fit_davis), decreasing = TRUE))

Note r .emo("nerd"): for some computations and plots the leverage are rescaled as $\frac{h_{i,i}}{1-h_{i,i}}$, but it does not change the reasoning.

Measuring the leverage `r .emo("practice")`

plot(fit_davis, which = 5)

Measuring the influence on predictions `r .emo("practice")`

head(sort(dffits(fit_davis), decreasing = TRUE))
head(sort(cooks.distance(fit_davis), decreasing = TRUE))

Notes:

both the DFFITS statistics and the Cook distance (aka Cook's D) are two relative measures of the extent to which the predicted y-values changes if a given observation is dropped
the DFFITS measure is scaled by the standard deviation of the fit at the point
the Cook's D is based on a F statistics comparing simultaneously the changes in all estimates when the observation is dropped or not.
a high value is obtained when the observation is associated to high leverage, high residual or both.

Measuring the influence on predictions `r .emo("nerd")`

plot(dffits(fit_davis2), cooks.distance(fit_davis2), xlab = "DFFITS", ylab = "Cook's distance")

Measuring the influence on predictions `r .emo("practice")`

par(mfrow = c(1, 3))
plot(fit_davis, which = 4:6)

... another example `r .emo("practice")`

fit_UK <- lm(height ~ sex * milk, data = UK[1:20, ])
par(mfrow = c(1, 3))
plot(fit_UK, which = 4:6)

... another example bis `r .emo("practice")`

fit_UK2 <- lm(height ~ sex * milk, data = UK)
par(mfrow = c(1, 3))
plot(fit_UK2, which = 4:6)

Measuring the influence on each estimate `r .emo("practice")`

head(dfbeta(fit_davis), n = 3)
coef(fit_davis) - coef(update(fit_davis, data = Davis[-1, ]))
head(dfbetas(fit_davis), n = 3) ## same in SE units of the coef

Measuring the influence on the covariance matrix `r .emo("practice")`

head(sort(covratio(fit_davis), decreasing = TRUE))
det(vcov(update(fit_davis, data = Davis[-19, ]))) / det(vcov(fit_davis))

Exploring in depth outliers: all at once `r .emo("practice")`

Note: stars flag candidate outliers, but do check them carefully since it may almost always flag observations even if they are not problematic for your analysis..

influence.measures(fit_davis)  ## stars are just there to attract your attention, there is no proper tests!

Exploring in depth outliers: all at once `r .emo("practice")`

Tip: if your dataset is large, you can select the rows flagged with the stars as follows:

influence_results <- influence.measures(fit_davis)
influence_results$infmat[rowSums(influence_results$is.inf) > 0, , drop = FALSE]

Note: drop = FALSE will avoid the output to be automatically turned into a vector if there is only one flagged outlier. Such a vector would have lost the name of the row and with it the identity of the outlier.

What you need to remember `r .emo("goal")`

the six main assumptions behind LM
how to check if these assumptions are met (by means of tests and using your eyes)
how to do the Box-Cox transformation
that you can manipulate the design matrix to solve problems
how to study carefully outliers
that you will have to learn GLM and (G)LMM to solve some usual problems

The Linear Model: LM

[Back to main menu](./Title.html#2)

courtiol/LM2GLMM documentation built on Oct. 10, 2024, 7:22 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

courtiol/LM2GLMM Data science for biologists: Generalized linear modelling with R

In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

The Linear Model: LM

You will learn in this session r .emo("goal")

Introduction

The main assumptions r .emo("info")

Model structure

Errors

Assumptions about the model structure

Linearity

Linearity in brief r .emo("info")

Assumption

Causes and consequences of violation

Solutions

Alternatives

Quiz r .emo("practice")

Can you express the following models as LM?

Lineweaver Burk method for Michaelis-Menten r .emo("nerd")

A simple example of non-linearity r .emo("info")

A simple example of non-linearity r .emo("info")

A simple example of non-linearity r .emo("practice")

A simple example of non-linearity r .emo("info")

Another example of non-linearity r .emo("info")

The Box-Cox transformation (Box & Cox 1964) r .emo("info")

The Box-Cox transformation (Box & Cox 1964) r .emo("info")

it encompasses several classic transformations:

it can be used irrespective of what your predictors are

but it changes intercept and rescale the $\beta$

The Box-Cox transformation (Box & Cox 1964) r .emo("practice")

The Box-Cox transformation (Box & Cox 1964) r .emo("practice")

Poison example linearised r .emo("practice")

Poison example linearised r .emo("practice")

Predictions

Poison example linearised r .emo("info")

Comparison of model fits r .emo("info")

Comparison of model fits r .emo("info")

Limits of the Box-Cox transformation r .emo("info")

Lack of perfect multicollinearity

Lack of perfect multicollinearity in brief r .emo("info")

Assumption

Causes and consequences of violation

Solutions

Alternatives

Degenerated design matrix: n < p r .emo("info")

Degenerated design matrix: trivial redundancy r .emo("practice")

Degenerated design matrix: miscellaneous r .emo("practice")

Non-perfect multicollinearity r .emo("info")

Sometimes assumptions are met, but problems can still occur

Non-perfect multicollinearity: why? r .emo("info")

Non-perfect multicollinearity: diagnostic r .emo("practice")

Non-perfect multicollinearity: diagnostic r .emo("practice")

Often diagnosed using the Variance Inflation Factor

Non-perfect multicollinearity: solutions r .emo("practice")

Predictor variables have fixed values

Predictor variables have fixed values (in brief) r .emo("info")

Assumption

Causes and consequences of violation

Solutions

Alternatives

Example r .emo("info")

Accounting for errors-in-variables using sem r .emo("practice")

Structural equation modelling

Assumptions about the errors

Independence

Independence in brief r .emo("info")

Assumption

Causes and consequences of violation

Solutions

Alternatives

Testing for independence r .emo("practice")

Testing for independence r .emo("practice")

Testing for independence induced by a specific variable r .emo("practice")

Testing for independence by eye r .emo("practice")

It is difficult when the problem is not extreme

Testing for independence by eye r .emo("practice")

The origin of the problem here is the time!

Constant variance (homoscedasticity)

Homos(c/k)edasticity in brief r .emo("info")

Assumption

Causes and consequences of violation

courtiol/LM2GLMM
Data science for biologists: Generalized linear modelling with R

You will learn in this session `r .emo("goal")`

The main assumptions `r .emo("info")`

Linearity in brief `r .emo("info")`

Quiz `r .emo("practice")`

Lineweaver Burk method for Michaelis-Menten `r .emo("nerd")`

A simple example of non-linearity `r .emo("info")`

A simple example of non-linearity `r .emo("info")`

A simple example of non-linearity `r .emo("practice")`

A simple example of non-linearity `r .emo("info")`

Another example of non-linearity `r .emo("info")`

The Box-Cox transformation (Box & Cox 1964) `r .emo("info")`

The Box-Cox transformation (Box & Cox 1964) `r .emo("info")`

The Box-Cox transformation (Box & Cox 1964) `r .emo("practice")`

The Box-Cox transformation (Box & Cox 1964) `r .emo("practice")`

Poison example linearised `r .emo("practice")`

Poison example linearised `r .emo("practice")`

Poison example linearised `r .emo("info")`

Comparison of model fits `r .emo("info")`

Comparison of model fits `r .emo("info")`

Limits of the Box-Cox transformation `r .emo("info")`

Lack of perfect multicollinearity in brief `r .emo("info")`

Degenerated design matrix: `n < p` `r .emo("info")`

Degenerated design matrix: trivial redundancy `r .emo("practice")`

Degenerated design matrix: miscellaneous `r .emo("practice")`

Non-perfect multicollinearity `r .emo("info")`

Non-perfect multicollinearity: why? `r .emo("info")`

Non-perfect multicollinearity: diagnostic `r .emo("practice")`

Non-perfect multicollinearity: diagnostic `r .emo("practice")`

Non-perfect multicollinearity: solutions `r .emo("practice")`

Predictor variables have fixed values (in brief) `r .emo("info")`

Example `r .emo("info")`

Accounting for errors-in-variables using `sem` `r .emo("practice")`

Independence in brief `r .emo("info")`

Testing for independence `r .emo("practice")`

Testing for independence `r .emo("practice")`

Testing for independence induced by a specific variable `r .emo("practice")`

Testing for independence by eye `r .emo("practice")`

Testing for independence by eye `r .emo("practice")`

Homos(c/k)edasticity in brief `r .emo("info")`

Example of heteroscedasticity `r .emo("practice")`

Testing for heteroscedasticity by eye `r .emo("practice")`

Post-hoc correction (not optimal) `r .emo("nerd")`

Post-hoc correction (not optimal) `r .emo("nerd")`

Normality in brief `r .emo("info")`

Testing normality `r .emo("practice")`

Testing normality by eye `r .emo("practice")`

Testing all assumptions on the errors at once `r .emo("party")`

Outliers in brief `r .emo("info")`

Example of outlier `r .emo("practice")`

There are many ways to identify an outlier `r .emo("info")`

Leverage vs Influence `r .emo("info")`

Measuring the leverage `r .emo("practice")`

Measuring the leverage `r .emo("practice")`

Measuring the influence on predictions `r .emo("practice")`

Measuring the influence on predictions `r .emo("nerd")`

Measuring the influence on predictions `r .emo("practice")`

... another example `r .emo("practice")`

... another example bis `r .emo("practice")`

Measuring the influence on each estimate `r .emo("practice")`

Measuring the influence on the covariance matrix `r .emo("practice")`

Exploring in depth outliers: all at once `r .emo("practice")`

Exploring in depth outliers: all at once `r .emo("practice")`

What you need to remember `r .emo("goal")`