In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

library(LM2GLMM)
set.seed(1L)
knitr::opts_chunk$set(fig.align = "center", fig.width = 4, fig.height = 4, dev.args = list(pointsize = 8))

Table of content

1. General information

2. The exam

3. A catalogue of useful {base} R commands for this course

[Back to main menu](./Title.html#2)

General information

Symbols `r .emo("info")`

I use emojis at the top of each slide to indicate what it refers to:

r .emo("info") for things that are good to know to improve your understanding
r .emo("practice") for practical advice (i.e. things you need to learn by doing)
r .emo("nerd") for nerdy details (i.e. things you don't need to learn, but will deepen your understanding)
r .emo("proof") for demonstrations (i.e. to show you I am not always making things up and teach you how to test your own ideas)
r .emo("alien") for creation of data (i.e. the R objects we will use in other slides)
r .emo("goal") for goals (i.e. highlight the key things you need to remember)

I may also use a few others within the slides

r .emo("party") for solutions or nice results
r .emo("broken") for things that are broken
r .emo("slow") for heavy computations
r .emo("warn") for warnings
r .emo("recap") recap something we covered previously

Course philosophy `r .emo("info")`

By the end of the course, my goal is for you to:

become autonomous with linear modelling (capable of doing a little, capable of learning a lot)
become capable of tackling most challenges brought by the analyses of real dataset

How to get there?

learn some key concepts & some programming (but hardly any mathematics)
we need to dig a bit underneath the surface to understand how R does thing
you need to refrain yourself from compulsive googling and compulsive copy/pasting of the code produced by random people.
you should instead start trying to solve any R/stats problems like any other scientific problem: by combining observations and experiments (in your R console)

What I will not cover in this course:

how to use the {dplyr} and friends ({tidyr}, {tidyselect}, {forcats}...) to put data in shape
which packages can do X or Y in a single click

What do you need for the course? `r .emo("practice")`

a laptop
a computer with internet connection
a web browser (up-to-date)
R version $\geq$ 4.1
RStudio version $\geq$ 2022.01.1

How to access the course? `r .emo("practice")`

The course is an R package called LM2GLMM.

To download and install this package, visit: https://github.com/courtiol/LM2GLMM/

To load the package, do:

library(package = "LM2GLMM") ## !!! Always check the version with me because I will update it often !!!

 #############################################
 #                                           #
 #    This is the package for the course     #
 #                                           #
 #    Advanced Statistical Applications:     #
 #          from LM to GLMM using R          #
 #                                           #
 #       Version XX.XX.XX.X installed!       #
 #                                           #
 #    To access the slides, type either      #
 #   browseVignettes(package = 'LM2GLMM')    #
 #                    or                     #
 #             get_vignettes()               #
 #                                           #
 #############################################

The exam

Procedure `r .emo("info")`

What?

biological questions for which the answers will require you to build linear models (LM or GLM)
you will answer by writing text and R code producing results including figures and/or tables

When?

half a day suiting everyone, once the course will be over
duration for the examination: one hour

(It should be sufficient, but you will be able to stay longer if you need to!)

Condition?

individually
you are free to look at your course and R-help files (no internet)

Grading `r .emo("info")`

I will grade:

the correctness and clarity of the R code
the correctness and clarity of the answers

I will give bonuses for the quality/elegance of:

the writing
the code
the illustrations

I will help you:

but only if you really are stuck

A catalogue of useful {base} R commands for this course

Vectors: basics `r .emo("practice")`

foo <- c(1, 5, 5, 10, 23, NA, NA)
foo[2]
foo > 3
which(foo > 3) ## returns indexes
foo[which(foo > 3)] ## foo[foo > 3] would return the same + NAs

Vectors: basics `r .emo("practice")`

foo2 <- c(a = 1, b = 5, c = 10, d = 23)
foo2["b"]
foo2[c("b", "a")]

foo2[-3] ## remove one item from the vector (same result as foo2[names(foo2) != "c"])

Vectors: some useful functions `r .emo("practice")`

length(foo)
summary(foo)
any(is.na(foo))
table(foo, useNA = "always")

Vectors: classes `r .emo("practice")`

foo1 <- c(1, 5, 5, NA)
foo2 <- c(a = 1, b = 3)
foo3 <- c(TRUE, FALSE)
foo4 <- c("a", "b", "c")
foo5 <- factor(foo3)
foo6 <- c(1L, 3L, 5L)

class(foo1)

classes <- unlist(lapply(as.list(names <- paste0("foo", 1:6)), function(x) class(get(x))))
knitr::kable(rbind(names, classes))

Vectors: factors `r .emo("practice")`

(foo <-  factor(c("a", "b", "c"))) ## Tip: the brackets force the display

relevel(foo, ref = "c")                      ## change first level
factor(foo, levels = c("c", "b", "a"))       ## control order of levels
droplevels(foo[foo != "b"])                  ## drop empty levels
c(factor(c("a", "b")), factor(c("b", "c")))  ## R >= 4.1 can coerce factors with different levels
as.character(foo)                            ## you can turn factors into characters

Vectors: sequences `r .emo("practice")`

1:5
rep(1:5, each = 2)
rep(1:5, times = 2)
seq(from = 1, to = 5, by = 0.5) ## more arguments possible

Vectors: pseudo-random numbers `r .emo("practice")`

rnorm(5, mean = 2, sd = 0.1)  ## Tip: check ?Distributions for other distributions
rnorm(5, mean = 2, sd = 0.1)
set.seed(14353L) # fixing the seed allows for you to get reproducible results
rnorm(5, mean = 2, sd = 0.1)
set.seed(14353L)
rnorm(5, mean = 2, sd = 0.1)

Logical operators `r .emo("practice")`

Return TRUE or FALSE (or NA)

x == y ## is equal to
x != y ## is not equal to
x < y  ## less than
x <= y ## less than or equal to
x > y  ## greater than
x >= y ## greater than or equal to

## compare every item in vector to the same object
foo  <- c(1, 2, 3, NA)
foo <= 2

## compare each set of items in the two vectors
foo2 <- c(0, 3, 2, NA)
foo < foo2

## combine multiple operators 
foo > 0 & foo < 3

as.numeric(c(TRUE, FALSE)) ## TRUE = 1, FALSE = 0

foo3 <- c("a", "b", "a", "a")
sum(foo3 == "a") ## number of items that equal a

Attributes `r .emo("practice")`

foo <- c(a = 1, b = 5, c = 10, d = 23)
attributes(foo)
attr(foo, "names")  ## Tip: here a shortcut would be 'names(foo)', but this is not general
foo <-  factor(c("a", "b", "c"))
attributes(foo)

Dataframes: basics `r .emo("practice")`

foo <- data.frame(
  x = c(1, 3, 5),
  z = factor(c("a", "b", "c"))
  )
foo
dim(foo) ## Tip: try nrow() and ncol()
dimnames(foo) ## Tip: try rownames() and colnames()

Dataframes: indexing `r .emo("practice")`

foo[2, ]    ## get row 2
foo[, 2]    ## get column 2
foo$x       ## get column x
foo[, "x"]  ## get column x

Dataframes: combining rows `r .emo("practice")`

foo2 <- data.frame(
  x = c(2, 4),
  z = factor(c("b", "d"))
  )

rbind(foo, foo2)

Dataframes: combining columns `r .emo("practice")`

bar <- data.frame(
  x = c(1, 2, 3),
  w = c(2, 4, 3),
  s = factor(c("b", "d", "e"))
  )

cbind(foo, bar) ## Note that columns are not merged

Dataframes: {dplyr} <---> {base} translation `r .emo("info")`

library(dplyr)

foo_tbl <- tibble(x = c(1, 3, 5),
                  z = factor(c("a", "b", "c")))
foo_tbl ## the display is different

select(foo_tbl, x)          ## same as foo_tbl[, "x", drop = FALSE]
slice(foo_tbl, 2)           ## same as foo_tbl[2, ]
filter(foo_tbl, x == 3)     ## same as foo_tbl[which(foo_tbl$x == 3), ]

foo_tbl %>% select(x)       ## same as foo_tbl[, "x", drop = FALSE]
foo_tbl %>% slice(2)        ## same as foo_tbl[2, ]
foo_tbl %>% filter(x == 3)  ## same as foo_tbl[which(foo_tbl$x == 3), ]

Dataframes: applying functions on rows or columns `r .emo("practice")`

(d <- data.frame(a = c(1, 2, 3, NA), b = c(NA, 3, 2, 1)))

apply(d, 1, function(x) any(is.na(x))) ## NA in each row

apply(d, 2, function(x) any(is.na(x)))  ## NA in each column

Matrices: basics `r .emo("practice")`

foo <- matrix(data = 1:4, nrow = 2, ncol = 2) ## Tip: try with byrow = TRUE
colnames(foo) <- c("a", "b"); rownames(foo) <- c("A", "B")
foo

Indexing similar to dataframes (but you cannot use $):

foo[, 2]
foo[2, ]

foo[1, 2]

Matrices: transposition, inversion `r .emo("practice")`

foo
t(foo)  ## transpose of foo
solve(foo)  ## inverse of foo

Matrices: multiplication `r .emo("practice")`

foo %*% matrix(c(-1, 1)) ## = [1 * -1 + 3 * 1][2 * -1 + 4 * 1]
foo %*% solve(foo)       ## a matrix times its inverse equals the identity matrix
foo * solve(foo)         ## NOT MATRIX MULTIPLICATION! (element by element multiplication)

Lists `r .emo("practice")`

foo <- list("foo1" = c(1:10), "foo2" = factor(c("a", "b")))
foo
foo[["foo2"]]
foo$foo2

Functions `r .emo("practice")`

addA_B <- function(a = 0, b = 0) { ## setting values for argument makes them default values
  c <- a + b
  return(c)
  }

addA_B(a = 5, b = 7)
addA_B(a = 5)

addA_B_bis <- function(a, b) a + b
addA_B_bis(2, 3)

Repeated operations `r .emo("practice")`

hello <- function(who = "alex") paste("hello", who)

replicate(10, hello())  ## returns a vector when it can

lapply(c("alex", "olivia"), function(i) hello(who = i)) ## returns a list

sapply(c("alex", "olivia"), function(i) hello(who = i)) ## returns a (named) vector when it can

Plots `r .emo("info")`

Plots are necessary to check data (e.g. outliers), model assumptions, and communicate model results.

Let's use the Davis dataset which contains both quantitative (e.g. weight, height) and qualitative (e.g. sex) variables:

str(Davis)

Note: check ?Davis for details on each variable.

Plots: the scatterplot `r .emo("practice")`

For plotting two quantitative variables (work best if continuous, messy if discrete):

plot(repht ~ height, data = Davis)

Note: if one variable is expected to affect the other it should be on the x-axis.

Plots: the scatterplot `r .emo("practice")`

For more advanced scatterplots, you may use the function scatterplot() from the package {car}:

library(car)
scatterplot(height ~ weight + sex, data = Davis[-12, ])

Plots: `pairs()` `r .emo("practice")`

Plot all variables against each others using scatterplots:

pairs(Davis)

Note: qualitative variables are turned into quantitative ones.

Plots: the boxplot `r .emo("practice")`

For plotting the distribution of a quantitative variable (best if continuous) against a qualitative variable:

## Remove outlier identified in scatterplot to see boxplots better (for demonstration only)
plot_data <- Davis[which(Davis$height > 100), ]
boxplot(height ~ sex, data = plot_data)

Note: boxplots show median, inter-quartile range, and potential outliers.

Plots: the conditioning plot `r .emo("practice")`

Generate multiple scatterplots conditional on one other variable:

## Relationship between height and reported height for each sex
coplot(repht ~ height | sex, data = plot_data, panel = panel.smooth)

Plots: the conditioning plot `r .emo("practice")`

Generate multiple scatterplots conditional on two other variables:

## Relationship between height and reported height for different weight groups
coplot(repht ~ height | weight * sex, data = plot_data, panel = panel.smooth)

Note: trellis plots in margins indicate how to match each scatterplot to the weight and the sex.

[Exercises](./Introduction_exercises.html)

[Back to main menu](./Title.html#2)

courtiol/LM2GLMM documentation built on Oct. 10, 2024, 7:22 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

courtiol/LM2GLMM Data science for biologists: Generalized linear modelling with R

In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

Table of content

1. General information

2. The exam

3. A catalogue of useful {base} R commands for this course

General information

Symbols r .emo("info")

Course philosophy r .emo("info")

By the end of the course, my goal is for you to:

How to get there?

What I will not cover in this course:

What do you need for the course? r .emo("practice")

How to access the course? r .emo("practice")

The exam

Procedure r .emo("info")

What?

When?

Condition?

Grading r .emo("info")

I will grade:

I will give bonuses for the quality/elegance of:

I will help you:

A catalogue of useful {base} R commands for this course

Vectors: basics r .emo("practice")

Vectors: basics r .emo("practice")

Vectors: some useful functions r .emo("practice")

Vectors: classes r .emo("practice")

Vectors: factors r .emo("practice")

Vectors: sequences r .emo("practice")

Vectors: pseudo-random numbers r .emo("practice")

Logical operators r .emo("practice")

Attributes r .emo("practice")

Dataframes: basics r .emo("practice")

Dataframes: indexing r .emo("practice")

Dataframes: combining rows r .emo("practice")

Dataframes: combining columns r .emo("practice")

Dataframes: {dplyr} <---> {base} translation r .emo("info")

Dataframes: applying functions on rows or columns r .emo("practice")

Matrices: basics r .emo("practice")

Matrices: transposition, inversion r .emo("practice")

Matrices: multiplication r .emo("practice")

Lists r .emo("practice")

Functions r .emo("practice")

Repeated operations r .emo("practice")

Plots r .emo("info")

Plots: the scatterplot r .emo("practice")

Plots: the scatterplot r .emo("practice")

Plots: pairs() r .emo("practice")

Plots: the boxplot r .emo("practice")

Plots: the conditioning plot r .emo("practice")

Plots: the conditioning plot r .emo("practice")

R Package Documentation

Browse R Packages

We want your feedback!

courtiol/LM2GLMM
Data science for biologists: Generalized linear modelling with R

Symbols `r .emo("info")`

Course philosophy `r .emo("info")`

What do you need for the course? `r .emo("practice")`

How to access the course? `r .emo("practice")`

Procedure `r .emo("info")`

Grading `r .emo("info")`

Vectors: basics `r .emo("practice")`

Vectors: basics `r .emo("practice")`

Vectors: some useful functions `r .emo("practice")`

Vectors: classes `r .emo("practice")`

Vectors: factors `r .emo("practice")`

Vectors: sequences `r .emo("practice")`

Vectors: pseudo-random numbers `r .emo("practice")`

Logical operators `r .emo("practice")`

Attributes `r .emo("practice")`

Dataframes: basics `r .emo("practice")`

Dataframes: indexing `r .emo("practice")`

Dataframes: combining rows `r .emo("practice")`

Dataframes: combining columns `r .emo("practice")`

Dataframes: {dplyr} <---> {base} translation `r .emo("info")`

Dataframes: applying functions on rows or columns `r .emo("practice")`

Matrices: basics `r .emo("practice")`

Matrices: transposition, inversion `r .emo("practice")`

Matrices: multiplication `r .emo("practice")`

Lists `r .emo("practice")`

Functions `r .emo("practice")`

Repeated operations `r .emo("practice")`

Plots `r .emo("info")`

Plots: the scatterplot `r .emo("practice")`

Plots: the scatterplot `r .emo("practice")`

Plots: `pairs()` `r .emo("practice")`

Plots: the boxplot `r .emo("practice")`

Plots: the conditioning plot `r .emo("practice")`

Plots: the conditioning plot `r .emo("practice")`