knitr::opts_chunk$set(echo = TRUE)
if (params$isonline){ s_asm_ex08_p01_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv" } else { s_asm_ex08_p01_data_path <- file.path(here::here(), "docs", "data", "asm_bw_flem.csv") } n_nr_rep <- 5 n_sd_prop_bw <- 0.5
Simulate a dataset with repeated measurements of Body Weight
and Breed
. The following dataset can be used as a basis:
cat(s_asm_ex08_p01_data_path, "\n")
The generated dataset should have the following properties
Body Weight
and its Breed
should be contained in the dataset.r n_nr_rep
repeated observations of Body Weight
and Breed
.Body Weight
within the repeated observations of one animal should be r 100 * n_sd_prop_bw
% of the total phenotypic variance of Body Weight
determined from the given basis dataset.s_asm_ex08_p01_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv"
Read the data
tbl_ex08_p01 <- readr::read_csv(file = s_asm_ex08_p01_data_path) tbl_ex08_p01 <- dplyr::select(tbl_ex08_p01, Animal, `Body Weight`, Breed) head(tbl_ex08_p01)
set.seed(9875) sd_bw <- sd(tbl_ex08_p01$`Body Weight`) tbl_rep_obs_result <- NULL for (idx in 1:nrow(tbl_ex08_p01)){ tbl_rep_cur <- dplyr::bind_rows(tbl_ex08_p01[idx,], tibble::tibble(Animal = c(rep(tbl_ex08_p01$Animal[idx], n_nr_rep - 1)), `Body Weight` = rnorm((n_nr_rep-1), mean = tbl_ex08_p01$`Body Weight`[idx], sd = n_sd_prop_bw * sd_bw), Breed = c(rep(tbl_ex08_p01$Breed[idx], n_nr_rep-1)))) if (is.null(tbl_rep_obs_result)){ tbl_rep_obs_result <- tbl_rep_cur } else { tbl_rep_obs_result <- dplyr::bind_rows(tbl_rep_obs_result, tbl_rep_cur) } } head(tbl_rep_obs_result)
The generated dataset is written to a file, such that it will be available for Problem 2
s_ex08_p01_rep_obs_data_dir <- file.path(here::here(), "docs", "data") if (!dir.exists(s_ex08_p01_rep_obs_data_dir)) dir.create(path = s_ex08_p01_rep_obs_data_dir, recursive = TRUE) s_ex08_p01_rep_obs_data_path <- file.path(s_ex08_p01_rep_obs_data_dir, "asm_ex08_p01_rep_obs.csv") if (!file.exists(s_ex08_p01_rep_obs_data_path)) readr::write_csv(tbl_rep_obs_result, file = s_ex08_p01_rep_obs_data_path)
tbl_rep_obs_result$Animal <- as.factor(tbl_rep_obs_result$Animal) aov_ex08_p01 <- aov(`Body Weight` ~ Breed + Error(Animal), data = tbl_rep_obs_result) (smry_aov_ex08_p01 <- summary(aov_ex08_p01))
The results of the aov()
analysis gives estimates for the residual error variance $\sigma_e^2$ and for the variance $\sigma_a^2$ between animals. The estimate $\widehat{\sigma_e^2}$ is directly obtained from the section Error: Within
in the column Mean Sq
.
n_hat_sigmae2 <- smry_aov_ex08_p01$`Error: Within`[[1]]["Residuals","Mean Sq"]
The value for this estimate is $\widehat{\sigma_e^2} = r round(n_hat_sigmae2, digits = 1)
$. The estimate for $\sigma_a^2$ is obtained by the formula
$$E(MSQ_a) = n_a * \sigma_a^2 + \sigma_e^2$$
For the expected value of the $MSQ_a$, we insert the Mean Sq
-value of the Residuals
row in the section Error: Animal
. This leads to
$$\widehat{\sigma_a^2} = \frac{\widehat{E(MSQ_a)} - \widehat{\sigma_e^2}}{n_a}$$
The variable $n_a$ is the number of observations for one animal. The numeric value for $\widehat{\sigma_a^2}$ is computed as
n_est_msqa <- smry_aov_ex08_p01$`Error: Animal`[[1]]["Residuals","Mean Sq"] n_est_sigmaa2 <- (n_est_msqa - n_hat_sigmae2) / n_nr_rep
Hence, $\widehat{\sigma_a^2} = r round(n_est_sigmaa2, digit=1)
$
The variance of the residuals in the original basis dataset is obtained from
lm_bw_breed <- lm(`Body Weight` ~ Breed, data = tbl_ex08_p01) smry_bw_breed <- summary(lm_bw_breed)
The estimate of the residual variance is r round(smry_bw_breed$sigma^2, digits=1)
which is comparable to the value found by aov()
. The used value for the variance within observations is given by
n_obs_var_bw <- var(tbl_ex08_p01$`Body Weight`) n_obs_var_bw * n_sd_prop_bw^2
This variance is comparable to what was found by aov()
.
s_ex08_p02_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_ex08_p01_rep_obs.csv" if (!params$isonline) s_ex08_p02_data_path <- file.path(here::here(), "docs", "data", "asm_ex08_p01_rep_obs.csv")
Analyse the dataset generated in Problem 1 with a random effects model using the package lme4
. If you had difficulties to solve Problem 1, then you can also use the following dataset.
cat(s_ex08_p02_data_path, "\n")
# read the data tbl_ex08_p02 <- readr::read_csv(file = s_ex08_p02_data_path) # convert animal and breed to factors tbl_ex08_p02$Animal <- as.factor(tbl_ex08_p02$Animal) tbl_ex08_p02$Breed <- as.factor(tbl_ex08_p02$Breed) tbl_ex08_p02
lme4::lmer()
The mixed model analysis is done with
lmer_ex08_p02 <- lme4::lmer(`Body Weight` ~ Breed + (1|Animal), data = tbl_ex08_p02) summary(lmer_ex08_p02)
In the output of the summary()
function, the formula of the model that produced the above results is shown. The REML criterion tells us that the parameter estimation process has converged to the solutions shown. The statistics on the residuals is comparable to what we have already seen in the output of the summary of the lm()
function. The variance components were estimated with the REML method and are the same as the estimates found by aov()
. But this is only the case when the dataset is balanced, i.e., for each animal the same number of observations are contained in the dataset.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.