knitr::opts_chunk$set(echo = FALSE, results = 'asis', fig.pos = 'H') knitr::knit_hooks$set(hook_convert_odg = rmdhelp::hook_convert_odg)
cnt <- rmdhelp::R6ClassCount$new() cnt$set_prefix(ps_prefix = "## Problem")
cat(cnt$out(ps_suffix = "Variance Components Estimation"), "\n")
The simplest form of variance components estimation is based on the residuals of a fitted linear model and is shown in the summary results of the R-function lm()
. Let us assume that we are given the dataset in the table shown below to which we fit a simple sire model.
tbl_data_sol12p02 <- tibble::tibble( Animal = c(4, 5, 6, 7, 8), Sire = c(2, 1, 1, 2, 1), WWG = c(4.5,2.9,3.9,3.5,5.0) ) knitr::kable(tbl_data_sol12p02, booktabs = TRUE, longtable = TRUE, caption = "Example Dataset for Variance Components Estimation Based on Residuals Using a Sire Model")
The sire model is simplified to have a common mean $\mu$. For a moment we are setting the sire effects to be fixed effects. This leads to the following model with $var(e) = I * \sigma_e^2$
$$y = X\mu + Z_ss + e$$
Using the above shown dataset we can use the R-function lm()
to fit this simple linear model. Because, we want to have the sires as fixed effects, we have to convert them into factors before calling lm()
.
tbl_data_sol12p02$Sire <- as.factor(tbl_data_sol12p02$Sire) lm_data_sol12p02 <- lm( WWG ~ 1 + Sire, data = tbl_data_sol12p02 ) summary(lm_data_sol12p02)
res_std_sol12p02 <- sigma(lm_data_sol12p02)
From the output of summary()
we are given the residual standard error to be r round(res_std_sol12p02, digits = 4)
. This residual standard error is an estimate of $\sigma_e$. The question is where does it come from. The least-squares procedure does not yield this estimate for $\sigma_e$. The answer is that this estimate comes from the residuals $r$ of the model. For our model the vector $r$ of residuals is defined as
$$r = y - X\widehat{\mu} - Z_s\widehat{s}$$
where $\widehat{\mu}$ and $\widehat{s}$ can be taken from the ouput of the summary()
function. They correspond to
vec_coeff <- coefficients(lm_data_sol12p02) n_mu_hat <- vec_coeff[[1]] vec_sire_hat <- c(0, vec_coeff[[2]]) cat("$$\\widehat{\\mu} = ", n_mu_hat, "$$\n") cat("$$\n") cat(paste(rmdhelp::bcolumn_vector(pvec = vec_sire_hat, ps_name = '\\widehat{s}'), collapse = '\n'), '\n') cat("$$\n")
The estimate $\widehat{\sigma_e^2}$ for $\sigma_e^2$ is obtained by
$$\widehat{\sigma_e^2} = \frac{1}{n-p} \sum_{i=1}^n r_i^2$$
where $n$ is the total number of observations and $p$ is the number of parameters that are estimated by lm
which is r length(vec_coeff)
for our sire model. The term $n-p$ is also called degrees of freedom (df). What is given as residual standard error by the output of summary()
is the square root of $\widehat{\sigma_e^2}$.
Verify for the above given dataset and the proposed sire model the residual standard error given by summary()
by using the computation based on the residuals shown above.
The vector $r$ of residuals can be obtained using the function residuals()
(vec_res <- residuals(lm_data_sol12p02))
The degrees of freedom for the residuals ($n-p$) are obtained by the function df.residual()
(n_df_e <- df.residual(lm_data_sol12p02))
From this the residual standard error is computed as
(n_res_sd <- sqrt(sum(vec_res^2) / n_df_e))
The same result can be obtained using the function sigma()
sigma(lm_data_sol12p02)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.