knitr::opts_chunk$set(echo = FALSE, results = 'asis', fig.pos = 'H')
knitr::knit_hooks$set(hook_convert_odg = rmdhelp::hook_convert_odg)
cnt <- rmdhelp::R6ClassCount$new()
cnt$set_prefix(ps_prefix = "## Problem")
cat(cnt$out(ps_suffix = "Variance Components Estimation"), "\n")

The simplest form of variance components estimation is based on the residuals of a fitted linear model and is shown in the summary results of the R-function lm(). Let us assume that we are given the dataset in the table shown below to which we fit a simple sire model.

tbl_data_sol12p02 <- tibble::tibble( Animal = c(4, 5, 6, 7, 8),
                                         Sire   = c(2, 1, 1, 2, 1),
                                         WWG    = c(4.5,2.9,3.9,3.5,5.0) )

knitr::kable(tbl_data_sol12p02,
             booktabs  = TRUE,
             longtable = TRUE,
             caption   = "Example Dataset for Variance Components Estimation Based on Residuals Using a Sire Model")

The sire model is simplified to have a common mean $\mu$. For a moment we are setting the sire effects to be fixed effects. This leads to the following model with $var(e) = I * \sigma_e^2$

$$y = X\mu + Z_ss + e$$

Using the above shown dataset we can use the R-function lm() to fit this simple linear model. Because, we want to have the sires as fixed effects, we have to convert them into factors before calling lm().

tbl_data_sol12p02$Sire <- as.factor(tbl_data_sol12p02$Sire)
lm_data_sol12p02 <- lm( WWG ~ 1 + Sire, data = tbl_data_sol12p02 )
summary(lm_data_sol12p02)
res_std_sol12p02 <- sigma(lm_data_sol12p02)

From the output of summary() we are given the residual standard error to be r round(res_std_sol12p02, digits = 4). This residual standard error is an estimate of $\sigma_e$. The question is where does it come from. The least-squares procedure does not yield this estimate for $\sigma_e$. The answer is that this estimate comes from the residuals $r$ of the model. For our model the vector $r$ of residuals is defined as

$$r = y - X\widehat{\mu} - Z_s\widehat{s}$$

where $\widehat{\mu}$ and $\widehat{s}$ can be taken from the ouput of the summary() function. They correspond to

vec_coeff <- coefficients(lm_data_sol12p02)
n_mu_hat <- vec_coeff[[1]]
vec_sire_hat <- c(0, vec_coeff[[2]])

cat("$$\\widehat{\\mu} = ", n_mu_hat, "$$\n")
cat("$$\n")
cat(paste(rmdhelp::bcolumn_vector(pvec = vec_sire_hat, ps_name = '\\widehat{s}'), collapse = '\n'), '\n')
cat("$$\n")

The estimate $\widehat{\sigma_e^2}$ for $\sigma_e^2$ is obtained by

$$\widehat{\sigma_e^2} = \frac{1}{n-p} \sum_{i=1}^n r_i^2$$

where $n$ is the total number of observations and $p$ is the number of parameters that are estimated by lm which is r length(vec_coeff) for our sire model. The term $n-p$ is also called degrees of freedom (df). What is given as residual standard error by the output of summary() is the square root of $\widehat{\sigma_e^2}$.

Your Task

Verify for the above given dataset and the proposed sire model the residual standard error given by summary() by using the computation based on the residuals shown above.

Solution

The vector $r$ of residuals can be obtained using the function residuals()

(vec_res <- residuals(lm_data_sol12p02))

The degrees of freedom for the residuals ($n-p$) are obtained by the function df.residual()

(n_df_e <- df.residual(lm_data_sol12p02))

From this the residual standard error is computed as

(n_res_sd <- sqrt(sum(vec_res^2) / n_df_e))

The same result can be obtained using the function sigma()

sigma(lm_data_sol12p02)


charlotte-ngs/lbgfs2020 documentation built on Dec. 20, 2020, 5:39 p.m.