In charlotte-ngs/asmss2022: Applied Statistical Methods in Animal Science - Spring Semester 2022

knitr::opts_chunk$set(echo = TRUE)

s_ex04p01_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv"

Problem 1: Overfitting

Use the extended dataset on Body Weight of animals and fit a model with all variables and the factor Breed. Compare the result with a regression that uses only Breast Circumference and with the linear model that uses only the factor Breed. The dataset is available from: https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv

Solution

Read the data

s_ex04p01_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv"
tbl_ex04p01_data <- readr::read_csv(file = s_ex04p01_data_path)

Fit the full model

lm_ex04p01_full <- lm(formula = `Body Weight` ~ `Breast Circumference` + BCS + HEI + Breed, data = tbl_ex04p01_data)
summary(lm_ex04p01_full)

Fit the model with only Breast Circumference

lm_ex04p01_bwbc <- lm(formula = `Body Weight` ~ `Breast Circumference`, data = tbl_ex04p01_data)
summary(lm_ex04p01_bwbc)

Fit the model with only the factor Breed

lm_ex04p01_bwbreed <- lm(formula = `Body Weight` ~ Breed, data = tbl_ex04p01_data)
summary(lm_ex04p01_bwbreed)

The comparison of the models shows that the full model does not produce a substantially better fit than the simpler models. The reason is that the explanatory variables in the full model are correlated with each other. Because of this correlation structure, the same information is contained in several variables, so each single variable contributes little additional explanation of the variation in the response variable.
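The comparison can be made more concrete by putting a few fit statistics side by side. The following is a minimal sketch, not part of the original solution: `compare_fits()` is a hypothetical helper, and the data are simulated stand-ins (the names `bw`, `bc`, `hei` and `breed` are assumptions) so that the example runs without downloading the course dataset. Because R-squared can never decrease when terms are added to a model, the adjusted R-squared and the AIC, which both penalize additional coefficients, are the more informative columns.

```r
# Hypothetical helper (not from the course material): collect fit
# statistics from a named list of lm objects into one data frame.
compare_fits <- function(model_list) {
  data.frame(
    model         = names(model_list),
    n_coef        = vapply(model_list, function(m) length(coef(m)), integer(1)),
    r_squared     = vapply(model_list, function(m) summary(m)$r.squared, numeric(1)),
    adj_r_squared = vapply(model_list, function(m) summary(m)$adj.r.squared, numeric(1)),
    aic           = vapply(model_list, AIC, numeric(1)),
    row.names     = NULL
  )
}

# Simulated stand-in data (assumption, NOT the course dataset):
# breast circumference differs between breeds, height follows
# breast circumference, and body weight depends on breast circumference.
set.seed(2022)
n     <- 30
breed <- factor(rep(c("A", "B", "C"), each = n / 3))
bc    <- rnorm(n, mean = c(170, 180, 190)[as.integer(breed)], sd = 4)
hei   <- 0.7 * bc + rnorm(n, sd = 2)
bw    <- -1000 + 8 * bc + rnorm(n, sd = 15)
sim   <- data.frame(bw, bc, hei, breed)

fits <- list(
  full       = lm(bw ~ bc + hei + breed, data = sim),
  bc_only    = lm(bw ~ bc, data = sim),
  breed_only = lm(bw ~ breed, data = sim)
)

tbl_cmp <- compare_fits(fits)
print(tbl_cmp)
```

With the objects fitted above, the same helper could be called as `compare_fits(list(full = lm_ex04p01_full, bc = lm_ex04p01_bwbc, breed = lm_ex04p01_bwbreed))`.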

The correlation structure among the different variables can be visualized with a so-called pairs plot.

tbl_ex04p01_data$Breed <- as.factor(tbl_ex04p01_data$Breed)
pairs(formula = ~ `Breast Circumference` + BCS + HEI + Breed , data = tbl_ex04p01_data)

From this plot, we can clearly see that Breast Circumference and Breed are correlated. If we switch levels 2 and 3 of the breeds, then we can see the relationship between Breast Circumference and Breed even better.
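Switching two factor levels can be done by passing the desired level order to `factor()`. The sketch below uses simulated stand-in data (the level names `Breed1` to `Breed3` and the group means are assumptions, not values from the course dataset). Only the order of the levels, and with it the integer codes used for plotting, changes; the breed label of each animal stays the same.

```r
# Simulated stand-in data (assumption, NOT the course dataset):
# the second level has the largest mean breast circumference.
set.seed(4)
breed <- factor(rep(c("Breed1", "Breed2", "Breed3"), each = 5))
bc    <- rnorm(15, mean = c(170, 190, 180)[as.integer(breed)], sd = 1)

# Switch levels 2 and 3 by spelling out the new level order.
breed_switched <- factor(breed, levels = levels(breed)[c(1, 3, 2)])

levels(breed)           # original order of the levels
levels(breed_switched)  # levels 2 and 3 exchanged

# Group means in the new level order now increase monotonically,
# which makes the trend across breed codes easier to see in a plot.
bc_means <- tapply(bc, breed_switched, mean)
print(bc_means)
```

A pairs plot of `data.frame(bc, breed_switched)` would then show the breed codes on one axis in the new order.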

Problem 2: Plotting

The first step before doing any analysis should always be to plot the data, which helps to visualise the internal structure of a dataset. A very instructive plot is the so-called pairs plot, which can be created using the function pairs(). The task of this problem is to create a pairs plot for the extended dataset on Body Weight of animals. The input to the function pairs() must be all numeric. The column containing the Breed is read as text, so it must first be converted to a datatype called factor, which is stored internally as integer codes that pairs() can plot. This can be done using the function as.factor().

Results of linear models can also be plotted. In such plots, we are mainly interested in the behavior of the residuals. Hence, fit a linear regression model between Body Weight and Breast Circumference and plot the resulting linear model object.

Solution

Read the dataset

s_ex04p02_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv"
tbl_ex04p02_data <- readr::read_csv(file = s_ex04p02_data_path)

Convert the breed column to a factor

tbl_ex04p02_data$Breed <- as.factor(tbl_ex04p02_data$Breed)

Create a pairs-plot

pairs(tbl_ex04p02_data)

The above matrix of scatterplots shows relationships between pairs of variables.
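The visual impression from the scatterplot matrix can be complemented by a correlation matrix of the numeric columns. This is a minimal sketch on simulated stand-in data (the column names and all values are assumptions, not the course dataset); the factor column is excluded because a correlation coefficient is not meaningful for arbitrary level codes.

```r
# Simulated stand-in data (assumption, NOT the course dataset).
set.seed(7)
n   <- 20
sim <- data.frame(bc = rnorm(n, mean = 180, sd = 8))
sim$hei   <- 0.7 * sim$bc + rnorm(n, sd = 3)
sim$bw    <- -1000 + 8 * sim$bc + rnorm(n, sd = 15)
sim$breed <- as.factor(sample(c("A", "B", "C"), n, replace = TRUE))

# Select the numeric columns; is.numeric() is FALSE for factors.
is_num  <- vapply(sim, is.numeric, logical(1))
cor_mat <- cor(sim[, is_num])

# One correlation coefficient per panel of the scatterplot matrix.
print(round(cor_mat, 2))
```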

Fit the linear regression model

lm_ex04p02 <- lm(formula = `Body Weight` ~ `Breast Circumference`, data = tbl_ex04p02_data)

Plot the result

plot(lm_ex04p02)

For the behavior of the residuals, we focus on the first two plots. The first plot shows whether there is a dependence pattern between the residuals and the fitted values. For this plot a random pattern is desired. The second plot shows a QQ-plot of the residuals. This plot shows any deviation of the empirical distribution of the residuals from the normal distribution.
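The quantities behind these two plots can also be extracted directly from the model object. The sketch below uses simulated stand-in data (the variables `x` and `y` are assumptions, not the course dataset). In a least-squares fit with an intercept, the residuals are uncorrelated with the fitted values by construction, so the first plot is inspected for curvature or a changing spread rather than for a linear trend.

```r
# Simulated stand-in data (assumption, NOT the course dataset).
set.seed(1)
x   <- runif(50, min = 160, max = 200)
y   <- -1000 + 8 * x + rnorm(50, sd = 15)
fit <- lm(y ~ x)

# Quantities shown in the first plot (residuals versus fitted values).
res     <- resid(fit)
fit_val <- fitted(fit)
cor(res, fit_val)  # zero up to rounding error, by construction

# Quantities shown in the second plot (normal QQ-plot of the
# standardized residuals); plot.it = FALSE returns the coordinates
# of the points without drawing anything.
qq <- qqnorm(rstandard(fit), plot.it = FALSE)

# Points close to a straight line, i.e. a correlation near 1,
# indicate residuals that are compatible with a normal distribution.
cor(qq$x, qq$y)
```

To draw only these two diagnostic plots, `plot()` for linear model objects accepts the argument `which`, for example `plot(lm_ex04p02, which = 1:2)`.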

charlotte-ngs/asmss2022 documentation built on June 7, 2022, 1:33 p.m.
