knitr::opts_chunk$set(echo = TRUE)
s_ex04p01_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv"
Use the extended dataset on Body Weight
of animals and fit all the variables and the factor breed. Compare the result with a regression that uses only Breast Circumference
or with the linear model that only uses the factor Breed
. The data set is available from: r s_ex04p01_data_path
s_ex04p01_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv" tbl_ex04p01_data <- readr::read_csv(file = s_ex04p01_data_path)
lm_ex04p01_full <- lm(formula = `Body Weight` ~ `Breast Circumference` + BCS + HEI + Breed, data = tbl_ex04p01_data) summary(lm_ex04p01_full)
Breast Circumference
lm_ex04p01_bwbc <- lm(formula = `Body Weight` ~ `Breast Circumference`, data = tbl_ex04p01_data) summary(lm_ex04p01_bwbc)
Breed
lm_ex04p01_bwbreed <- lm(formula = `Body Weight` ~ Breed, data = tbl_ex04p01_data) summary(lm_ex04p01_bwbreed)
The comparison of the models shows that the full model does not produce a better model fit. The reason for this is that the explanatory variables in the full model are correlated among each other. As a result of this correlation structure, the same information is contained in different variables and as a result the single variables do not contribute a substantial amount to the explanation of the variation in the response variable.
The correlation structure amoung the different variables can be visualized via a so called pairs
plot.
tbl_ex04p01_data$Breed <- as.factor(tbl_ex04p01_data$Breed) pairs(formula = ~ `Breast Circumference` + BCS + HEI + Breed , data = tbl_ex04p01_data)
From this plot, we can clearly see that Breast Circumference
and Breed
are correlated. If we switch levels 2 and 3 of the breeds, then we can see the relationship between Breast Circumference
and Breed
even better.
The first step before doing any analysis should always be to plot the data which helps to visualise the internal structure of a dataset. A very instructive plot is the so-called pairs
-plot. This plot can be done using the function pairs()
. The task of this problem is to create a pairs
-plot for the extended dataset on Body Weight
of animals. The input to the function pairs()
must be all numeric. This means that the column containing the Breed
in our dataset must be converted to a datatype called factor
. This can be done using the function as.factor()
.
Results of linear models can also be plotted. In such plots, we are mainly interested in the behavior of the residuals. Hence, fit a linear regression model between Body Weight
and Breast Circumference
and plot the resulting linear model object.
s_ex04p02_data_path <- "https://charlotte-ngs.github.io/asmss2022/data/asm_bw_flem.csv" tbl_ex04p02_data <- readr::read_csv(file = s_ex04p02_data_path)
tbl_ex04p02_data$Breed <- as.factor(tbl_ex04p02_data$Breed)
pairs
-plotpairs(tbl_ex04p02_data)
The above matrix of scatterplots shows relationships between pairs of variables.
lm_ex04p02 <- lm(formula = `Body Weight` ~ `Breast Circumference`, data = tbl_ex04p02_data)
plot(lm_ex04p02)
For the behavior of the resiudals, we are focusing on the first two plots. The first plot shows whether there is a dependence pattern between the residuals and the fitted values. For this plot a random pattern is desired. The second plot shows a QQ-plot of the residuals. This plot shows any deviation of the numeric distribution of the residuals from the normal distribution.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.