This example comes from Wood (2016), full reference is the syllabus.
"This data set provides measurements of the diameter, height and volume of timber in 31 felled black cherry trees. Note that the diameter (in inches) is erroneously labelled Girth in the data. It is measured at 4 ft 6 in above the ground." -R documentation
library(ggplot2) library(dplyr) data('trees', package='datasets')
head(trees)
trees %>% rename(Diameter = "Girth") %>% ggplot(aes(Diameter, Volume, color=Height)) + geom_point() + theme_minimal() + ggtitle("Diameter and volume for cherry trees, coloured by height") trees %>% rename(Diameter = "Girth") %>% ggplot(aes(Height, Volume, color=Diameter)) + geom_point() + theme_minimal() + ggtitle("Height and volume for cherry trees, coloured by diameter")
We've met generalized linear models with Poisson and Logistic responses. Another fairly popular distibution for GLMs is Gamma. It's appropriate for when your response is > 0 and can handle right skew well. Log is not the canonical link, but is a popular choice.
reps <- 50000 nexps <- 5 rate <- 0.1 set.seed(0) x1 <- replicate(reps, sum(rexp(n=nexps, rate=rate))) colors <- c("shape = 5, rate = 0.1" = "blue", "shape = 2, rate = 0.1" = "red", "shape = 5, rate = 1" = "orange", "shape = 20, rate = 1" = "green") ggplot(data.frame(x1), aes(x1)) + scale_color_manual(values = colors) + stat_function(fun=function(x)dgamma(x, shape=5, scale=1/0.1), aes(color="shape = 5, rate = 0.1"), size=1) + stat_function(fun=function(x)dgamma(x, shape=2, scale=1/0.1), aes(color="shape = 2, rate = 0.1"), size=1) + stat_function(fun=function(x)dgamma(x, shape=5, scale=1), aes(color="shape = 5, rate = 1"), size=1) + stat_function(fun=function(x)dgamma(x, shape=20, scale=1), aes(color="shape = 20, rate = 1"), size=1) + theme_minimal() + labs(x = "x", y = "y", color = "Parameters", title = "Example Gamma pdf")
library(MASS) library(mgcv) library(gratia) # ggplot style data(trees) ct1<-gam(Volume~s(Height)+s(Girth), family=Gamma(link="log"),data=trees, select=TRUE, method="REML") summary(ct1)
This fits the model:
$$\text{log}(\mathbb{E}[\text{Volume}_i]) = f_1(\text{Height}_i) + f_2(\text{Girth}_i)$$
$$\text{Volume}i \sim \text{Gamma}(\alpha, \beta)$$
and coef(ct1)
shows us the $\beta{jk}$. Note: These $\beta$ aren't interpretable in context.
coef(ct1)
gam.check(ct1) gratia::appraise(ct1)
par(mfrow=c(1,2)) vis.gam(ct1, view=c("Girth", "Height")) vis.gam(ct1, view=c("Height", "Girth"))
res1 <-gam(Volume~s(Girth, k=3), family=Gamma(link="log"), data=trees) res2 <-gam(Volume~s(Girth,k=15), family=Gamma(link="log"), data=trees) res3 <-gam(Volume~s(Girth, k=25), family=Gamma(link="log"), data=trees) par(mfrow=c(1, 3)) plot(res1) plot(res2) plot(res3)
$\hat f$ is smooth, don't need many basis functions
If you want to perform LR tests, you should probably use ML as the smoothing selection method and not use select=TRUE
as the approximation used can be very bad for smooths with penalties on their null spaces. This treats our smooths as random effects.
ct1_ml <- gam(Volume~s(Height) + s(Girth), family=Gamma(link="log"), data=trees, method="ML") ct2_ml <- gam(Volume ~ Height + s(Girth), family=Gamma(link="log"), data = trees, method = "ML") ct3_ml <- gam(Volume ~ s(Girth), family=Gamma(link="log"), data = trees, method = "ML") lmtest::lrtest(ct1_ml, ct2_ml) lmtest::lrtest(ct2_ml, ct3_ml)
We can use the draw()
function to make comments about these variables.
draw(ct1_ml)
predict.gam
allows us to make predictions from our fitted models in the way we're used to from predict
.
trees$pred <- predict(ct1, type="response") trees %>% ggplot(aes(Volume, pred)) + geom_point() + geom_abline(intercept = 0, slope = 1) + theme_minimal() + labs(x = "Observed volume", y = "Predicted volume")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.