## Do not delete this! ## It loads the s20x library for you. If you delete it ## your document may not compile it. require(s20x) knitr::opts_chunk$set( dev = "png", fig.ext = "png", dpi = 96 )
We want to build a model to estimate the volume of a tree from the measurement of its height and diameter. This data set provides measurements of the diameter, height and volume of timber in 31 felled black cherry trees\footnote{From Ryan, T. A., Joiner, B. L. and Ryan, B. F. (1976) The Minitab Student Handbook. Duxbury Press.}.
The data consist of 31 observations on the following 3 variables:
diameter: Tree diameter in inches (measured at 4 ft 6 in above the ground).height: Height in feet.volume: Volume of timber in cubic feet.The objective is to be able to estimate the volume of the tree from the measurement of its height and diameter.
Using basic geometry, it can argued that volume will have a power law relationship with diameter and height. Specifically, it may be that the trunk of a tree can be reasonably approximated by a cylinder/or cone.
For a cylinder the volume is $V=\pi h r^{2}$ where $r$ is radius and $h$ is height.
For a cone it is $V=\frac{\pi}{3} h r^{2}$.
In general, this correponds to the power law $V\propto h d^{2}$, where $d$ is diameter.
That is, $\log(V)=\beta_0+\log(h)+2*\log(d)$.
This can be written $\log(V)=\beta_0+\beta_1*\log(h)+\beta_2\log(d)$ where $\beta_1=1$ and $\beta_2=2$.
Hence, to explore whether the above geometric argument is valid, we also need to test the hypotheses $H_0:\beta_1=1$, $H_0:\beta_2=2$.
load(system.file("extdata", "Cherry.df.rda", package = "s20x"))
Cherry.df = read.table("Cherry.txt", header = T) pairs20x(Cherry.df[, c("volume", "height", "diameter")])
pairs20x(Cherry.df[, c("volume", "height", "diameter")])
Scatter in volume increases with size. Relationship between volume and height looks close to linear, and with diameter looks to be increasing in slope with size.
# Let's create some new variables in Cherry.df Cherry.df$logvolume = log(Cherry.df$volume) Cherry.df$logheight = log(Cherry.df$height) Cherry.df$logdiameter = log(Cherry.df$diameter) # Check out the new names dimnames(Cherry.df)[[2]] pairs20x(Cherry.df[, c("logvolume", "logheight", "logdiameter")])
Scatter in logvolume looks constant, and relationship with logheight and logdiameter looks linear.
Cherry.fit = lm(logvolume ~ logheight + logdiameter, data = Cherry.df) plot(Cherry.fit, which = 1) normcheck(Cherry.fit) cooks20x(Cherry.fit) summary(Cherry.fit) confint(Cherry.fit)
conf1 = as.data.frame(confint(Cherry.fit)) resultStr1 = paste0(sprintf("%.2f%%", conf1$`2.5 %`), " and ", sprintf("%.2f%%", conf1$`97.5 %`))
# Obtain the estimates of interest (estimates = summary(Cherry.fit)$coeff[2:3, 1]) # and their associated std errors (std.errors = summary(Cherry.fit)$coeff[2:3, 2]) # t-values for the two hypotheses beta1=1 and beta2=2 (tstats = (estimates - c(1, 2))/std.errors) # p-values (pvals = 2 * (1 - pt(abs(tstats), 28)))
After logging the variables, the scatter plots showed the desired linear relationship between volume and the height and diameter variables, and constant scatter. So, we fitted a power law model explaining log volume with both log height and log diameter.
The underlying model assumptions appear valid with no unduly influential points.
Our final model is $$log(volume_i)=\beta_0 +\beta_1\times log(height_i)+ \beta_2*log(diameter_i)+ \epsilon_i,$$ where $\epsilon_i \sim iid ~ N(0,\sigma^2)$.
Our power law model explained 97.8% of the total variability in (log) volume.
The objective is to be able to estimate the volume of the tree from the measurement of its height and diameter. The postulated model was that cherry tree volume would follow a power law relationship with height and diameter. In particular, that volume would increase linearly with height, and with the square of diameter.
Cherry tree volume was seen to follow a power law model with respect to both height and weight, and there was no evidence that these powers differ from the hypothesized values of 1 (P-value = 0.57) and 2 (P-value = 0.82), respectively. It would appear that our geometrical arguments were valid.
We estimate that a 1% increase in height corresponds to an increase in median volume of between r resultStr1[2], and a 1% increase in diameter corresponds to an increase in median volume of between r resultStr1[3].
If you believe (and we have no reason not to) that the underlying relationship is indeed that $V \propto h r^2$ then the only term to be estimated is $\beta_0$. This is how you'd estimate it:
# Intercept only model. fit.beta0 = lm(I(logvolume-logheight-2*logdiameter)~1, data = Cherry.df) # beta0 changes from -6.63 to -6.17 summary(fit.beta0)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.