```{r}
# Do not delete this!
# It loads the s20x library for you. If you delete it,
# your document may not compile.
require(s20x)
knitr::opts_chunk$set(
  dev = "png",
  fig.ext = "png",
  dpi = 96
)
```
The ages and prices of 123 Mazda cars were collected from the Melbourne Age newspaper in 1991. We want to learn about Mazda prices, and how they decrease with age.
The variables measured are:
- `price`: price in Australian dollars.
- `year`: year of manufacture, recorded as two digits (e.g. 1990 = 90).

We want to see how Mazda car prices decrease with age.
```{r}
# Load the data supplied with the s20x package
data("mazda.df")
Mazda.df <- mazda.df
# Alternatively, read the same data from a local file:
# Mazda.df = read.table("mazda.txt", header = TRUE)
head(Mazda.df)
# We need to create a new variable called age ourselves
# (ages are relative to 1991, when the data were collected)
Mazda.df$age = 91 - Mazda.df$year
head(Mazda.df)
# Plot these data
trendscatter(price ~ age, data = Mazda.df)
```
The scatter plot shows a decreasing non-linear relationship. As age increases, price decreases. However, the rate of decrease is rapid at first and then slows. This suggests an exponentially decreasing relationship.
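The sketch below shows why a decline that is rapid at first and then slows points to exponential decay. It uses made-up numbers: a hypothetical new-car price of A$20,000 that keeps 85% of its value each year. Neither figure comes from the Mazda data. The dollar drop shrinks every year, but the ratio between consecutive years stays constant. A constant ratio is the signature of exponential decay.

```r
# Hypothetical values for illustration only (not estimated from mazda.df)
start_price <- 20000   # assumed price of a new car, in A$
keep_rate   <- 0.85    # assumed fraction of value kept each year

ages  <- 0:5
price <- start_price * keep_rate^ages   # exponential decay

# Dollar decrease from one year to the next: gets smaller as age increases
dollar_drop <- -diff(price)

# Ratio of consecutive prices: constant, equal to keep_rate
year_ratio <- price[-1] / price[-length(price)]

print(round(dollar_drop))
print(year_ratio)
```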
We also see that the scatter around the trend is not constant. It is larger when the price is higher and smaller when the price is lower, so a higher centre is associated with a higher spread.
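Spread that grows with the centre is what we would expect if the random variation acts multiplicatively, as a percentage of the price, rather than additively. The sketch below simulates that assumption with two hypothetical groups of cars. One group has a typical price of A$20,000 and the other A$5,000, and both share the same multiplicative error. The group prices and error size are assumptions chosen for illustration. On the raw scale the expensive group is about four times as variable. After taking logs, the two spreads are essentially equal.

```r
set.seed(201)  # arbitrary seed so the simulation is reproducible
n     <- 10000
sigma <- 0.2   # assumed standard deviation of the error on the log scale

# Multiplicative errors: price = typical price * exp(error)
high <- 20000 * exp(rnorm(n, mean = 0, sd = sigma))
low  <-  5000 * exp(rnorm(n, mean = 0, sd = sigma))

# Raw scale: spread is roughly proportional to the centre
raw_sd <- c(high = sd(high), low = sd(low))
# Log scale: spread is about sigma in both groups
log_sd <- c(high = sd(log(high)), low = sd(log(low)))

print(raw_sd)
print(log_sd)
```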
Let's fit a naive simple linear model using age for now.\footnote{In practice, one could omit this step since our assumptions are obviously not valid.}
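```{r}
# Naive simple linear model on the raw price scale
PriceAge.fit = lm(price ~ age, data = Mazda.df)
plot(PriceAge.fit, which = 1)   # residuals vs fitted values

# Log-transform price and refit
trendscatter(log(price) ~ age, data = Mazda.df)
PriceAge.fit2 = lm(log(price) ~ age, data = Mazda.df)
plot(PriceAge.fit2, which = 1)  # check for no trend and constant variance
normcheck(PriceAge.fit2)        # check normality of the residuals
cooks20x(PriceAge.fit2)         # check for influential points
summary(PriceAge.fit2)

# Back-transform the confidence intervals to the price scale
exp(confint(PriceAge.fit2))
# Back-transform to a percentage difference
100 * (exp(confint(PriceAge.fit2)) - 1)
```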
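```{r}
# Median price of a new car (age = 0): back-transformed intercept CI,
# rounded to the nearest A$100 and formatted with thousands separators
conf1 = as.data.frame(exp(confint(PriceAge.fit2)))
resultStr1 = sprintf("A$%s to A$%s",
                     format(round(conf1$`2.5 %`, -2), big.mark = ",", trim = TRUE),
                     format(round(conf1$`97.5 %`, -2), big.mark = ",", trim = TRUE))
# Yearly depreciation as a percentage. For the age row both limits are
# negative, so abs() reverses their order: the 97.5% limit gives the
# smaller depreciation and is listed first
conf2 = as.data.frame(abs(100 * (exp(confint(PriceAge.fit2)) - 1)))
resultStr2 = paste0(sprintf("%.1f%%", conf2$`97.5 %`), " to ",
                    sprintf("%.1f%%", conf2$`2.5 %`))
```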
The scatter plot of age vs price showed clear nonlinearity and an increase in variability with price.
Residuals from a simple linear model failed the equality-of-variance and no-trend assumptions, so the prices were log-transformed. A simple linear model fitted to the logged prices satisfied all assumptions.
Our final model is $$\log(\text{Price}_i) = \beta_0 + \beta_1 \times \text{Age}_i + \epsilon_i,$$ where $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$.
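Exponentiating both sides of the model shows how the log-scale coefficients translate back to the price scale. This follows directly from the model as stated. The only extra fact needed is that $e^{\epsilon_i}$ has median $e^{0} = 1$, because $\epsilon_i$ has median 0 and the exponential function is increasing.

$$
\begin{aligned}
\text{Price}_i &= e^{\beta_0} \times \left(e^{\beta_1}\right)^{\text{Age}_i} \times e^{\epsilon_i},\\
\text{median}(\text{Price}_i \mid \text{Age}_i) &= e^{\beta_0} \times \left(e^{\beta_1}\right)^{\text{Age}_i},\\
\frac{\text{median}(\text{Price} \mid \text{Age} = a + 1)}{\text{median}(\text{Price} \mid \text{Age} = a)} &= e^{\beta_1}.
\end{aligned}
$$

So $e^{\beta_0}$ is the median price of a new car (Age = 0). Each extra year of age multiplies the median price by $e^{\beta_1}$, which is a change of $100\,(e^{\beta_1} - 1)$ percent. Exponentiating the confidence limits for $\beta_0$ and $\beta_1$ therefore gives intervals for these two quantities.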
Our model explained 82% of the variability in the logged Mazda prices.
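The 82% is presumably the Multiple R-squared printed by `summary(PriceAge.fit2)`. The sketch below illustrates what that number measures. It fits the same kind of log-linear model to simulated data, then checks that $R^2 = 1 - \text{RSS}/\text{TSS}$, computed by hand on the log scale, matches `summary(fit)$r.squared`. The intercept 9.9, slope -0.15, error SD 0.3 and sample size are all made up and are not the Mazda estimates.

```r
set.seed(208)  # arbitrary seed for reproducibility
# Simulated stand-in data: assumed coefficients, not the Mazda estimates
sim <- data.frame(age = sample(0:20, 100, replace = TRUE))
sim$price <- exp(9.9 - 0.15 * sim$age + rnorm(100, sd = 0.3))

fit <- lm(log(price) ~ age, data = sim)

# R^2 by hand: proportion of the variability in log(price) explained by the fit
y   <- log(sim$price)
rss <- sum(residuals(fit)^2)   # residual sum of squares
tss <- sum((y - mean(y))^2)    # total sum of squares
r2_by_hand <- 1 - rss / tss

print(c(by_hand = r2_by_hand, from_summary = summary(fit)$r.squared))
```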
We wanted to see how Mazda car prices decrease with age.
There was clear evidence the price of the cars was exponentially decreasing as the cars got older (P-value $\approx$ 0).
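A P-value reported as "≈ 0" is usually one so small that `summary()` prints it in scientific notation or as `< 2e-16`. The sketch below shows how to pull the slope's P-value out of a fitted model programmatically. It uses simulated data with made-up coefficients, not the Mazda fit. `format.pval()` then reports any value below a chosen threshold (here an assumed `eps = 1e-4`) as "< threshold".

```r
set.seed(91)  # arbitrary seed
sim <- data.frame(age = sample(0:20, 100, replace = TRUE))
sim$price <- exp(9.9 - 0.15 * sim$age + rnorm(100, sd = 0.3))  # assumed model
fit <- lm(log(price) ~ age, data = sim)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
p_age <- coef_table["age", "Pr(>|t|)"]

# Very small P-values are printed as "< eps" instead of a long decimal
print(format.pval(p_age, eps = 1e-4))
```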
We estimate that the median price of a new Mazda car (in 1991) was somewhere from `r resultStr1[1]` (to the nearest A$100).
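The interval for the median new-car price comes from exponentiating the confidence interval for $\beta_0$ and rounding to the nearest hundred dollars. The worked sketch below uses a hypothetical log-scale interval of (9.8, 9.9), chosen for illustration. It is not the actual Mazda estimate.

```r
# Hypothetical 95% CI for beta_0 on the log scale (not the Mazda estimate)
log_ci <- c(lower = 9.8, upper = 9.9)

# Back-transform: exp(beta_0) is the median price when age = 0
price_ci <- exp(log_ci)   # about 18034 and 19930

# Round to the nearest 100 and add thousands separators
rounded <- round(price_ci, -2)
ci_text <- sprintf("A$%s to A$%s",
                   format(rounded[["lower"]], big.mark = ",", trim = TRUE),
                   format(rounded[["upper"]], big.mark = ",", trim = TRUE))
print(ci_text)   # "A$18,000 to A$19,900"
```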
We estimate that each additional year of age is associated with a decrease of somewhere from `r resultStr2[2]` in the median price.
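The depreciation interval converts the slope's confidence limits into percentage changes using $100\,(e^{\beta_1} - 1)$. Because the slope is negative, both percentages are negative. Taking absolute values reverses their order, so the upper limit for $\beta_1$ gives the smaller depreciation. The worked sketch below uses a hypothetical slope interval of (-0.20, -0.15), not the actual Mazda estimate.

```r
# Hypothetical 95% CI for beta_1 on the log scale (not the Mazda estimate)
slope_ci <- c(lower = -0.20, upper = -0.15)

# Percentage change in median price per extra year of age
pct_change <- 100 * (exp(slope_ci) - 1)   # about -18.1 and -13.9

# Express as depreciation (positive numbers); the order flips,
# so the smaller depreciation comes from the upper limit
depreciation <- abs(pct_change)
dep_text <- paste0(sprintf("%.1f%%", depreciation[["upper"]]), " to ",
                   sprintf("%.1f%%", depreciation[["lower"]]))
print(dep_text)   # "13.9% to 18.1%"
```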