## Do not delete this!
## It loads the s20x library for you. If you delete it
## your document may not compile
library(s20x)
A real estate agent in Saratoga, New York, wishes to investigate how the sale price of houses is affected by the size of the house. In particular, what is the effect on price of an additional 20 $m^2$ in living area? We also want to compare prices for houses with living areas categorised as small and large, and to estimate the expected house price for these two groups. She has compiled data from a random sample of 112 recent house sales in the city.
The dataset is stored in Houses.csv and includes variables:
Variable    | Description
------------|-------------------------------------------------------
price       | sale price of house, in US dollars
livingArea  | size of the living area of the house, in square metres
livingSpace | a factor classifying the size of the living area as either small if less than 170 square metres or large if greater
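As a minimal sketch of how a factor like `livingSpace` could be derived from `livingArea`, assuming the 170 square metre cut-off described in the table. The areas below are made-up values, not rows from Houses.csv, and placing a house of exactly 170 square metres in the "large" group is an assumption, since the description does not say which group it belongs to.

```r
# Hypothetical living areas in square metres (illustrative only, not from Houses.csv)
area <- c(95, 150, 169.9, 170, 210, 320)

# Classify as "small" below 170 m^2 and "large" otherwise.
# right = FALSE closes intervals on the left, so exactly 170 goes to "large" (an assumption).
space <- cut(area, breaks = c(-Inf, 170, Inf),
             labels = c("small", "large"), right = FALSE)

# Count how many houses fall in each group
table(space)
```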
Disclaimer: Before you rush off to Saratoga to buy a house, this is an old data set. I'm afraid house prices have gone up a lot since this data was collected.
Instructions:
We wish to investigate how the prices of houses in Saratoga, New York, are affected by the size of the house. In particular, what is the effect on price of an additional 20 $m^2$ in living area?
load(system.file("extdata", "houses.df.rda", package = "s20x"))
houses.df = read.csv("Houses.csv", header = TRUE, stringsAsFactors = TRUE)
plot(price~livingArea, houses.df, main="Price versus Living Area")
plot(log(price)~livingArea, houses.df, main="log(Price) versus Living Area")
There is an increasing relationship between house price and living area. The initial plot shows that the scatter increases for higher values of living area, but log-transforming the response variable in the second plot gives a relationship that looks reasonably linear with constant scatter.
houses.fit1 <- lm(price~livingArea, houses.df)
modelcheck(houses.fit1)
# Log the response variable and refit the model:
houses.fit2 <- lm(log(price)~livingArea, houses.df)
modelcheck(houses.fit2)
summary(houses.fit2)
confint(houses.fit2)
# back transform
exp(confint(houses.fit2))
# Extract second row of CI output only.
exp(confint(houses.fit2)[2,])
# % change 100*(value-1)
100*(exp(confint(houses.fit2)[2,])-1)
# scale by 20 and THEN back transform
exp(confint(houses.fit2)[2,]*20)
# % change 100*(value-1)
100*(exp(confint(houses.fit2)[2,]*20)-1)
conf1 = as.data.frame(t(100*(exp(confint(houses.fit2)[2,]*20)-1)))
resultStr1 = paste0(sprintf("%.1f", conf1$`2.5 %`), " and ", sprintf("%.1f", conf1$`97.5 %`))
plot(log(price)~livingArea, houses.df, main="log(Price) versus Living Area")
abline(houses.fit2)
# Original scale: the back-transformed fitted line is a curve
plot(price~livingArea, houses.df, main="Price versus Living Area")
lines(50:350, exp(houses.fit2$coef[1]+houses.fit2$coef[2]*50:350))
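The interval calculation above (scale the slope interval by 20, then back-transform) can be tried on simulated data. This is only a sketch: `sim.df`, `sim.fit` and the coefficient values used to generate the data are hypothetical stand-ins, not the Saratoga data or its fitted values.

```r
set.seed(1)

# Simulated data standing in for houses.df (hypothetical values, not the real data)
sim.df <- data.frame(livingArea = runif(112, 50, 350))
sim.df$price <- exp(11 + 0.004 * sim.df$livingArea + rnorm(112, sd = 0.3))

# Simple linear regression with a logged response
sim.fit <- lm(log(price) ~ livingArea, data = sim.df)

# 95% confidence interval for the slope on the log scale
slope.ci <- confint(sim.fit)["livingArea", ]

# Scale to a 20 m^2 increase first, THEN back-transform and convert to a % change
pct.20 <- 100 * (exp(20 * slope.ci) - 1)

# Back-transforming first and multiplying by 20 afterwards gives a different,
# incorrect answer, because the effect is multiplicative rather than additive
wrong <- 20 * 100 * (exp(slope.ci) - 1)

round(pct.20, 1)
round(wrong, 1)
```

The order matters because an increase of 20 $m^2$ multiplies the median by the one-unit factor 20 times over, rather than adding 20 one-unit percentage changes.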
We have one numeric explanatory variable, so we fitted a simple linear regression model to the data. However, there was clear evidence of increasing scatter as living area increases, so we log-transformed the response variable, price.
After logging price, the residuals looked much better. Normality looks good and no influential points were detected. We have a random sample, so the independence assumption is satisfied. Model assumptions are satisfied.
Our model is: $\log(price_i) = \beta_0 + \beta_1 \times livingArea_i + \epsilon_i$, where $\epsilon_i \sim iid\ N(0,\sigma^2)$.
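To see why the slope is interpreted as a percentage change in the median price, a short derivation under this model may help. Here $x$ and $\Delta$ are notation introduced for this sketch: $x$ is a living area and $\Delta$ is an increase in living area, both in square metres.

$$
\begin{aligned}
\text{median}(price \mid livingArea = x) &= \exp(\beta_0 + \beta_1 x) \\
\frac{\text{median}(price \mid livingArea = x + \Delta)}{\text{median}(price \mid livingArea = x)} &= \exp(\beta_1 \Delta) \\
\text{percentage change in median} &= 100 \times \left(\exp(\beta_1 \Delta) - 1\right)
\end{aligned}
$$

With $\Delta = 20$, applying the last line to the confidence limits for $\beta_1$ corresponds to the calculation `100*(exp(confint(houses.fit2)[2,]*20)-1)`.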
Our model explained 53.5% of the variability in the logged data.
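The percentage of variability explained is the $R^2$ of the model fitted on the log scale. A minimal sketch on simulated data (the values below are hypothetical, not the Saratoga data) shows that it equals one minus the ratio of the residual sum of squares to the total sum of squares of the logged response:

```r
set.seed(2)

# Simulated logged response and explanatory variable (hypothetical values)
x <- runif(112, 50, 350)
log.y <- 11 + 0.004 * x + rnorm(112, sd = 0.3)

fit <- lm(log.y ~ x)

# R-squared by hand: 1 - RSS / TSS, both computed on the log scale
rss <- sum(resid(fit)^2)
tss <- sum((log.y - mean(log.y))^2)
r2.by.hand <- 1 - rss / tss

# Agrees with the value reported by summary()
all.equal(r2.by.hand, summary(fit)$r.squared)
```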
We investigated how the prices of houses in Saratoga, New York, are affected by house size.
We found strong evidence that the median house price increases as the size of the living area increases. Furthermore, this relationship is exponential, so the larger the living area, the bigger the increase in median price for each additional square metre.
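A small numeric sketch illustrates what an exponential relationship means here. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not the estimates from `houses.fit2`.

```r
# Hypothetical coefficients on the log scale (not the fitted values from houses.fit2)
b0 <- 11
b1 <- 0.004

# Median price curve implied by a log-linear model
median.price <- function(area) exp(b0 + b1 * area)

# Three starting living areas, in square metres
areas <- c(100, 200, 300)

# Dollar increase in median price for an extra 20 m^2 at each starting size
dollar.increase <- median.price(areas + 20) - median.price(areas)

# Ratio of medians for an extra 20 m^2 (identical at every starting size)
ratio <- median.price(areas + 20) / median.price(areas)

round(dollar.increase)
round(ratio, 4)
```

The ratio is the same at every starting size, so the percentage increase is constant, while the dollar increase grows with the living area.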
We estimate that the median house price increases by between `r resultStr1[1]` percent for every 20 $m^2$ increase in living area.