## Do not delete this! ## It loads the s20x library for you. If you delete it ## your document may not compile it. require(s20x) knitr::opts_chunk$set( dev = "png", fig.ext = "png", dpi = 96 )
The data in this question come from an experiment to see whether cloud-seeding works. Cloud-seeding is the process of firing silver nitrate (AgNO$_3$) into the clouds. Water coalesces around the silver nitrate particles, hopefully getting large enough to precipitate (rain). The measurements in this study are in acre-feet. One acre-foot is the amount of water it takes to fill one acre uniformly to the depth of one foot. One acre-foot is about 1.2 million litres of water (1233481.85532 litres to be exact). This data set is included in the s20x library.
The variables of interest are:
rain: total rainfall.seed: is a factor with two levels, seeded and unseeded.We wish to see whether cloud seeding produces more rain. Also, what is the typical rainfall from seeded and unseeded clouds?
data(rain.df) boxplot(rain ~ seed, data = rain.df, horizontal = TRUE)
Looks like our old right-skewed friend again. Maybe, just maybe, a log-transformation would work. Let's try it.
boxplot(log(rain) ~ seed, data = rain.df, horizontal = TRUE)
Yup. Looks much better, and we have preliminary evidence that the seeding seems to work. Should we be worried about equality of variance?
summaryStats(log(rain) ~ seed, data = rain.df)
No. On the log-scale, the standard deviations are nearly identical, and the midspreads are well below a factor of two different. Let's go ahead and fit the model.
rain.fit = lm(log(rain) ~ seed, data = rain.df) plot(rain.fit, which = 1) normcheck(rain.fit) cooks20x(rain.fit) summary(rain.fit) # Let's refit the model so that `unseeded` is the base level. rain.df$seed = relevel(rain.df$seed, ref = "unseeded") rain.fit.2 = lm(log(rain) ~ seed, data = rain.df) summary(rain.fit.2) exp(confint(rain.fit.2)) 100 * (exp(confint(rain.fit.2)) - 1)
pred.df = data.frame(seed = c("unseeded", "seeded")) exp(predict(rain.fit.2, pred.df, interval = "confidence"))
conf1=as.data.frame(exp(predict(rain.fit.2, pred.df, interval = "confidence"))) resultStr1 = paste0(sprintf("%.1f", conf1$lwr), " to ", sprintf("%.1f", conf1$upr)) conf2 = as.data.frame(100 * (exp(confint(rain.fit.2)) - 1)) resultStr2 = sprintf("%s%% to %s%%", format(round(conf2$`2.5 %`,0), big.mark = ",", trim = TRUE), format(round(conf2$`97.5 %`,-1), big.mark = ",", trim = TRUE) )
The boxplots of log(rain) showed that the groups were comparable. So, we fitted a linear model to explain log(rain) with the explanatory factor seed.
All model assumptions were satisified.
Our final model is $$log(Rain_i) = \beta_0 + \beta_1 \times Seed.Seeded_i + \epsilon_i,$$ where $\epsilon_i \sim iid ~ N(0,\sigma^2)$. Here $Seed.Seeded_i=1$ if the observation was done in a seeded environment, otherwise it is zero.
Our model describes 13% of the variability in the logged measurements of total rainfall.
We wish to see whether cloud seeding produces more rain.
There is strong evidence cloud-seeding works (P-value < 0.01).
We estimate that the median rain from the seeded clouds is about r resultStr2[2] higher than that from unseeded clouds.
The unseeded clouds produce median rainfall of r resultStr1[1] acre-feet, whereas for the seeded clouds it is r resultStr1[2] acre-feet of rain.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.