knitr::opts_chunk$set(echo = TRUE, warning=FALSE) suppressWarnings(suppressMessages(suppressPackageStartupMessages(library(ggplot2))))
This example is adapted from a YouTube video by OpenIntro and plotting help from dkmathstats.com.
Our example problem considers a population of brushtail possums.

We are told that this population of brushtail possums have a normal distribution of lengths with a population mean ($\mu$) of $92.6$ mm and a population standard deviation ($\sigma$) of $3.6$ mm.
To aid in plotting this distribution, we can define function in R to help us in plotting. First we will plot the distribution using length values. Note how we set the mean and standard deviation in the function.
possumDistn <- function(x){ dnorm(x, mean = 92.6, sd = 3.6) }
Let's start with a basic plot.
lims <- c(80, 105) xvalues <- data.frame(x=lims) plt <- ggplot(xvalues, aes(x=x)) + stat_function(fun = possumDistn) + xlim(lims) + xlab("possum length [mm]") + ylab("density") print(plt)
Because this is a normal distribution, it is convenient to convert value into Z-scores where we transform the data values ($X$) using the equation:
$$ Z = \frac{x - \mu}{\sigma}$$
where $\mu$ is our estimate of the mean value for the population and $\sigma$ is our estimate of the standard deviation for the population. This is the "standard form" for the normal distribution.
Our plot of the standard form looks like this:
zLim = 3 lims <- c(-zLim, zLim) xvalues <- data.frame(x=lims) zPlt <- ggplot(xvalues, aes(x)) + stat_function(fun = dnorm) + scale_x_continuous(breaks = seq(from = -zLim, to = zLim, by = 1), limits = c(-zLim, zLim)) + xlab("Z") + ylab("density") print(zPlt)
We can build upon this plot as we ask questions about this distribution.
Consider the following question:
What percentage of possums have a head length smaller than 98 mm?
First compute the Z value and then the required probability. Let's
also make the output look nice by using the pander package to
output a table:
library(pander) zVal1 <- (98 - 92.6)/3.6 prob1 <- pnorm(zVal1) ans <- c(zVal1, prob1) names(ans) <- c("Z", "Probability") pander(ans)
To draw this on our graph, we need to make a shading function and add
it to our plot. Note: Be careful here. The function only takes
x values. The limit, zVal1 is hard-coded. Make sure you don't redefine
this variable!
zVal1 <- (98 - 92.6)/3.6 shadeLower1 <- function(x){ val <- dnorm(x) val[x >= zVal1] <- NA return(val) } limV = 3 xvalues <- data.frame(x = c(-limV, limV)) plt1 <- ggplot(xvalues, aes(x)) + stat_function(fun = dnorm) + stat_function(fun = shadeLower1, geom = "area", fill = "blue", alpha = 0.3 ) + scale_x_continuous(breaks = seq(from = -limV, to = limV, by = 1), limits = c(-limV, limV)) + xlab("Z") + ylab("density") + ggtitle("Possums with length smaller than 98 mm") + theme(axis.text=element_text(size=12), axis.title=element_text(size=12), plot.title=element_text(hjust = 0.5)) # center the title print(plt1)
We can also ask another question:
What is the fraction of possums with a head length less than 89 mm?
Again we compute the Z-score.
zVal2 <- (89 - 92.6)/3.6 prob2 <- pnorm(zVal2) ans <- c(zVal2, prob2) names(ans) <- c("Z", "Probability") pander(ans)
To plot this we need to make a new shading function that will
use the new value defined above. Note that we have to be careful
to not re-define the variable zVal2 and break the function.
zVal2 <- (89 - 92.6)/3.6 shadeLower <- function(x){ val <- dnorm(x) val[x >= zVal2] <- NA return(val) } limV = 3 xvalues <- data.frame(x = c(-limV, limV)) plt1 <- ggplot(xvalues, aes(x)) + stat_function(fun = dnorm) + stat_function(fun = shadeLower, geom = "area", fill = "blue", alpha = 0.3 ) + scale_x_continuous(breaks = seq(from = -limV, to = limV, by = 1), limits = c(-limV, limV)) + xlab("Z") + ylab("density") + ggtitle("Possums with length smaller than 89 mm") + theme(axis.text=element_text(size=12), axis.title=element_text(size=12), plot.title=element_text(hjust = 0.5)) # center the title print(plt1)
Finally, we can ask about the fraction of possums with length between 89 and 98 mm. We can compute this by difference.
prob <- prob1-prob2 names(prob) <- "prob" pander(prob)
We can also plot this. Note that we use both zVal1 and zVal2 in this
function.
zVal1 <- (98 - 92.6)/3.6 zVal2 <- (89 - 92.6)/3.6 shadeBetween <- function(x){ val <- dnorm(x) val[x >= zVal1] <- NA val[x <= zVal2] <- NA return(val) } limV = 3 xvalues <- data.frame(x = c(-limV, limV)) plt3 <- ggplot(xvalues, aes(x)) + stat_function(fun = dnorm) + stat_function(fun = shadeBetween, geom = "area", fill = "blue", alpha = 0.3 ) + scale_x_continuous(breaks = seq(from = -limV, to = limV, by = 1), limits = c(-limV, limV)) + xlab("Z") + ylab("density") + ggtitle("Possums with length between 89 and 98 mm") + theme(axis.text=element_text(size=12), axis.title=element_text(size=12), plot.title=element_text(hjust = 0.5)) # center the title print(plt3)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.