knitr::opts_chunk$set(echo = TRUE) # Specify whether answers are shown #answer <- TRUE answer <- FALSE
In this practical we are going to use the \texttt{R} package \texttt{survey} to reproduce some of the results we have calculated manually in class and in tutorials, and then use it to estimate the number and proportion of people who voted for brexit, from a sample of 38 regions in the UK.
The \texttt{sampling} package should already be installed in RStudio Cloud, but if you are working on another machine, you need to install it from CRAN.
We start by loading the \texttt{survey} package and the caribou dataset of Table 2.1 of the notes. The dataset is in the package \texttt{mt4608}, written specifically for this module^[This package is not as robust as packages on CRAN, so please be a bit patient/tolerant - things may go a little wrong in situations that I did not anticipate when writing it.].
library(survey) # load survey package library(mt4608) # load this module package
The caribou survey data are contained in the package \texttt{mt4608}, and can be accessed as follows:
data(caribou) # get the dataset caribou # look at it help(caribou) # get a description of it
Calculate the total area and total number of strips in the survey region:
A = unique(caribou$area[caribou$stratum==1]) + unique(caribou$area[caribou$stratum==2]) N = unique(caribou$N[caribou$stratum==1]) + unique(caribou$N[caribou$stratum==2])
Estimate the mean number of animals per survey strip, together with a 95\% confidence interval. To do this with the \texttt{survey} package, you first need to associate a survey design with the survey data, using the command \texttt{svydesign}. Look at the help for \texttt{svydesign} to understand more about the arguments; here \texttt{ids=$\sim$1} tells it that there are no clusters (you have to specify \texttt{ids}), and \texttt{fpc} if the ``finite population correction'' factor that we call $f$ in the notes.
n = dim(caribou)[1] # sample size srs <- svydesign(ids=~1,data=caribou,fpc=rep(n/N,n))
Now using the survey data and the design, we can estimate the population mean and 95\% confidence interval very easily using the commands \texttt{svymean} and \texttt{confint}. To do this you have to tell \texttt{svymean} which column of the data frame is the $y$-variable (the response). In our case it is the \texttt{count}:
ybar <- svymean(~count,srs) ybar confint(ybar)
\begin{enumerate}
\item By doing appropriate calculations with the data, decide whether the function \texttt{confint} assumed that the sample mean has a normal distribution or a t-distribution.
\item Estimate the total number of caribou in the survey region, together with a 95\% confidence interval
\item The data frame \texttt{brexitsample} contains a sample of data from 38 out of 380 regions in the UK. You can load it after loading the \texttt{mt4608} package using this command \texttt{data("brexitsample")}. Consult the associated help file (\texttt{?brexitsample}) to see what the data frame contains, and then use these data to estimate the total number of people who voted to leave the EU, together with a 95\% confidence interval.
\item Estimate the sample size required to be able to estimate the total number of people who voted to leave the EU to within 1.5 million people, with 95\% confidence.
\item Estimate the sample size required to be able to estimate the proportion of the regions in the UK that had a majority of votes in favour of leaving the EU to within 5\%, with 95\% confidence.
\item Given that the total number of valid votes in the referendum (total of the variable \texttt{Valid_Votes} over all 380 regions) is 32,741,689, use the \texttt{sampling} package to estimate the total number of people who voted to leave the EU, together with a 95\% confidence interval, using the ratio estimator. (Hint: use function \texttt{svyratio()} and then function \texttt{predict()} on the output from this function.)
\item Calculate the sample size required to be able to estimate the total number of people who voted to leave the EU to within 1.5 million people, with 95\% confidence, using the regression estimator. Explain why this number is so different from that you got when using the sample mean as the estimator of this total.
\end{enumerate}
s2 = sum((count-ybar)^2)/(n-1) f = n/N var.ybar = (1-f)*s2/n var.ybar tcrit = abs(qt(0.025,n-1)) ci.t = ybar[1] + c(-1,1)*tcrit*sqrt(var.ybar) ci.t ci.z = ybar[1] + c(-1,1)*1.96*sqrt(var.ybar) ci.z
# Assuming normality: tot <- svytotal(~count,srs) tot round(confint(tot)) # Assuming t-distribution (better) s2 = sum((count-ybar)^2)/(n-1) f = n/N var.tot = N^2*(1-f)*s2/n var.tot tcrit = abs(qt(0.025,n-1)) ci.t = tot[1] + c(-1,1)*tcrit*sqrt(var.tot) round(ci.t)
library(mt4608) data("brexitsample") N=380 n = dim(brexitsample)[1] f = n/N srs <- svydesign(ids=~1,data=brexitsample,fpc=rep(f,n)) leavetot <- svytotal(~Leave,srs) leavetot/10^6 round(confint(leavetot))/10^6
S2 = n*attr(leavetot,"var")/(N^2*(1-f)) B0 = 1500000 newn = N/(1 + B0^2/(4*N*S2)) newn
# First make the indicator variable brexitsample$y = 1*(brexitsample$Leave > brexitsample$Remain) srs <- svydesign(ids=~1,data=brexitsample,fpc=rep(f,n)) leavep <- svymean(~y,srs) leavep confint(leavep) # Calculate S^2 and then sample size required S2 = n*attr(leavep,"var")/(N^2*(1-f)) B0 = 0.05 newn = N/(1 + B0^2/(4*N*S2)) newn
yxratio = svyratio(~Leave, ~Valid_Votes, srs) predict(yxratio,total=32741689)
yxratio = svyratio(~Leave, ~Valid_Votes, srs) predict(yxratio,total=32741689)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.