In david-borchers/sampling: Module MT4608 Sampling Theory Practical Package

knitr::opts_chunk$set(echo = TRUE)

# Specify whether answers are shown 
#answer <- TRUE
answer <- FALSE

In this practical we are going to use the \texttt{R} package \texttt{survey} to reproduce some of the results we have calculated manually in class and in tutorials, and then use it to estimate the number and proportion of people who voted for brexit, from a sample of 38 regions in the UK.

Getting started with the pacakge \texttt{sampling}

The \texttt{sampling} package should already be installed in RStudio Cloud, but if you are working on another machine, you need to install it from CRAN.

We start by loading the \texttt{survey} package and the caribou dataset of Table 2.1 of the notes. The dataset is in the package \texttt{mt4608}, written specifically for this module^[This package is not as robust as packages on CRAN, so please be a bit patient/tolerant - things may go a little wrong in situations that I did not anticipate when writing it.].

library(survey) # load survey package
library(mt4608) # load this module package

The caribou survey data are contained in the package \texttt{mt4608}, and can be accessed as follows:

data(caribou) # get the dataset
caribou # look at it
help(caribou) # get a description of it

Calculate the total area and total number of strips in the survey region:

A = unique(caribou$area[caribou$stratum==1]) + 
  unique(caribou$area[caribou$stratum==2])
N = unique(caribou$N[caribou$stratum==1]) + 
  unique(caribou$N[caribou$stratum==2])

Estimate the mean number of animals per survey strip, together with a 95\% confidence interval. To do this with the \texttt{survey} package, you first need to associate a survey design with the survey data, using the command \texttt{svydesign}. Look at the help for \texttt{svydesign} to understand more about the arguments; here \texttt{ids=$\sim$1} tells it that there are no clusters (you have to specify \texttt{ids}), and \texttt{fpc} if the ``finite population correction'' factor that we call $f$ in the notes.

n = dim(caribou)[1] # sample size
srs <- svydesign(ids=~1,data=caribou,fpc=rep(n/N,n))

Now using the survey data and the design, we can estimate the population mean and 95\% confidence interval very easily using the commands \texttt{svymean} and \texttt{confint}. To do this you have to tell \texttt{svymean} which column of the data frame is the $y$-variable (the response). In our case it is the \texttt{count}:

ybar <- svymean(~count,srs)
ybar
confint(ybar)

Questions

\begin{enumerate}

\item By doing appropriate calculations with the data, decide whether the function \texttt{confint} assumed that the sample mean has a normal distribution or a t-distribution.

\item Estimate the total number of caribou in the survey region, together with a 95\% confidence interval

\item The data frame \texttt{brexitsample} contains a sample of data from 38 out of 380 regions in the UK. You can load it after loading the \texttt{mt4608} package using this command \texttt{data("brexitsample")}. Consult the associated help file (\texttt{?brexitsample}) to see what the data frame contains, and then use these data to estimate the total number of people who voted to leave the EU, together with a 95\% confidence interval.

\item Estimate the sample size required to be able to estimate the total number of people who voted to leave the EU to within 1.5 million people, with 95\% confidence.

\item Estimate the sample size required to be able to estimate the proportion of the regions in the UK that had a majority of votes in favour of leaving the EU to within 5\%, with 95\% confidence.

\item Given that the total number of valid votes in the referendum (total of the variable \texttt{Valid_Votes} over all 380 regions) is 32,741,689, use the \texttt{sampling} package to estimate the total number of people who voted to leave the EU, together with a 95\% confidence interval, using the ratio estimator. (Hint: use function \texttt{svyratio()} and then function \texttt{predict()} on the output from this function.)

\item Calculate the sample size required to be able to estimate the total number of people who voted to leave the EU to within 1.5 million people, with 95\% confidence, using the regression estimator. Explain why this number is so different from that you got when using the sample mean as the estimator of this total.

\end{enumerate}

s2 = sum((count-ybar)^2)/(n-1)
f = n/N
var.ybar = (1-f)*s2/n
var.ybar
tcrit = abs(qt(0.025,n-1))
ci.t = ybar[1] + c(-1,1)*tcrit*sqrt(var.ybar)
ci.t
ci.z = ybar[1] + c(-1,1)*1.96*sqrt(var.ybar)
ci.z

# Assuming normality:
tot <- svytotal(~count,srs)
tot
round(confint(tot)) 

# Assuming t-distribution (better)
s2 = sum((count-ybar)^2)/(n-1)
f = n/N
var.tot = N^2*(1-f)*s2/n
var.tot
tcrit = abs(qt(0.025,n-1))
ci.t = tot[1] + c(-1,1)*tcrit*sqrt(var.tot) 
round(ci.t)

library(mt4608)
data("brexitsample")
N=380
n = dim(brexitsample)[1]
f = n/N
srs <- svydesign(ids=~1,data=brexitsample,fpc=rep(f,n))
leavetot <- svytotal(~Leave,srs)
leavetot/10^6
round(confint(leavetot))/10^6

S2 = n*attr(leavetot,"var")/(N^2*(1-f))
B0 = 1500000
newn = N/(1 + B0^2/(4*N*S2))
newn

# First make the indicator variable
brexitsample$y = 1*(brexitsample$Leave > brexitsample$Remain)
srs <- svydesign(ids=~1,data=brexitsample,fpc=rep(f,n))
leavep <- svymean(~y,srs)
leavep
confint(leavep)
# Calculate S^2 and then sample size required
S2 = n*attr(leavep,"var")/(N^2*(1-f))
B0 = 0.05
newn = N/(1 + B0^2/(4*N*S2))
newn

yxratio = svyratio(~Leave, ~Valid_Votes, srs)
predict(yxratio,total=32741689)

yxratio = svyratio(~Leave, ~Valid_Votes, srs)
predict(yxratio,total=32741689)

david-borchers/sampling documentation built on Sept. 17, 2022, 7:54 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

david-borchers/sampling
Module MT4608 Sampling Theory Practical Package

In david-borchers/sampling: Module MT4608 Sampling Theory Practical Package

Getting started with the pacakge \texttt{sampling}

Questions

R Package Documentation

Browse R Packages

We want your feedback!

david-borchers/sampling Module MT4608 Sampling Theory Practical Package

In david-borchers/sampling: Module MT4608 Sampling Theory Practical Package

Getting started with the pacakge \texttt{sampling}

Questions

R Package Documentation

Browse R Packages

We want your feedback!

david-borchers/sampling
Module MT4608 Sampling Theory Practical Package