knitr::opts_chunk$set( collapse = TRUE, comment = "#>", cache = TRUE )
To install the package use devtools (devtools is available on CRAN).
# devtools::install_github("tereom/quickcountmx")
library(quickcountmx)
library(tidyverse)
The package includes the results of the 2012 Guanajuato Governor election, which will be used to exemplify the functions.
data("gto_2012")
glimpse(gto_2012)
The variables are described in the package documentation (?gto_2012).
The functions select_sample_str and select_sample_prop allow for simple random sampling and stratified random sampling. For example, to select a simple random sample with 2% of the data:
gto_srs <- select_sample_prop(gto_2012, frac = 0.02)
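As a rough illustration of what a 2% simple random sample amounts to, here is a minimal base R sketch on a made-up data frame. The object toy and its columns are hypothetical stand-ins for gto_2012, and this is not the package's implementation.

```r
# Hypothetical stand-in for gto_2012: 1000 "polling stations"
set.seed(1)
toy <- data.frame(station_id = 1:1000, votes = rpois(1000, lambda = 300))

frac <- 0.02
n_sample <- round(frac * nrow(toy))                 # number of rows to draw
srs <- toy[sample(nrow(toy), size = n_sample), ]    # without replacement
nrow(srs)
```

Every row has the same chance of selection, and no row can be drawn twice.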
For a stratified sample, we can either specify the desired sample size in each stratum or the percentage of observations to select within each stratum (stratified random sampling with proportional allocation).
To specify the sample size within each stratum, we supply a data.frame with the sample sizes. For example, suppose we stratify by federal district and the data.frame allo_df specifies the allocation in each stratum; here we sample 3 polling stations per stratum:
allo_df <- data.frame(distrito_fed_17 = 1:20, n = rep(3, 20))
allo_df
We use the function select_sample_str:
gto_equal <- select_sample_str(gto_2012, allo_df, n, distrito_fed_17)
table(gto_equal$distrito_fed_17)
With proportional allocation, we can select a stratified sample with 2% of the polling stations:
gto_str <- select_sample_prop(gto_2012, distrito_fed_17, 0.02, seed = 281982)
table(gto_str$distrito_fed_17)
Note the seed parameter, which lets us replicate a sample.
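To make the difference between the two allocations concrete, the following base R sketch draws both kinds of stratified sample from a made-up population. The data frame toy and the helper sample_by_stratum are hypothetical and only illustrate the idea; they are not how select_sample_str or select_sample_prop are implemented.

```r
set.seed(281982)  # fixing the seed makes the draw reproducible

# Hypothetical population: 20 strata of unequal size (2500 rows in total)
sizes <- rep(c(50, 100, 150, 200), times = 5)
toy <- data.frame(
  station_id = seq_len(sum(sizes)),
  stratum = rep(1:20, times = sizes)
)

# Hypothetical helper: draw n_by_stratum[h] rows without replacement in stratum h
sample_by_stratum <- function(data, n_by_stratum) {
  pieces <- lapply(names(n_by_stratum), function(h) {
    rows <- which(as.character(data$stratum) == h)
    data[rows[sample.int(length(rows), n_by_stratum[[h]])], ]
  })
  do.call(rbind, pieces)
}

# Equal allocation: 3 rows per stratum, regardless of stratum size
n_equal <- setNames(rep(3, 20), 1:20)
equal_sample <- sample_by_stratum(toy, n_equal)

# Proportional allocation: 2% of each stratum, rounded to whole rows
stratum_sizes <- lengths(split(toy$station_id, toy$stratum))
n_prop <- round(0.02 * stratum_sizes)
prop_sample <- sample_by_stratum(toy, n_prop)

table(equal_sample$stratum)
table(prop_sample$stratum)
```

With equal allocation every stratum contributes the same number of rows, while proportional allocation gives larger strata more rows.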
We begin with ratio estimation. To compute estimates with the ratio estimator we need to know the size of each stratum (in this case the strata are the local districts):
# count number of polling stations per stratum
gto_stratum_sizes <- gto_2012 %>%
  dplyr::group_by(distrito_loc_17) %>%
  dplyr::summarise(n_stratum = n())
Now let's suppose we have a stratified random sample with 6% of the data (sample size proportional to stratum size):
# sample polling stations from the full data, not from the table of stratum sizes
gto_sample <- select_sample_prop(gto_2012, stratum = distrito_loc_17, 0.06, seed = 19291)
We then call the function ratio_estimation():
ratio_estimation(gto_sample, stratum = distrito_loc_17, n_stratum = n_stratum, ... = pri_pvem:otros)
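For intuition, one common form of the stratified ratio estimator weights each sampled polling station by N_h / n_h (stratum size over stratum sample size) and divides each candidate's weighted total by the weighted total over all candidates. The base R sketch below computes point estimates this way on made-up data. All object and column names are hypothetical, and the package's ratio_estimation() may differ in its details (for example, in how it reports uncertainty).

```r
set.seed(19291)

# Hypothetical population: three strata with two candidates
N_h <- c(a = 200, b = 300, c = 500)
pop <- data.frame(
  stratum = rep(names(N_h), times = N_h),
  cand_1 = rpois(sum(N_h), 120),
  cand_2 = rpois(sum(N_h), 80)
)

# Proportional stratified sample: 6% of the rows in each stratum
idx <- unlist(lapply(
  split(seq_len(nrow(pop)), pop$stratum),
  function(rows) rows[sample.int(length(rows), round(0.06 * length(rows)))]
))
samp <- pop[idx, ]

# Expansion weight N_h / n_h for every sampled row
stratum_chr <- as.character(samp$stratum)
n_h <- lengths(split(samp$cand_1, samp$stratum))
w <- N_h[stratum_chr] / n_h[stratum_chr]

# Weighted totals per candidate, then each candidate's share of the total
totals <- colSums(w * as.matrix(samp[, c("cand_1", "cand_2")]))
share <- totals / sum(totals)
share
```

The estimated shares sum to one by construction, since the denominator is the estimated total over all candidates.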
For Mr. P (multilevel regression and poststratification), we fit a model per candidate and then compute the proportions using the simulated counts for each polling station-candidate pair. The following function selects a sample of 6% of the data, fits the model using the sample, and predicts for the population. Alternatively, one can select a sample before calling the function and set the parameter frac_sample=1; in that case, however, the data must include all polling stations, with NA wherever a polling station was not in the sample.
mrp_gto <- mrp_estimation(gto_2012, pri_pvem:otros, frac = 0.06, stratum = distrito_loc_17, n_iter = 2000, n_burnin = 1500, n_chains = 2, seed = 19291, parallel = TRUE, mc_cores = 8)
Note that parallelization is not available on Windows.
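For the alternative workflow mentioned earlier (sampling beforehand and passing the full data), the data layout can be sketched in base R as follows: keep one row per polling station and set the vote columns to NA for stations outside the sample. The data frame toy and its column names are hypothetical, and the exact input that mrp_estimation expects is assumed from the description in the text rather than checked against the package.

```r
set.seed(19291)

# Hypothetical population: 200 polling stations in 4 strata
toy <- data.frame(
  station_id = 1:200,
  stratum = rep(1:4, each = 50),
  cand_1 = rpois(200, 120),
  cand_2 = rpois(200, 80)
)

# Pre-selected sample: 3 stations per stratum
in_sample <- unlist(lapply(
  split(toy$station_id, toy$stratum),
  function(ids) ids[sample.int(length(ids), 3)]
))

# Keep every station, but hide the counts of unsampled stations
vote_cols <- c("cand_1", "cand_2")
toy_masked <- toy
toy_masked[!(toy_masked$station_id %in% in_sample), vote_cols] <- NA

nrow(toy_masked)                   # all stations are still present
sum(!is.na(toy_masked$cand_1))     # only sampled stations keep their counts
```

Covariates such as the stratum stay observed for every row, so a model fitted on the sampled rows can still predict for the unsampled ones.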
mrp_estimation returns a list with the fitted models, so we can examine each model and evaluate convergence:
mrp_gto$jags_fits$pri_pvem$model
plot(mrp_gto$jags_fits$pan_na)
It also includes a summary table with posterior estimates:
mrp_gto$post_summary