knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(surveyweights)

# example data:
mydata <- data.frame(
  somevar1 = 1:5,
  somevar2 = c("one", "more", "test", "variable", NA),
  strata_names = c("a", "a", "b", "b", "a"),
  cluster_names = c("x", "y", "z", "z", "y")
)

# example sampling frames:
mysf <- data.frame(strata = c("a", "b"), pops = c(100, 1000))
clustersf <- data.frame(strata = c("x", "y", "z"), pops = c(50, 100, 200))
Example data:
mydata
Example sampling frame:
mysf
Note that the values in mydata$strata_names all appear exactly in mysf$strata.
First we generate a weighting function that matches our dataset and sampling frame. Then we use that function to calculate weights for the dataset and/or subsets of it.
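To generate a weighting function, we need to provide at least the sampling frame, the name of the data column that holds the stratum names, and the names of the sampling frame columns that hold the stratum names and the population sizes.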
So we can do:
myweighting<-weighting_fun_from_samplingframe(sampling.frame = mysf, data.stratum.column = "strata_names", sampling.frame.population.column = "pops", sampling.frame.stratum.column = "strata")
myweighting is not the weights themselves, but a function that generates weights.
Now to get the actual weights:
myweighting(mydata)
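To make the idea of a weight-generating function concrete, here is a minimal base-R sketch that does not use surveyweights. It assumes that a record's weight is its stratum's population share divided by its stratum's sample share; `make_weight_fun` and the `demo_*` objects are hypothetical names, and the package's actual formula and normalisation may differ.

```r
# Hypothetical helper (not part of surveyweights): returns a function that
# computes one weight per record from a sampling frame.
make_weight_fun <- function(sf, data_col, sf_strata_col, sf_pop_col) {
  # named vector of population sizes, one entry per stratum
  pops <- setNames(sf[[sf_pop_col]], as.character(sf[[sf_strata_col]]))
  function(df) {
    strata <- as.character(df[[data_col]])
    # every stratum in the data must exist in the sampling frame
    stopifnot(all(strata %in% names(pops)))
    n <- table(strata)        # number of records per stratum in df
    used <- pops[names(n)]    # populations of the strata present in df
    # assumed formula: population share / sample share
    w <- (used / sum(used)) / (as.numeric(n) / sum(n))
    unname(w[strata])         # one weight per record, in record order
  }
}

demo_data <- data.frame(strata_names = c("a", "a", "b", "b", "a"))
demo_sf <- data.frame(strata = c("a", "b"), pops = c(100, 1000))

demo_weighting <- make_weight_fun(demo_sf, "strata_names", "strata", "pops")
demo_weights <- demo_weighting(demo_data)
demo_weights
```

Because the sampling frame is captured when the function is created, the returned function only needs a data frame and recomputes the sample counts from whatever records it is given.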
This is useful because we can calculate weights for subsets on the fly:
# remove NA records:
mydata_subset <- mydata[!is.na(mydata$somevar2), ]
# there are now only 2 records in stratum "a":
mydata_subset
# so the weights are different:
myweighting(mydata_subset)
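The reason the weights change for a subset can be checked by hand. The sketch below assumes the same "population share divided by sample share" formula, which may not match the package's exact normalisation; `stratum_weights` is a hypothetical helper used only for this comparison.

```r
# Population sizes taken from the example sampling frame
pops <- c(a = 100, b = 1000)

# Hypothetical helper: one weight per stratum, assuming
# weight = population share / sample share
stratum_weights <- function(strata) {
  n <- table(strata)
  used <- pops[names(n)]
  (used / sum(used)) / (as.numeric(n) / sum(n))
}

full_strata   <- c("a", "a", "b", "b", "a")  # all five records
subset_strata <- c("a", "a", "b", "b")       # fifth record (NA) dropped

w_full <- stratum_weights(full_strata)
w_subset <- stratum_weights(subset_strata)

# compare the per-stratum weights side by side
rbind(full = w_full, subset = w_subset)
```

Dropping one record from stratum "a" changes the sample shares of both strata, so under this formula the weights of both strata shift, not only those of "a".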
Assume that, in addition to the above, we have a second sampling frame with population sizes, for example for clusters inside the strata:
clustersf
As above, we generate a weighting function for the clusters:
myweighting_cluster<-weighting_fun_from_samplingframe(sampling.frame = clustersf, data.stratum.column = "cluster_names", sampling.frame.population.column = "pops", sampling.frame.stratum.column = "strata")
The weights we get with myweighting_cluster alone are incorrect, because cluster populations are not proportional across strata. We can combine the two weighting functions like this:
myweighting_combined <- combine_weighting_functions(myweighting, myweighting_cluster)
myweighting_combined(mydata)
# in comparison: the cluster weights only:
myweighting_cluster(mydata)
# This of course still works for subsets:
myweighting_cluster(mydata_subset)
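As an illustration of what combining two levels of weights can mean, the base-R sketch below rescales cluster-level weights within each stratum so that every stratum keeps its stratum-level total weight. This is one plausible approach and is not confirmed to be what combine_weighting_functions does internally; `combine_within_strata` is a hypothetical helper and the cluster weights are made-up illustrative values.

```r
# Hypothetical combiner (NOT the surveyweights implementation):
# keep the relative cluster weights inside each stratum, but rescale them
# so each stratum's total equals its stratum-level total.
combine_within_strata <- function(w_strata, w_cluster, strata) {
  stratum_total <- ave(w_strata, strata, FUN = sum)   # target total per stratum
  cluster_total <- ave(w_cluster, strata, FUN = sum)  # current total per stratum
  w_cluster * stratum_total / cluster_total
}

strata    <- c("a", "a", "b", "b", "a")
w_strata  <- c(5 / 33, 5 / 33, 25 / 11, 25 / 11, 5 / 33)  # stratum-level weights
w_cluster <- c(1, 2, 1.5, 1.5, 2)                         # illustrative values only

w_combined <- combine_within_strata(w_strata, w_cluster, strata)

# stratum totals are preserved, while records within "a" now differ
tapply(w_combined, strata, sum)
w_combined
```

With this rescaling the between-strata balance comes from the stratum weights and the within-stratum balance comes from the cluster weights.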