SurveyMap | R Documentation |
An R6 SurveyMap object holds the mapping between a set of items in a survey and a population dataset.
For each item, the label is the item's label in each dataset and the values are a list of all possible values. The values for the survey and population must be aligned: the lists must have the same number of elements, and the values at index i in each list must be equivalent. If there is a meaningful ordering over the values, they should be listed in that order, either ascending or descending.
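The alignment requirement can be pictured with two plain lists. This is only a base R sketch of the idea, not the package's internal representation; the labels and values below are made up for illustration.

```r
# Hypothetical item labels and values, for illustration only
sample_item <- list(
  label  = "age",
  values = c("18-35", "36-55", "56+")
)
population_item <- list(
  label  = "age_group",
  values = c("young", "middle", "older")
)

# Aligned means: same length, and position i means the same thing in both
stopifnot(length(sample_item$values) == length(population_item$values))

# Side-by-side view of which sample value corresponds to which population value
alignment <- data.frame(
  sample     = sample_item$values,
  population = population_item$values
)
print(alignment)
```

Both lists are written in ascending age order, which is the kind of consistent ordering the description asks for.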
new()
Create a new SurveyMap object
SurveyMap$new(sample, population, ...)
sample
The SurveyData object corresponding to the sample data.
population
The SurveyData object corresponding to the population data.
...
QuestionMap objects.
A SurveyMap object.
print()
Print a summary of the mapping.
SurveyMap$print(...)
...
Currently ignored.
The SurveyMap object, invisibly.
add()
Add new QuestionMaps.
SurveyMap$add(...)
...
The QuestionMaps to add.
The SurveyMap object, invisibly.
delete()
Delete QuestionMaps.
SurveyMap$delete(...)
...
The QuestionMaps to delete.
The SurveyMap object, invisibly.
replace()
Replace one QuestionMap with another.
SurveyMap$replace(old_question, new_question)
old_question
The QuestionMap object to replace.
new_question
The QuestionMap object to use instead.
The SurveyMap object, invisibly.
validate()
Validate the mapping.
SurveyMap$validate()
The SurveyMap object, invisibly.
mapping()
The mapping method uses the given maps between questions to create new sample and population data frames that have unified variable names (e.g., if the underlying construct is called age, both sample and population will have an age column, even if the raw data used different variable names).
This method also unifies the levels of each variable in the sample and population so that the maximum set of consistent levels is created. Names of these new levels are made according to the sample level names. If multiple levels are combined, the new name will be the existing levels separated by a + (e.g., if age groups "18-25" and "26-30" are combined, the new name will be "18-25 + 26-30").
Use the mapped_sample_data and mapped_population_data methods to access the resulting data frames.
SurveyMap$mapping()
The SurveyMap object, invisibly.
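The naming rule for combined levels can be sketched in base R. This is not the package's implementation: it assumes the simple case where several sample levels collapse into one population level, and the `values_map` vector is a hypothetical stand-in shaped like the age map in the Examples.

```r
# Sample level -> population level (hypothetical, many-to-one only)
values_map <- c(
  "18-25" = "18-35", "26-35" = "18-35",
  "36-45" = "36-55", "46-55" = "36-55",
  "56-65" = "56-65"
)

sample_levels <- names(values_map)
# Group sample levels by their population level, in order of first appearance
group <- factor(values_map, levels = unique(values_map))

# Name each merged level after the sample levels it absorbs, joined by " + "
new_names <- vapply(
  split(sample_levels, group),
  paste, character(1),
  collapse = " + "
)

# Unified level assigned to each original sample level
unified <- setNames(new_names[unname(values_map)], sample_levels)
print(unified)
```

Levels that are not merged with anything, such as "56-65" here, keep their sample name unchanged.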
tabulate()
Prepare the poststratification table. The resulting data frame is available via the poststrat_data method. See Examples.
SurveyMap$tabulate(...)
...
The names of the variables to include as strings.
The SurveyMap object, invisibly.
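As a rough picture of what a poststratification table contains, the base R sketch below builds one row per cell from a toy population data frame. It is an assumption-laden illustration, not the method's actual code: the data, the choice to sum weights within each cell, and the `cell_total` column name are all hypothetical.

```r
# Hypothetical mapped population data with a weight column
popn <- data.frame(
  age    = c("18-35", "18-35", "36-55", "36-55", "36-55"),
  gender = c("f", "m", "f", "f", "m"),
  wt     = c(1.0, 2.0, 1.5, 0.5, 1.0)
)

# One row per observed combination of the chosen variables,
# with the weights in that cell summed
poststrat <- aggregate(wt ~ age + gender, data = popn, FUN = sum)
names(poststrat)[names(poststrat) == "wt"] <- "cell_total"
print(poststrat)
```

Passing fewer variable names to the formula (for example only `age`) gives a coarser table, which mirrors choosing a subset of variables when tabulating.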
fit()
Fit a model. rstanarm, brms, and lme4 are supported natively. Custom modeling functions can also be used if they meet certain requirements.
SurveyMap$fit(fun, formula, ...)
fun
The model fitting function to use, for example fun = rstanarm::stan_glmer, fun = brms::brm, or fun = lme4::glmer. A custom fun must have a formula argument and a data argument that accepts a data frame (like standard R modeling functions). Other arguments can be passed via the ... argument. The formula argument will be taken from the formula argument below, and the data argument will be automatically set to the mapped data created by the mapping method (you can access this data via the mapped_sample_data method).
formula
The model formula. Can be either a string or a formula object.
...
Arguments other than formula and data to pass to fun.
A SurveyFit object.
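The formula and data requirement for a custom fun can be illustrated with a small wrapper. This is a sketch under stated assumptions: `my_logit` and the toy data are hypothetical, only the formula/data contract described above is covered, and any further requirements the package places on custom functions are not addressed here.

```r
# Hypothetical custom fitting function: takes `formula` and `data`,
# and forwards everything else through `...`
my_logit <- function(formula, data, ...) {
  # Accept the formula as either a string or a formula object
  if (is.character(formula)) {
    formula <- stats::as.formula(formula)
  }
  stats::glm(formula, data = data, family = stats::binomial(), ...)
}

# Toy data standing in for the mapped sample data
set.seed(1)
toy <- data.frame(
  y   = rbinom(40, 1, 0.5),
  age = factor(sample(c("18-35", "36-55"), 40, replace = TRUE))
)

# The same model specified as a string and as a formula object
fit_from_string  <- my_logit("y ~ age", data = toy)
fit_from_formula <- my_logit(y ~ age, data = toy)
all.equal(coef(fit_from_string), coef(fit_from_formula))
```

Keeping the wrapper's signature to `formula`, `data`, and `...` means any extra arguments supplied at the call site reach the underlying modeling function untouched.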
item_map()
Access the item_map.
SurveyMap$item_map()
A named list of QuestionMaps.
sample()
Access the SurveyData object containing the sample data.
SurveyMap$sample()
A SurveyData object.
population()
Access the SurveyData object containing the population data.
SurveyMap$population()
A SurveyData object.
poststrat_data()
Access the poststratification data frame created by the tabulate method.
SurveyMap$poststrat_data()
A data frame.
mapped_sample_data()
Access the data frame containing the mapped sample data created by the mapping method.
SurveyMap$mapped_sample_data(key = TRUE)
key
Should the .key column be included? This column just indicates the original order of the rows and is primarily intended for internal use.
A data frame.
mapped_population_data()
Access the data frame containing the mapped population data created by the mapping method.
SurveyMap$mapped_population_data(key = TRUE)
key
Should the .key column be included? This column just indicates the original order of the rows and is primarily intended for internal use.
A data frame.
clone()
The objects of this class are cloneable with this method.
SurveyMap$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Some fake survey data for demonstration
head(shape_survey)
# Create SurveyData object for the sample
box_prefs <- SurveyData$new(
  data = shape_survey,
  questions = list(
    age = "Please identify your age group",
    gender = "Please select your gender",
    vote_for = "Which party did you vote for in the 2018 election?",
    y = "If today is the election day, would you vote for the Box Party?"
  ),
  responses = list(
    age = levels(shape_survey$age),
    gender = levels(shape_survey$gender),
    # Here we use a data frame for the responses because the levels
    # in the data are abridged versions of the actual responses.
    # This can be useful when surveys have brief/non-descriptive responses.
    vote_for = data.frame(
      data = levels(shape_survey$vote_for),
      asked = c("Box Party Faction A", "Box Party Faction B",
                "Circle Party Coalition", "Circle Party")
    ),
    y = c("no", "yes")
  ),
  weights = "wt",
  design = list(ids = ~1)
)
box_prefs$print()
box_prefs$n_questions()
# Some fake population data for demonstration
head(approx_voters_popn)
# Create SurveyData object for the population
popn_obj <- SurveyData$new(
  data = approx_voters_popn,
  questions = list(
    age_group = "Which age group are you?",
    gender = "Gender?",
    vote_pref = "Which party do you prefer to vote for?"
  ),
  # order doesn't matter (gender before age here) because
  # the list has the names of the variables
  responses = list(
    gender = levels(approx_voters_popn$gender),
    age_group = levels(approx_voters_popn$age_group),
    vote_pref = levels(approx_voters_popn$vote_pref)
  ),
  weights = "wt"
)
popn_obj$print()
# Create the QuestionMap objects mapping each question between the
# survey and population dataset
q_age <- QuestionMap$new(
  name = "age",
  col_names = c("age", "age_group"),
  values_map = list(
    "18-25" = "18-35", "26-35" = "18-35", "36-45" = "36-55",
    "46-55" = "36-55", "56-65" = "56-65", "66-75" = "66+", "76-90" = "66+"
  )
)
print(q_age)
q_party_pref <- QuestionMap$new(
  name = "party_pref",
  col_names = c("vote_for", "vote_pref"),
  values_map = list("Box Party" = "BP", "BP" = "BP", "Circle Party" = "CP", "CP" = "CP")
)
q_gender <- QuestionMap$new(
  name = "gender",
  col_names = c("gender", "gender"),
  values_map = list("male" = "m", "female" = "f", "nonbinary" = "nb")
)
# Create SurveyMap object adding all questions at once
ex_map <- SurveyMap$new(
  sample = box_prefs,
  population = popn_obj,
  q_age,
  q_party_pref,
  q_gender
)
print(ex_map) # or ex_map$print()
# Or can add questions incrementally
ex_map <- SurveyMap$new(sample = box_prefs, population = popn_obj)
print(ex_map)
ex_map$add(q_age, q_party_pref)
print(ex_map)
ex_map$add(q_gender)
print(ex_map)
# Create the mapping between sample and population
ex_map$mapping()
# Create the poststratification data frame using all variables in the mapping
# (alternatively, can specify particular variables, e.g. tabulate("age"))
ex_map$tabulate()
# Take a peek at the poststrat data frame
head(ex_map$poststrat_data())
## Not run:
# Fit regression model using rstanarm (returns a SurveyFit object)
fit_1 <- ex_map$fit(
  fun = rstanarm::stan_glmer,
  formula = y ~ (1|age) + (1|gender),
  family = "binomial",
  seed = 1111,
  chains = 1, # just to keep the example fast and small
  refresh = 0 # suppress printed sampling iteration updates
)
# To use lme4 or brms instead of rstanarm you would use:
# Example lme4 usage
# fit_2 <- ex_map$fit(
#   fun = lme4::glmer,
#   formula = y ~ (1|age) + (1|gender),
#   family = "binomial"
# )
# Example brms usage
# fit_3 <- ex_map$fit(
#   fun = brms::brm,
#   formula = y ~ (1|age) + (1|gender),
#   family = "bernoulli",
#   seed = 1111
# )
# Predicted probabilities
# returns matrix with rows for poststrat cells, cols for posterior draws
poststrat_estimates <- fit_1$population_predict()
# Compute and summarize estimates by age level and party preference
estimates_by_age <- fit_1$aggregate(poststrat_estimates, by = "age")
estimates_by_party <- fit_1$aggregate(poststrat_estimates, by = "party_pref")
fit_1$summary(estimates_by_age)
fit_1$summary(estimates_by_party)
# Plot estimates
fit_1$plot(estimates_by_party)
fit_1$plot(estimates_by_age)
fit_1$plot(estimates_by_age, additional_stats = "none")
fit_1$plot(estimates_by_age, additional_stats = "wtd")
fit_1$plot(estimates_by_age, additional_stats = "raw")
fit_1$plot(estimates_by_age, additional_stats = c("wtd","raw","mrp"))
# Compute and summarize the population estimate
estimates_popn <- fit_1$aggregate(poststrat_estimates)
fit_1$summary(estimates_popn)
# Plot population estimate
fit_1$plot(estimates_popn)
fit_1$plot(estimates_popn, additional_stats = "none")
fit_1$plot(estimates_popn, additional_stats = "wtd")
fit_1$plot(estimates_popn, additional_stats = "raw")
fit_1$plot(estimates_popn, additional_stats = c("wtd","raw","mrp"))
## End(Not run)