matching: Predicts unknown responses by matching


View source: R/NonProbEst.R

Description

Implements the matching method introduced by Rivers (2007). The relationship between y_k and x_k is modelled on the convenience sample, and the fitted model is then used to predict y_k for each unit of the reference sample. The population total can then be estimated from these predictions with the 'total_estimation' function.
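The idea of fitting on one sample and predicting on the other can be illustrated with base R alone. The following is only a minimal sketch, not the package's implementation: the data are simulated, the names convenience, reference, x and y are hypothetical, and a plain linear model stands in for whatever algorithm is chosen.

```r
# Sketch of the matching idea with base R only (not NonProbEst's own code).
set.seed(1)

# Hypothetical convenience sample: covariate x and observed response y.
convenience <- data.frame(x = runif(200))
convenience$y <- 2 + 3 * convenience$x + rnorm(200, sd = 0.1)

# Hypothetical reference sample: same covariate, response not observed.
reference <- data.frame(x = runif(50))

# Model y given x on the convenience sample ...
fit <- lm(y ~ x, data = convenience)

# ... and predict y for every unit of the reference sample.
predicted_y <- predict(fit, newdata = reference)

length(predicted_y)  # one prediction per reference unit
```

The predictions play the role of the vector that matching returns: one estimated response per row of the reference sample.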

Usage

matching(
  convenience_sample,
  reference_sample,
  covariates,
  estimated_var,
  positive_label = NULL,
  algorithm = "glm",
  proc = NULL,
  ...
)

Arguments

convenience_sample

Data frame containing the non-probabilistic sample.

reference_sample

Data frame containing the probabilistic sample.

covariates

String vector specifying the common variables to use for training.

estimated_var

String specifying the variable to estimate.

positive_label

String specifying the label to be considered positive if the estimated variable is categorical. Leave it as the default NULL otherwise.

algorithm

A string specifying which classification or regression model to use (the same values accepted by the method argument of caret's train function).

proc

A string or vector of strings specifying which of the data preprocessing techniques available in the train function of the 'caret' package, if any, should be applied to the data before the model is trained. The default is NULL, in which case no preprocessing is applied.

...

Further parameters to be passed to the train function.
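To give a feel for what a preprocessing option passed through proc might do, here is a rough base-R sketch of centering and scaling, two of the techniques caret offers. It is an illustration under assumptions, not a description of how the package applies proc internally: the data are simulated and the names convenience, reference, age and age_std are hypothetical.

```r
# Base-R sketch of centering and scaling a covariate (illustrative only).
set.seed(3)

# Hypothetical samples sharing one numeric covariate.
convenience <- data.frame(age = rnorm(100, mean = 40, sd = 12))
reference <- data.frame(age = rnorm(40, mean = 50, sd = 15))

# Learn the centering and scaling constants from the training data only.
centre <- mean(convenience$age)
spread <- sd(convenience$age)

# Apply the same constants to both samples so they stay comparable.
convenience$age_std <- (convenience$age - centre) / spread
reference$age_std <- (reference$age - centre) / spread

summary(convenience$age_std)
```

The constants are estimated on the convenience sample alone because that is the sample the model is trained on; the reference sample is transformed with the same values.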

Details

Training of the models is done via the 'caret' package. The string given in algorithm must match one of the model names supported by 'caret'. If the estimated variable is categorical, predicted probabilities of the label given in positive_label are returned instead of class labels.
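For a categorical response, the behaviour described above can be sketched with a base-R logistic regression. This is a hedged illustration rather than the package's code: the data are simulated, the names convenience, reference, vote and is_positive are hypothetical, and glm is used directly instead of going through 'caret'.

```r
# Sketch: predicted probabilities of a positive label (base R, illustrative).
set.seed(2)

# Hypothetical convenience sample with a two-level response.
convenience <- data.frame(x = rnorm(300))
p <- plogis(-0.5 + 1.5 * convenience$x)
convenience$vote <- factor(ifelse(runif(300) < p, "T", "F"),
                           levels = c("F", "T"))

# Hypothetical reference sample: covariate only.
reference <- data.frame(x = rnorm(100))

# Encode the chosen positive label as 1 and everything else as 0.
positive_label <- "T"
convenience$is_positive <- as.integer(convenience$vote == positive_label)

# Fit on the convenience sample, then predict probabilities for the reference sample.
fit <- glm(is_positive ~ x, data = convenience, family = binomial)
prob_positive <- predict(fit, newdata = reference, type = "response")

range(prob_positive)  # values lie between 0 and 1
```

Each element of prob_positive is a probability, not a label, which is why such a vector can later be summed with weights to estimate a total.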

Value

A vector containing the estimated responses for the reference sample.

References

Rivers, D. (2007). Sampling for Web Surveys. Presented at the Joint Statistical Meetings, Salt Lake City, UT.

Examples

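# Simple example with default parameters
library(NonProbEst)
N = 50000
covariates = c("education_primaria", "education_secundaria")
# Recode a numeric 0/1 response as a factor so it is treated as categorical
if (is.numeric(sampleNP$vote_gen))
   sampleNP$vote_gen = factor(sampleNP$vote_gen, c(0, 1), c('F', 'T'))
# Predict the response for the reference sample, with 'T' as the positive label
estimated_votes = data.frame(
   vote_gen = matching(sampleNP, sampleP, covariates, "vote_gen", 'T')
)
# Estimate the total from the predicted responses
total_estimation(estimated_votes, N / nrow(estimated_votes), c("vote_gen"), N)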

Example output

Loading required package: lattice
Loading required package: ggplot2
vote_gen 
 2418869 

NonProbEst documentation built on July 1, 2020, 6:08 p.m.