The function ellsae implements the ELL method for small area estimation
by Elbers, C., Lanjouw, J. O. and Lanjouw, P. (2003), which is
used to impute a missing variable from a smaller survey data set into a
census. The imputation is based on a linear model and bootstrap samples.
model |
a model that describes the relationship between the response and
the explanatory variables. Input must be a linear model that can be
processed by |
survey |
data.table with the response variable of interest included. Will be used to estimate the linear model. Input will be coerced to a data.table |
census |
data.table where the variable of interest is missing and shall be imputed |
location_survey |
string with the name of the variable in the survey data set that contains information about the cluster (= location) of an observation |
n_boot |
integer indicating the size of the bootstrap sample |
seed |
integer, seed can be set to obtain reproducible results |
welfare.function |
function that transforms the bootstrapped variable of interest to obtain some welfare estimate |
transfy |
function to transform the response y in the model |
transfy_inv |
inverse function of transfy |
output |
character string or character vector. Either "default", "all", or a vector with one or more of the following elements: c("summary", "yboot", "model_fit", "bootsample", "survey", "census") |
cores |
either a string, "auto", or an integer value indicating the number of cores to be used for the estimation. |
quantiles |
vector of requested quantiles for the |
clustermeans |
character vector with names of variables present in both data sets. The mean of those variables in the census will be computed by location and added to the survey data set before estimation of the linear model. This may enhance precision of the estimates |
location_census |
string with the name of the variable in the census data
set that contains information about the cluster (= location) of an
observation. Only needed if |
save_boot |
logical value. TRUE saves the bootstrap sample as BootstrapSampleELLsae-DATE.csv in the current working directory. |
weights=NULL |
weights that can be used for fitting the model |
The function takes the survey data set and uses the argument model to
estimate a linear model of the type lm(). If the argument clustermeans is
specified, means from the census data for the given variables are
calculated and merged with the survey data by cluster location. These new
explanatory variables are also used for the estimation of the linear
model. Rows with NAs are omitted from the computation.
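The cluster-mean step described above can be sketched in base R as follows. This is only an illustration under assumptions: the toy data, the column names (location, age, income) and the suffix age_mean are hypothetical, and the package may perform the merge differently internally.

```r
# Toy data; all names here are hypothetical and only for illustration.
census <- data.frame(
  location = c("a", "a", "b", "b", "b"),
  age      = c(30, 50, 20, 30, 40)
)
survey <- data.frame(
  location = c("a", "b", "b"),
  age      = c(35, 25, 45),
  income   = c(100, 80, 120)
)

# Mean of age by location, computed from the census.
census_means <- aggregate(age ~ location, data = census, FUN = mean)
names(census_means)[names(census_means) == "age"] <- "age_mean"

# Attach the census means to the survey rows of the same location.
survey_aug <- merge(survey, census_means, by = "location", all.x = TRUE)

# Drop incomplete rows, mirroring the omission of rows with NAs.
survey_aug <- na.omit(survey_aug)
```

The augmented survey then carries one extra explanatory variable per entry in clustermeans, which can enter the linear model alongside the original regressors.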
The user may choose to transform the response variable using a function,
transfy, prior to estimating the model. This function will be applied
directly to the entire vector of the response variable, i.e.
transfy(response). This means the specified function needs to be able to
take a vector as input. For transformations like log, exp and sqrt this
will just yield an element-wise transformation. For more complex
transformations, you may want to use sapply inside your function to
ensure element-wise transformation. This also applies to transfy_inv and
welfare.function, which need to be able to take a matrix as input. In
many cases a transformation like transfy could also be achieved by
altering the specified model appropriately, but using transfy and
transfy_inv is the recommended usage.
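As a minimal sketch of the sapply advice, the following wraps scalar-only helpers so that they accept a vector or a matrix. The helper names (shift_log_scalar, shift_log, cap_scalar, cap_matrix) are hypothetical, and restoring the matrix shape after sapply is an assumption about what a matrix-valued transformation should return.

```r
# Built-in transformations such as log() are already vectorized.
y <- c(1, 10, 100)
log(y)

# Hypothetical scalar-only transformation: if () handles one value at a time.
shift_log_scalar <- function(v) {
  if (v <= 0) log(1e-6) else log(v)
}

# Wrapping it in sapply() applies it element by element to a vector.
shift_log <- function(y) sapply(y, shift_log_scalar)

# For matrix input, sapply() returns a plain vector, so the shape is restored.
cap_scalar <- function(v) if (v > 1000) 1000 else v
cap_matrix <- function(m) {
  out <- sapply(m, cap_scalar)
  dim(out) <- dim(m)
  out
}
```

A function like shift_log could be passed as transfy, while a matrix-safe function like cap_matrix illustrates the shape a transfy_inv or welfare.function would need to handle.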
From the regression, location effects are calculated as the mean by location
of the regression residuals. Individual random error terms are then obtained
as the difference between the regression residuals and the location effects.
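The residual decomposition can be illustrated with a short base R sketch. The simulated survey and its column names (location, x, y) are hypothetical; only the decomposition itself follows the description above.

```r
set.seed(1)
# Hypothetical survey with three locations.
survey <- data.frame(
  location = rep(c("a", "b", "c"), each = 4),
  x        = rnorm(12)
)
survey$y <- 1 + 2 * survey$x + rnorm(12)

fit <- lm(y ~ x, data = survey)
res <- residuals(fit)

# Location effect: mean residual within each location, repeated per row.
loc_effect_row <- ave(res, survey$location, FUN = mean)

# The same quantity with one value per location.
loc_effect <- tapply(res, survey$location, mean)

# Individual error: residual minus the location effect of its location.
indiv_error <- res - loc_effect_row
```

By construction the individual errors average to zero within each location, and adding them to the location effects recovers the regression residuals.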
The bootstrapped response variables are generated using three sources of
randomness. The betas obtained from lm() are replaced by draws from a
multivariate normal distribution. In addition, random location effects
and residuals are drawn with replacement. Internally the sample is a
matrix, bootstrap, in which each row holds the bootstrap samples for one
individual observation in the census data set.
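The three sources of randomness can be sketched in plain R as below. This is not the package's C++ implementation: the toy data, the use of a Cholesky factor of vcov(fit) for the multivariate normal draw, and the choice to draw one location effect per census location in each replication are all assumptions made for illustration.

```r
set.seed(42)
# Hypothetical toy survey (with response) and census (without response).
survey <- data.frame(location = rep(c("a", "b", "c"), each = 5), x = rnorm(15))
survey$y <- 1 + 2 * survey$x + rnorm(15)
census <- data.frame(location = rep(c("a", "b", "c"), each = 3), x = rnorm(9))

fit <- lm(y ~ x, data = survey)
res <- residuals(fit)
loc_effect  <- as.vector(tapply(res, survey$location, mean))
indiv_error <- unname(res - ave(res, survey$location, FUN = mean))

n_boot   <- 100L
X_census <- model.matrix(~ x, data = census)  # design matrix for the census
n_census <- nrow(X_census)
beta_hat <- coef(fit)
L <- t(chol(vcov(fit)))  # lower-triangular factor, L %*% t(L) equals vcov(fit)
locs <- as.character(unique(census$location))

# Rows: census observations; columns: bootstrap replications.
bootstrap <- matrix(NA_real_, nrow = n_census, ncol = n_boot)
for (b in seq_len(n_boot)) {
  # 1) Coefficients drawn from a multivariate normal around the lm() estimates.
  beta_b <- beta_hat + as.vector(L %*% rnorm(length(beta_hat)))
  # 2) Location effects drawn with replacement, one per census location.
  eta_b <- sample(loc_effect, size = length(locs), replace = TRUE)
  names(eta_b) <- locs
  # 3) Individual errors drawn with replacement, one per census observation.
  eps_b <- sample(indiv_error, size = n_census, replace = TRUE)
  bootstrap[, b] <- as.vector(X_census %*% beta_b) +
    unname(eta_b[as.character(census$location)]) + eps_b
}
```

Each row of the resulting matrix then contains n_boot simulated responses for one census observation, matching the row-wise layout described above.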
If transfy_inv was specified, the bootstrap sample is transformed back.
This function will be applied directly to the matrix of bootstrap
samples, i.e. transfy_inv(bootstrap).
If a welfare function was specified, it will be used to transform the
bootstrap sample. It will be applied directly to the matrix of bootstrap
samples, i.e. welfare.function(bootstrap). Bootstrap samples that belong
to one observation are arranged row-wise.
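To illustrate what a matrix-compatible welfare function might look like, the sketch below applies a simple headcount indicator to a small bootstrap matrix. The matrix values, the poverty line of 100 and the function name below_line are hypothetical.

```r
# Hypothetical bootstrap matrix: 4 census observations (rows), 5 draws (columns).
bootstrap <- matrix(c( 80, 120,  90, 110, 100,
                      200, 210, 190, 205, 195,
                       40,  60,  55,  45,  50,
                      150,  95, 105,  99, 101),
                    nrow = 4, byrow = TRUE)

# Hypothetical welfare function: 1 if a draw is below the poverty line, else 0.
# The comparison is vectorized, so the matrix shape is preserved.
poverty_line <- 100
below_line <- function(y) (y < poverty_line) * 1

welfare <- below_line(bootstrap)

# Share of draws below the line for each observation (row-wise).
headcount <- rowMeans(welfare)
```

Because the function operates on the whole matrix at once, no explicit loop over observations or replications is needed.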
cores specifies the number of cores to use for the calculation. As
parallelization is done in C++ and incurs little overhead, this should in
most cases be left at "auto".
To obtain reproducible results, a seed can be specified. Simply running
set.seed() in R does not work. Providing a seed will not permanently
alter the seed in R.
ellsae returns a list. By default, this list includes a matrix with basic
summary statistics as specified in quantiles, a vector with the means of
the bootstrap samples for every observation, and the lm object obtained
from the linear model estimation. In addition, the user can request the
full matrix of bootstrap samples, and updated data.tables of the survey
and census data sets with residuals, location effects and cluster means
added.
Elbers, C., Lanjouw, J. O. and Lanjouw, P. (2003). Micro-Level Estimation of Poverty and Inequality. In: Econometrica 71.1, pp. 355-364, Jan 2003
Guadarrama Sanz, M., Molina, I., and Rao, J.N.K. (2016). A comparison of small area estimation methods for poverty mapping. In: 17 (Mar. 2016), 41-66 and 156 and 158.
If issues with memory allocation occur, one can also use ellsae_big
instead. Other small area estimation methods can also be found in the
package sae.
## Not run:
# Generate a sample survey and census data from the provided brazil data set
brazil <- ELLsae::brazil
helper <- sample(x = 1:nrow(brazil), size = nrow(brazil)/5, replace = FALSE)
helper <- sort(helper)
survey <- brazil[helper,]
census <- brazil[-helper,]
model.example <- hh_inc ~ geo2_br + age + sex + computer + trash
ELLsae::ellsae(model = model.example,
               survey = survey,
               census = census,
               location_survey = "geo2_br",
               n_boot = 250L,
               seed = 1234,
               transfy = log,
               transfy_inv = exp,
               output = "all",
               cores = "auto",
               quantiles = c(0, 0.25, 0.5, 0.75, 1),
               clustermeans = "age",
               location_census = "geo2_br",
               save_boot = FALSE)
## End(Not run)