Simulate continuous variables of population data
Description
Simulate continuous variables of population data using multinomial log-linear models combined with random draws from the resulting categories, or (two-step) regression models combined with random error terms. The household structure of the population data and any other categorical predictors need to be simulated beforehand.
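The multinomial approach (categorize, model the category, draw a category, then draw a value within it) can be illustrated in base R. This is a sketch under stated assumptions, not simPop's implementation. The data and the variables region and income are hypothetical. The conditional category probabilities are estimated by within-group proportions instead of a fitted multinomial log-linear model. For a single categorical predictor and unweighted data, these proportions coincide with the fitted probabilities of a saturated model. Sampling weights are ignored.

```r
set.seed(1)

# Hypothetical sample: one categorical predictor and a positive income variable
n <- 500
sample_df <- data.frame(region = factor(sample(c("A", "B"), n, replace = TRUE)))
sample_df$income <- rlnorm(n, meanlog = ifelse(sample_df$region == "A", 9, 10))

# Categorize the continuous variable; the outer break points cover the observed range
breaks <- c(0, quantile(sample_df$income, c(0.25, 0.5, 0.75)), max(sample_df$income))
sample_df$cat <- cut(sample_df$income, breaks = breaks, include.lowest = TRUE)

# Conditional category probabilities given the predictor (each row sums to 1)
probs <- prop.table(table(sample_df$region, sample_df$cat), margin = 1)

# Hypothetical population in which the predictor has already been simulated
pop <- data.frame(region = factor(sample(c("A", "B"), 2000, replace = TRUE)))

# Step 1: draw a category for each population unit from its conditional distribution
pop$cat <- vapply(as.character(pop$region), function(r)
  sample(colnames(probs), 1, prob = probs[r, ]), character(1))

# Step 2: draw a value uniformly within the interval of the drawn category
k <- match(pop$cat, colnames(probs))
pop$income <- runif(nrow(pop), min = breaks[k], max = breaks[k + 1])
```

simPop additionally offers options such as modeling the upper tail with a generalized Pareto distribution (gpd), which this sketch leaves out.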
Usage
simContinuous(simPopObj, additional = "netIncome", method = c("multinom",
"lm", "poisson"), zeros = TRUE, breaks = NULL, lower = NULL,
upper = NULL, equidist = TRUE, probs = NULL, gpd = TRUE,
threshold = NULL, est = "moments", limit = NULL, censor = NULL,
log = TRUE, const = NULL, alpha = 0.01, residuals = TRUE,
keep = TRUE, maxit = 500, MaxNWts = 1500,
tol = .Machine$double.eps^0.5, nr_cpus = NULL, eps = NULL,
regModel = "basic", byHousehold = NULL, imputeMissings = FALSE, seed,
verbose = FALSE, by = "strata")

Arguments
simPopObj 
an object of class simPopObj containing the survey data and the simulated population data.
additional 
a character string specifying the additional continuous
variable of the survey data that should be simulated for the population data.
method 
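a character string specifying the method to be used for
simulating the continuous variable. Accepted values are "multinom", for multinomial log-linear models combined with random draws from the resulting categories, "lm", for two-step regression models combined with random error terms, and "poisson", for Poisson regression models.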
zeros 
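a logical indicating whether the variable specified by
additional contains a considerable number of zeros that should be modeled separately (see Details).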
breaks 
an optional numeric vector; if multinomial models are
computed, this can be used to supply two or more break points for
categorizing the variable specified by additional.
lower, upper 
optional numeric values; if multinomial models are
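computed and breaks is not supplied, these can be used to specify lower and upper bounds other than the minimum and maximum of the variable, respectively.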
equidist 
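logical; if method is "multinom" and breaks is not supplied, this indicates whether the default break points should be equidistant.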
probs 
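numeric vector with values in [0, 1]; if method is "multinom" and breaks is not supplied, this gives the probabilities of the quantiles to be used as break points.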
gpd 
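logical; if method is "multinom", this indicates whether the upper tail of the variable specified by additional should be modeled with a generalized Pareto distribution (GPD).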
threshold 
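a numeric value; if method is "multinom" and gpd is TRUE, values for categories above threshold are drawn from the estimated generalized Pareto distribution.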
est 
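a character string; if method is "multinom" and gpd is TRUE, the estimator to be used to fit the generalized Pareto distribution (default "moments").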
limit 
an optional named list of lists; if multinomial models are computed, this can be used to account for structural zeros. The names of the list components specify the predictor variables for which to limit the possible outcomes of the response. For each predictor, a list containing the possible outcomes of the response for each category of the predictor can be supplied. The probabilities of other outcomes conditional on combinations that contain the specified categories of the supplied predictors are set to 0. Currently, this is only implemented for more than two categories in the response. 
censor 
an optional named list of lists or 
log 
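logical; if method is "lm", this indicates whether the linear model is fitted to the logarithms of the variable specified by additional (see Details).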
const 
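numeric; if method is "lm" and log is TRUE, a constant that is added to the variable specified by additional before the logarithms are taken (see Details).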
alpha 
numeric; if 
residuals 
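logical; if method is "lm", this indicates whether the random error terms are obtained by draws from the residuals.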
keep 
logical; if multinomial models are computed, this indicates
whether the simulated categories should be stored as a variable in the
resulting population data. If 
maxit, MaxNWts 
control parameters to be passed to multinom and nnet
from package nnet.
tol 
if 
nr_cpus 
if specified, an integer defining the number of CPUs to be used for parallel processing (see Details). 
eps 
a small positive numeric value, or NULL (the default).
regModel 
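allows the user to specify the model that should be used for the simulation of the additional continuous variable. Possible choices include "basic" (the default), which uses the basic household variables, and a formula object such as ~rb090+hsize+pl030+pb220a (see Examples).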
byHousehold 
if NULL, simulated values are used as is. If either 
imputeMissings 
if TRUE, missing values in variables that are used for the underlying model are imputed using hot-deck imputation. 
seed 
optional; an integer value to be used as the seed of the random number generator, or an integer vector containing the state of the random number generator to be restored. 
verbose 
(logical) if TRUE, additional output is written to the console.
by 
defines the variable by which the data are split for the estimation. Defaults to the strata variable. 
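The structural-zero handling described for limit can be pictured as masking a matrix of predicted category probabilities. The base R sketch below is illustrative only and is not simPop's internal code. The categories "low", "mid" and "high" and the predictor status with the level "retired" are hypothetical. Probabilities of outcomes that are not allowed are set to 0, and each row is rescaled to sum to one; R's sample() would also rescale its prob argument implicitly.

```r
# Hypothetical predicted probabilities: one row per unit, one column per category
probs <- matrix(c(0.2, 0.5, 0.3,
                  0.1, 0.3, 0.6,
                  0.4, 0.4, 0.2),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("low", "mid", "high")))
status <- c("employed", "retired", "retired")

# Hypothetical limit: units with status "retired" may only fall into "low" or "mid"
allowed <- list(retired = c("low", "mid"))

for (lev in names(allowed)) {
  rows <- status == lev
  forbidden <- setdiff(colnames(probs), allowed[[lev]])
  probs[rows, forbidden] <- 0  # structural zeros
}

# Rescale so that each row sums to one again
probs <- probs / rowSums(probs)
```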
Details
If method is "lm", the behavior of the two-step models is as follows.
If zeros is TRUE and either log is not TRUE or the variable specified by additional contains no negative values, a log-linear model is used to predict whether an observation is zero or not. A linear model is then used to predict the nonzero values.
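A minimal base R sketch of this two-step idea, assuming simulated data with a single hypothetical predictor x. A binomial GLM (logistic regression) stands in for the log-linear model of the zero indicator. A linear model fitted to the nonzero values, with error terms drawn from its residuals, supplies the nonzero values. Unlike simContinuous, the sketch ignores sampling weights, trimming (alpha) and the split by strata.

```r
set.seed(2)

# Hypothetical sample: predictor x and a semi-continuous response y with many zeros
n <- 400
smp <- data.frame(x = rnorm(n))
nonzero <- rbinom(n, 1, plogis(0.5 + smp$x)) == 1
smp$y <- ifelse(nonzero, 50 + 10 * smp$x + rnorm(n, sd = 5), 0)
smp$nz <- as.integer(smp$y != 0)

# Step 1: model whether an observation is zero or not
fit_zero <- glm(nz ~ x, family = binomial, data = smp)

# Step 2: linear model for the nonzero values only
fit_pos <- lm(y ~ x, data = smp[smp$nz == 1, ])

# Simulate for a hypothetical population
pop <- data.frame(x = rnorm(1000))
p_nonzero <- predict(fit_zero, newdata = pop, type = "response")
is_nonzero <- runif(nrow(pop)) < p_nonzero

# Nonzero values: prediction plus an error term drawn from the residuals
pop$y <- 0
pop$y[is_nonzero] <- predict(fit_pos, newdata = pop[is_nonzero, , drop = FALSE]) +
  sample(residuals(fit_pos), sum(is_nonzero), replace = TRUE)
```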
If zeros is TRUE, log is TRUE and const is specified, a log-linear model is again used to predict whether an observation is zero or not. In the linear model for the nonzero values, const is added to the variable specified by additional before the logarithms are taken.
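The log transformation with const can be sketched in base R as follows, assuming simulated nonzero values that may be slightly negative and a hypothetical const of 10. The model is fitted to log(y + const). Predictions plus drawn residuals are combined on the log scale, back-transformed with exp(), and const is subtracted again; this back-transformation is an assumption of the sketch.

```r
set.seed(3)

# Hypothetical nonzero values that can be slightly negative
x <- rnorm(300)
y <- exp(3 + 0.5 * x + rnorm(300, sd = 0.3)) - 5
const <- 10  # hypothetical constant; y + const must be positive
stopifnot(all(y + const > 0))

# Fit the linear model on the log scale after adding const
fit <- lm(log(y + const) ~ x)

# Simulate: prediction plus a drawn residual on the log scale, then back-transform
x_new <- rnorm(1000)
eta <- predict(fit, newdata = data.frame(x = x_new)) +
  sample(residuals(fit), length(x_new), replace = TRUE)
y_new <- exp(eta) - const
```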
If zeros is TRUE, log is TRUE, const is NULL and there are negative values, a multinomial log-linear model is used to predict negative, zero and positive observations. The categories for the negative values are defined by breaks. In the second step, a linear model is used to predict the positive values, and the negative values are drawn from uniform distributions within the respective classes.
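Drawing the negative values can be sketched in base R, assuming hypothetical break points for the negative range and hypothetical category labels that have already been simulated (for example by the multinomial model). Each negative value is drawn uniformly within its class. The positive values, which would come from the linear model, are left as NA here.

```r
set.seed(4)

# Hypothetical break points for the negative values: classes -1000 to -500, -500 to -100, -100 to 0
neg_breaks <- c(-1000, -500, -100, 0)

# Hypothetical simulated categories: a negative class index, "zero" or "positive"
cats <- c("neg1", "neg3", "zero", "positive", "neg2", "neg1")

neg <- grepl("^neg", cats)
k <- as.integer(sub("neg", "", cats[neg]))  # class index of each negative unit

value <- rep(NA_real_, length(cats))
value[cats == "zero"] <- 0
# Uniform draws within the class limits of each negative unit
value[neg] <- runif(sum(neg), min = neg_breaks[k], max = neg_breaks[k + 1])
```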
If zeros is FALSE, log is TRUE and const is NULL, a two-step model is used if there are nonpositive values in the variable specified by additional. Whether a log-linear or a multinomial log-linear model is used depends on the number of categories for the nonpositive values, as defined by breaks. Again, the positive values are then predicted with a linear model, and the nonpositive values are drawn from uniform distributions.
The number of CPUs is selected automatically as follows. By default, the number of CPUs equals the number of strata. However, if fewer CPUs are available than there are strata, the number of available CPUs minus 1 is used. This should usually be the best strategy, but the user can override this decision with nr_cpus.
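The default rule can be restated as a small helper. choose_cpus() is hypothetical and not part of simPop. The available cores default to parallel::detectCores(). The guard against returning fewer than one CPU is an assumption that the text does not specify.

```r
# Hypothetical helper restating the default rule for choosing the number of CPUs
choose_cpus <- function(n_strata, available = parallel::detectCores()) {
  if (available >= n_strata) {
    n_strata                 # one CPU per stratum
  } else {
    max(1L, available - 1L)  # leave one core free; assumption: never fewer than 1
  }
}

choose_cpus(n_strata = 9, available = 16)  # 9
choose_cpus(n_strata = 9, available = 4)   # 3
```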
Value
An object of class simPopObj containing the survey data as well as the simulated population data, including the continuous variable specified by additional and possibly the simulated categories for this continuous variable.
Note
The basic household structure and any other categorical predictors need to be simulated beforehand with the functions simStructure and simCategorical, respectively.
Author(s)
Bernhard Meindl and Andreas Alfons (based on code by Stefan Kraft)
See Also
simStructure, simCategorical, simComponents, simEUSILC
Examples
library(simPop)
data(eusilcS)
## Not run:
## approx. 20 seconds computation time
inp <- specifyInput(data=eusilcS, hhid="db030", hhsize="hsize", strata="db040", weight="db090")
simPop <- simStructure(data=inp, method="direct",
  basicHHvars=c("age", "rb090", "hsize", "pl030", "pb220a"))
regModel <- ~rb090+hsize+pl030+pb220a
# multinomial model with random draws
eusilcM <- simContinuous(simPop, additional="netIncome",
  regModel = regModel,
  upper=200000, equidist=FALSE, nr_cpus=1)
class(eusilcM)
## End(Not run)
## Not run:
# two-step regression
eusilcT <- simContinuous(simPop, additional="netIncome",
  regModel = "basic",
  method = "lm")
class(eusilcT)
## End(Not run)
