simEUSILC: Simulate EU-SILC population data

View source: R/simEUSILC.R

simEUSILC R Documentation

Simulate EU-SILC population data

Description

Simulate population data for the European Statistics on Income and Living Conditions (EU-SILC).

Usage

simEUSILC(
  dataS,
  hid = "db030",
  wh = "db090",
  wp = "rb050",
  hsize = NULL,
  strata = "db040",
  pid = NULL,
  age = "age",
  gender = "rb090",
  categorizeAge = TRUE,
  breaksAge = NULL,
  categorical = c("pl030", "pb220a"),
  income = "netIncome",
  method = c("multinom", "twostep"),
  breaks = NULL,
  lower = NULL,
  upper = NULL,
  equidist = TRUE,
  probs = NULL,
  gpd = TRUE,
  threshold = NULL,
  est = "moments",
  const = NULL,
  alpha = 0.01,
  residuals = TRUE,
  components = c("py010n", "py050n", "py090n", "py100n", "py110n", "py120n", "py130n",
    "py140n"),
  conditional = c(getCatName(income), "pl030"),
  keep = TRUE,
  maxit = 500,
  MaxNWts = 1500,
  tol = .Machine$double.eps^0.5,
  nr_cpus = NULL,
  seed
)

Arguments

dataS

a data.frame containing EU-SILC survey data.

hid

a character string specifying the column of dataS that contains the household ID.

wh

a character string specifying the column of dataS that contains the household sample weights.

wp

a character string specifying the column of dataS that contains the personal sample weights.

hsize

an optional character string specifying a column of dataS that contains the household size. If NULL, the household sizes are computed.
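When hsize is NULL, the household sizes are derived from the data. The following base R sketch shows one way such a count could be obtained, by counting the persons that share a household ID. The toy data frame and its values are made up for illustration, and this is not necessarily how the package computes the sizes internally.

```r
# Toy sample: one row per person, db030 is the household ID (made-up values)
toy <- data.frame(db030 = c(1, 1, 1, 2, 3, 3))

# Count the rows sharing each household ID and attach the count to every person
toy$hsize <- ave(rep(1L, nrow(toy)), toy$db030, FUN = sum)

toy$hsize
```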

strata

a character string specifying the column of dataS that defines the strata. Note that this is currently a required argument and that only one stratification variable is supported.

pid

an optional character string specifying a column of dataS that contains the personal ID.

age

a character string specifying the column of dataS that contains the age of the persons (to be used for setting up the household structure).

gender

a character string specifying the column of dataS that contains the gender of the persons (to be used for setting up the household structure).

categorizeAge

a logical indicating whether age categories should be used for simulating additional categorical and continuous variables to decrease computation time.

breaksAge

numeric; if categorizeAge is TRUE, an optional vector of two or more break points for constructing age categories, otherwise ignored.
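As an illustration of what a vector of break points does, the sketch below builds age categories with base R's cut(). The ages and break points are hypothetical, and the choice of right-closed intervals is an assumption rather than a documented property of simEUSILC.

```r
age <- c(3, 17, 25, 44, 67, 82)        # made-up ages
breaksAge <- c(0, 16, 30, 65, Inf)     # hypothetical break points

# Right-closed intervals; include.lowest = TRUE keeps age 0 in the first category
ageCat <- cut(age, breaks = breaksAge, include.lowest = TRUE)

table(ageCat)
```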

categorical

a character vector specifying additional categorical variables of dataS that should be simulated for the population data.

income

a character string specifying the variable of dataS that contains the personal income (to be simulated for the population data).

method

a character string specifying the method to be used for simulating personal income. Accepted values are "multinom" (for using multinomial log-linear models combined with random draws from the resulting categories) and "twostep" (for using two-step regression models combined with random error terms).
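The idea behind "random draws from the resulting categories" can be sketched in base R: first draw an income category, then draw a value inside that category. The break points and category probabilities below are hypothetical (in practice the probabilities would come from the fitted model), and drawing uniformly within a category is an assumption made for this sketch.

```r
set.seed(1)
breaks <- c(0, 10000, 20000, 40000)   # hypothetical income break points
p <- c(0.5, 0.3, 0.2)                 # hypothetical category probabilities
n <- 1000

# Step 1: draw an income category for each person
incCat <- sample(seq_along(p), n, replace = TRUE, prob = p)

# Step 2: draw a value uniformly between the bounds of the drawn category
income <- runif(n, min = breaks[incCat], max = breaks[incCat + 1])

summary(income)
```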

breaks

if method is "multinom", an optional numeric vector of two or more break points for categorizing the personal income. If missing, break points are computed using weighted quantiles.

lower, upper

numeric values; if method is "multinom" and breaks is NULL, these can be used to specify lower and upper bounds other than minimum and maximum, respectively. Note that if gpd is TRUE (see below), upper defaults to Inf.

equidist

logical; if method is "multinom" and breaks is NULL, this indicates whether the (positive) default break points should be equidistant or whether there should be refinements in the lower and upper tail (see getBreaks).

probs

numeric vector with values in [0, 1]; if method is "multinom" and breaks is NULL, this gives probabilities for quantiles to be used as (positive) break points. If supplied, this is preferred over equidist.
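To make the notion of quantile-based break points concrete, here is a small base R sketch of a weighted quantile. The helper weighted_quantile() is hypothetical and uses one simple definition (the smallest observation whose cumulative weight share reaches the requested probability); the definition used by getBreaks may differ.

```r
# Hypothetical helper, not part of simPop
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cw <- cumsum(w) / sum(w)   # cumulative share of the total weight
  # smallest observation whose cumulative weight share reaches each probability
  sapply(probs, function(p) x[which(cw >= p)[1]])
}

x <- c(100, 200, 300, 400, 500)   # made-up incomes
w <- c(1, 1, 1, 1, 6)             # made-up sample weights
weighted_quantile(x, w, probs = c(0.25, 0.5))
```

Because the last observation carries 60% of the total weight, the weighted median in this toy example is pulled up to the largest value.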

gpd

logical; if method is "multinom", this indicates whether the upper tail of the personal income should be simulated by random draws from a (truncated) generalized Pareto distribution rather than a uniform distribution.

threshold

a numeric value; if method is "multinom", values for categories above threshold are drawn from a (truncated) generalized Pareto distribution.
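Drawing from a truncated generalized Pareto distribution can be sketched with the inversion method in base R. All function names and parameter values below are hypothetical, the shape parameter is assumed to be nonzero, and the sketch is not the package's implementation.

```r
# Distribution and quantile function of the generalized Pareto distribution
# with location mu, scale sigma and shape xi (xi != 0 assumed)
pgpd_sketch <- function(x, mu, sigma, xi) 1 - (1 + xi * (x - mu) / sigma)^(-1 / xi)
qgpd_sketch <- function(u, mu, sigma, xi) mu + sigma / xi * ((1 - u)^(-xi) - 1)

# Truncated draws: sample uniformly between the CDF values of the bounds,
# then map back through the quantile function
rtgpd_sketch <- function(n, lower, upper, mu, sigma, xi) {
  u <- runif(n, pgpd_sketch(lower, mu, sigma, xi), pgpd_sketch(upper, mu, sigma, xi))
  qgpd_sketch(u, mu, sigma, xi)
}

set.seed(42)
draws <- rtgpd_sketch(1000, lower = 50000, upper = 200000,
                      mu = 50000, sigma = 20000, xi = 0.3)
range(draws)
```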

est

a character string; if method is "multinom", the estimator to be used to fit the generalized Pareto distribution.

const

numeric; if method is "twostep", this gives a constant to be added before log transformation.
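The purpose of such a constant can be seen in a short base R sketch: adding it makes the logarithm defined for zero or negative values, and subtracting it after exponentiating returns to the original scale. The incomes and the constant below are made up for illustration.

```r
income <- c(-50, 0, 1200, 35000)    # made-up incomes, including a negative value
const <- 100                        # hypothetical constant, larger than -min(income)

logIncome <- log(income + const)    # finite for every observation
back <- exp(logIncome) - const      # back-transformation to the original scale

all.equal(back, income)
```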

alpha

numeric; if method is "twostep", this gives trimming parameters for the sample data. Trimming is done with respect to the variable specified by income. If a numeric vector of length two is supplied, the first element gives the trimming proportion for the lower part and the second element the trimming proportion for the upper part. If a single numeric is supplied, it is used for both. With NULL, trimming is suppressed.
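The three accepted forms of alpha can be illustrated with a base R sketch that turns them into lower and upper trimming bounds. The helper trim_bounds() is hypothetical and uses unweighted quantiles for simplicity, which is an assumption; the package may use a different quantile definition.

```r
# Hypothetical helper, not part of simPop
trim_bounds <- function(x, alpha) {
  if (is.null(alpha)) return(range(x))    # NULL: no trimming
  alpha <- rep(alpha, length.out = 2)     # a single value is used for both tails
  unname(quantile(x, probs = c(alpha[1], 1 - alpha[2])))
}

x <- 1:101
trim_bounds(x, 0.01)         # same proportion trimmed in both tails
trim_bounds(x, c(0, 0.1))    # trim only the upper tail
trim_bounds(x, NULL)         # trimming suppressed
```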

residuals

logical; if method is "twostep", this indicates whether the random error terms should be obtained by draws from the residuals. If FALSE, they are drawn from a normal distribution (median and MAD of the residuals are used as parameters).
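The two ways of generating error terms can be contrasted in a short base R sketch. The residuals below are simulated stand-ins (with two artificial outliers), not output of the package.

```r
set.seed(123)
res <- c(rnorm(200), 8, -9)   # made-up residuals with two outliers
n <- 5                        # number of error terms to generate

# residuals = TRUE: resample the observed residuals with replacement
errResampled <- sample(res, n, replace = TRUE)

# residuals = FALSE: normal draws with robust location and scale estimates
errNormal <- rnorm(n, mean = median(res), sd = mad(res))
```

Using the median and MAD rather than the mean and standard deviation keeps the two outliers from inflating the spread of the normal draws.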

components

a character vector specifying the income components in dataS (to be simulated for the population data).

conditional

an optional character vector specifying categorical conditioning variables for resampling of the income components. The fractions occurring in dataS are then drawn from the respective subsets defined by these variables.

keep

a logical indicating whether variables computed internally in the procedure (such as the original IDs of the corresponding households in the underlying sample, age categories or income categories) should be stored in the resulting population data.

maxit, MaxNWts

control parameters to be passed to multinom and nnet. See the help file for nnet.

tol

if method is "twostep", a small positive numeric value or NULL (see simContinuous).

nr_cpus

if specified, an integer giving the number of CPUs to be used for parallel processing.

seed

optional; an integer value to be used as the seed of the random number generator, or an integer vector containing the state of the random number generator to be restored.
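The two forms of seed correspond to two standard base R mechanisms, sketched below independently of simEUSILC: setting the seed from an integer with set.seed(), and saving and restoring the full generator state stored in .Random.seed.

```r
# Integer seed: the same seed reproduces the same draws
set.seed(2022)
first <- runif(3)
set.seed(2022)
second <- runif(3)
identical(first, second)

# RNG state: save the integer state vector, draw, restore it, and draw again
state <- get(".Random.seed", envir = globalenv())
third <- runif(3)
assign(".Random.seed", state, envir = globalenv())
fourth <- runif(3)
identical(third, fourth)
```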

Value

An object of class simPopObj containing the simulated EU-SILC population data as well as the underlying sample.

Note

This is a wrapper calling simStructure, simCategorical, simContinuous and simComponents.

Author(s)

Andreas Alfons and Stefan Kraft and Bernhard Meindl

See Also

simStructure, simCategorical, simContinuous, simComponents

Examples


data(eusilcS) # load sample data

## Not run: 
## long computation time
# multinomial model with random draws
eusilcM <- simEUSILC(eusilcS, upper = 200000, equidist = FALSE,
                     nr_cpus = 1)
summary(eusilcM)

# two-step regression
eusilcT <- simEUSILC(eusilcS, method = "twostep", nr_cpus = 1)
summary(eusilcT)

## End(Not run)


simPop documentation built on Nov. 10, 2022, 5:43 p.m.