makedata: Synthetic data generation for the basic unit-level SAE model...
In rsae: Robust Small Area Estimation


This function generates synthetic data with area-level variation. It was written to test several estimation methods. In addition, one may contaminate the distributions of the model errors and/or the random effects (see Details, below).

makedata(seed=1024, intercept=1, beta=1, n=4, g=20, areaID=NULL,
         ve=1, ve.contam=41, ve.epsilon=0, vu=1, vu.contam=41,
         vu.epsilon=0)

`seed`	an integer, defining the `set.seed` (default `seed=1024`)
`intercept`	either a scalar as intercept of the fixed-effects model or `NULL` (default: `intercept=1`)
`beta`	scalar or vector defining the fixed-effect coefficients (default: `beta=1`). For each given coefficient, a vector of realizations is drawn from the standard normal distribution.
`n`	integer, defining the number of units per area in balanced-data setups (default: `n=4`)
`g`	integer, defining the number of areas (default: `g=20`)
`areaID`	by default `areaID=NULL`. To generate synthetic unbalanced data, call `makedata` with a vector whose elements are area identifiers. This vector should contain a series of (integer-valued) area IDs. The number of areas is set equal to the number of unique IDs; see the `rsae` vignette for more details.
`ve`	scalar, defining the model/residual variance
`ve.contam`	scalar, defining the model variance of the outlier part in a mixture distribution (Tukey-Huber-type contamination model). e = (1-h)N(0, ve) + hN(0, ve.contam)
`ve.epsilon`	scalar, defining the relative number of outliers (i.e., epsilon or h in the contamination mixture distribution). Typically, it takes values between 0 and 0.5 (but it is not restricted to this interval)
`vu`	scalar, defining the (area-level) random-effect variance
`vu.contam`	scalar, defining the (area-level) random-effect variance of the outlier part in the contamination mixture distribution (cf., `ve.contam`)
`vu.epsilon`	scalar, defining the relative number of outliers in the contamination mixture distribution of the (area-level) random effects (cf., `ve.epsilon`)
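
As a small illustration of the `areaID` argument, the following sketch builds an identifier vector for an unbalanced design in base R. The area sizes (2, 5, and 3 units) and the names `area_sizes`, `area_id`, `g_implied`, and `n_per_area` are made up for illustration. The sketch assumes that `areaID` expects one integer ID per unit, as described above; the call to `makedata` is guarded so that the snippet also runs when `rsae` is not installed.

```r
# Hypothetical area sizes for three areas (illustrative values only)
area_sizes <- c(2, 5, 3)

# One integer area ID per unit: 1 1 2 2 2 2 2 3 3 3
area_id <- rep(seq_along(area_sizes), times = area_sizes)

# Number of areas implied by the IDs, and number of units per area
g_implied <- length(unique(area_id))
n_per_area <- as.vector(table(area_id))

# Pass the IDs to makedata() only if the rsae package is available
if (requireNamespace("rsae", quietly = TRUE)) {
  unbalanced <- rsae::makedata(areaID = area_id)
}
```

Here the number of areas follows from the IDs themselves (three unique values), in line with the description of `areaID` above.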

The function makedata generates synthetic datasets that may be used to study the behavior of different estimation methods. Let y_i denote an area-specific n_i-vector of the response variable for the areas i=1,...,g. Define an (n_i \times p)-matrix X_i of realizations from the standard normal distribution, N(0,1), and let β denote a p-vector of regression coefficients. The y_i are then drawn from the law y_i \sim N(X_iβ, v_e I_i + v_u J_i), where v_e and v_u are the variances of the model error and the random effect, respectively, and I_i and J_i denote the (n_i \times n_i) identity matrix and matrix of ones, respectively.
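
The covariance structure v_e I_i + v_u J_i is what one obtains from the unit-level form y_{i,j} = x_{i,j}'β + u_i + e_{i,j}, with independent u_i \sim N(0, v_u) and e_{i,j} \sim N(0, v_e). The following base-R sketch simulates a balanced dataset in that form. It is not the code of `makedata` and will not reproduce its draws; the variable names and the choice to put the intercept into the design matrix as a column of ones are assumptions made for illustration.

```r
set.seed(1024)

g <- 20          # number of areas
n <- 4           # units per area (balanced design)
beta <- c(1, 1)  # intercept and one slope (illustrative values)
ve <- 1          # model-error variance
vu <- 1          # random-effect variance

area <- rep(seq_len(g), each = n)           # area ID for each unit
x <- rnorm(g * n)                           # covariate drawn from N(0, 1)
X <- cbind(1, x)                            # design matrix with intercept column

u <- rnorm(g, mean = 0, sd = sqrt(vu))      # one random effect per area
e <- rnorm(g * n, mean = 0, sd = sqrt(ve))  # one model error per unit

# Unit-level form: y_ij = x_ij' beta + u_i + e_ij
y <- as.vector(X %*% beta) + u[area] + e

sim <- data.frame(area = area, x = x, y = y)
```

Because all units of an area share the same u_i, any two responses within an area have covariance v_u, which is the role of the matrix of ones J_i in the law above.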

In addition, we allow the distributions of the model error (residual) and of the area-level random effect to be contaminated (cf. Stahel and Welsh, 1997). Specifically, the laws of e_{i,j} and u_i are replaced by the Tukey-Huber contamination mixture:

e_{i,j} \sim (1-ε^{ve})N(0,v_e) + ε^{ve}N(0, v_e^{ε}),
u_{i} \sim (1-ε^{vu})N(0,v_u) + ε^{vu}N(0, v_u^{ε}),

where ε^{ve} and ε^{vu} regulate the degree of contamination; v_e^{ε} and v_u^{ε} define the variances of the contamination part of the respective mixture distribution.
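
To make the mixture concrete, the sketch below draws from (1-ε)N(0, v) + εN(0, v^{ε}) in base R. The helper `r_contam` is hypothetical and not part of `rsae`. It flags each draw as an outlier independently with probability ε; whether `makedata` flags outliers this way or fixes their number is not stated here, so treat this as one possible reading.

```r
# Draw n values from (1 - eps) * N(0, v) + eps * N(0, v_contam).
# r_contam is a hypothetical helper, not a function of the rsae package.
r_contam <- function(n, v, v_contam, eps) {
  is_outlier <- runif(n) < eps                      # TRUE with probability eps
  sd_each <- sqrt(ifelse(is_outlier, v_contam, v))  # per-draw standard deviation
  list(value = rnorm(n, mean = 0, sd = sd_each), outlier = is_outlier)
}

set.seed(1)
draw <- r_contam(n = 10000, v = 1, v_contam = 41, eps = 0.1)

# Variance of the mixture: (1 - eps) * v + eps * v_contam
var_theory <- (1 - 0.1) * 1 + 0.1 * 41
var_sample <- var(draw$value)
```

With v = 1, v^{ε} = 41, and ε = 0.1, the mixture variance is 0.9 + 4.1 = 5, so even a modest contamination fraction inflates the overall variance considerably.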

Four different contamination setups are possible:

no contamination (i.e., ve.epsilon=vu.epsilon=0),
contaminated model error (i.e., ve.epsilon!=0 and vu.epsilon=0),
contaminated random effect (i.e., ve.epsilon=0 and vu.epsilon!=0),
both are contaminated (i.e., ve.epsilon!=0 and vu.epsilon!=0).
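
The four setups can be written down as argument lists for `makedata`, as in the sketch below. The contamination fraction 0.1 and the setup names are illustrative choices, not package defaults; all other arguments keep the defaults shown in the usage. The calls are guarded so that the snippet also runs when `rsae` is not installed.

```r
# Contamination fractions for the four setups (0.1 is an illustrative value)
setups <- list(
  none          = list(ve.epsilon = 0,   vu.epsilon = 0),
  model_error   = list(ve.epsilon = 0.1, vu.epsilon = 0),
  random_effect = list(ve.epsilon = 0,   vu.epsilon = 0.1),
  both          = list(ve.epsilon = 0.1, vu.epsilon = 0.1)
)

# Generate one dataset per setup if the rsae package is available
if (requireNamespace("rsae", quietly = TRUE)) {
  datasets <- lapply(setups, function(args) do.call(rsae::makedata, args))
}
```

Keeping the setups in a named list makes it easy to loop over them when comparing estimation methods under each contamination scenario.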

`makedata` returns an instance of the class `saemodel`.

Tobias Schoch

Stahel, W.A. and A. Welsh (1997): Approaches to robust estimation in the simplest variance components model, Journal of Statistical Planning and Inference 57, pp. 295-319.