makedata: Synthetic data generation for the basic unit-level SAE model...

Description

This function generates synthetic data with area-level variation. It was written to test different estimation methods. In addition, one may contaminate the distributions of the model errors and/or the random effects (see Details, below).

Usage

makedata(seed=1024, intercept=1, beta=1, n=4, g=20, areaID=NULL,
         ve=1, ve.contam=41, ve.epsilon=0, vu=1, vu.contam=41,
         vu.epsilon=0)

Arguments

seed

integer, the seed passed to set.seed (default: seed=1024)

intercept

either a scalar as intercept of the fixed-effects model or NULL (default: intercept=1)

beta

scalar or vector defining the fixed-effect coefficients (default: beta=1). For each given coefficient, a vector of realizations is drawn from the standard normal distribution.

n

integer, defining the number of units per area in balanced-data setups (default: n=4)

g

integer, defining the number of areas (default: g=20)

areaID

by default areaID=NULL. To generate synthetic unbalanced data, call makedata with a vector whose elements are (integer-valued) area identifiers, one per unit. The number of areas is set equal to the number of unique IDs; see the rsae vignette for more details.
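As a minimal sketch of the unbalanced case, the snippet below builds an areaID vector in base R and passes it to makedata. The area sizes (2, 5, and 3 units) are hypothetical, and the call to makedata is only attempted if the rsae package is installed.

```r
# Hypothetical area sizes: three areas with 2, 5, and 3 units
area_sizes <- c(2, 5, 3)

# One integer ID per unit; the number of unique IDs gives the number of areas
areaID <- rep(seq_along(area_sizes), times = area_sizes)

# Units per area
table(areaID)

# Only call makedata() if the rsae package is available
if (requireNamespace("rsae", quietly = TRUE)) {
  unbalanced <- rsae::makedata(areaID = areaID)
}
```

Here the arguments n and g are left at their defaults, since the number of areas is taken from the unique IDs when areaID is supplied.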

ve

scalar, defining the model/residual variance

ve.contam

scalar, defining the model variance of the outlier part in a mixture distribution (Tukey-Huber-type contamination model): e ~ (1-h)*N(0, ve) + h*N(0, ve.contam)

ve.epsilon

scalar, defining the relative number of outliers (i.e., epsilon or h in the contamination mixture distribution). Typically, it takes values between 0 and 0.5 (but it is not restricted to this interval)

vu

scalar, defining the (area-level) random-effect variance

vu.contam

scalar, defining the (area-level) random-effect variance of the outlier part in the contamination mixture distribution (cf., ve.contam)

vu.epsilon

scalar, defining the relative number of outliers in the contamination mixture distribution of the (area-level) random effects (cf., ve.epsilon)

Details

The function makedata generates synthetic datasets that may be used to study the behavior of different estimation methods. Let y_i denote the area-specific n_i-vector of the response variable for the areas i=1,...,g. Define an (n_i \times p)-matrix X_i of realizations from the standard normal distribution, N(0,1), and let β denote a p-vector of regression coefficients. The y_i are then drawn according to the law y_i \sim N(X_iβ, v_e I_i + v_u J_i), where v_e and v_u are the variances of the model error and the random effect, respectively, and I_i and J_i denote the (n_i \times n_i) identity matrix and matrix of ones, respectively.
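The law above can be illustrated with a short base-R sketch. This is not the code used inside makedata; it only shows that adding one random effect per area and one error per unit yields the covariance v_e I_i + v_u J_i. The coefficient values are hypothetical, and the intercept is omitted to match the formula.

```r
set.seed(1024)

g <- 20                 # number of areas
n <- 4                  # units per area (balanced setup)
beta <- c(1, 0.5)       # hypothetical coefficients (p = 2)
ve <- 1                 # model-error variance
vu <- 1                 # random-effect variance

# Area membership of each unit
area <- rep(seq_len(g), each = n)

# Design matrix with standard normal entries
X <- matrix(rnorm(g * n * length(beta)), ncol = length(beta))

u <- rnorm(g, sd = sqrt(vu))       # one random effect per area
e <- rnorm(g * n, sd = sqrt(ve))   # one model error per unit

# Response: fixed part + area effect + unit-level error
y <- drop(X %*% beta) + u[area] + e

# Implied covariance matrix of y_i within a single area
V_i <- ve * diag(n) + vu * matrix(1, n, n)
```

Units in the same area share u_i, which is what produces the off-diagonal entries v_u in V_i.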

In addition, we allow the distributions of the model/residual error and the area-level random effect to be contaminated (cf. Stahel and Welsh, 1997). Specifically, the laws of e_{i,j} and u_i are replaced by the Tukey-Huber contamination mixtures

e_{i,j} \sim (1-ε^{ve}) N(0, v_e) + ε^{ve} N(0, v_e^{ε})

u_i \sim (1-ε^{vu}) N(0, v_u) + ε^{vu} N(0, v_u^{ε})

where ε^{ve} and ε^{vu} regulate the degree of contamination; v_e^{ε} and v_u^{ε} define the variances of the contamination parts of the mixture distributions.
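A draw from such a contamination mixture can be sketched in base R as follows. The helper rcontam is hypothetical and not part of rsae; it flags each observation as an outlier independently with probability eps, which may differ from how makedata selects outliers internally.

```r
# Hypothetical helper (not part of rsae): m draws from the mixture
# (1 - eps) * N(0, v) + eps * N(0, v_contam)
rcontam <- function(m, v, v_contam, eps) {
  # Flag each draw as an outlier with probability eps
  is_outlier <- runif(m) < eps
  # Outliers use the contamination variance, all others the regular one
  sd_i <- sqrt(ifelse(is_outlier, v_contam, v))
  rnorm(m, mean = 0, sd = sd_i)
}

set.seed(1024)
e <- rcontam(10000, v = 1, v_contam = 41, eps = 0.1)

# Theoretical variance of the mixture: (1 - 0.1) * 1 + 0.1 * 41 = 5
var(e)
```

With eps = 0 the helper reduces to plain draws from N(0, v), which corresponds to the uncontaminated defaults ve.epsilon=0 and vu.epsilon=0.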

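Four different contamination setups are possible:

no contamination: ve.epsilon = 0 and vu.epsilon = 0
contaminated model error: ve.epsilon > 0 and vu.epsilon = 0
contaminated random effect: ve.epsilon = 0 and vu.epsilon > 0
contaminated model error and random effect: ve.epsilon > 0 and vu.epsilon > 0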
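The sketch below enumerates the setups as the four on/off combinations of ve.epsilon and vu.epsilon. Reading the setups this way is an assumption based on the argument list, and the contamination level 0.1 is a hypothetical choice. The calls to makedata are only attempted if the rsae package is installed.

```r
# All combinations of "off" (0) and "on" (hypothetical level 0.1)
setups <- expand.grid(ve.epsilon = c(0, 0.1), vu.epsilon = c(0, 0.1))
setups

# Generate one synthetic dataset per setup if rsae is available
if (requireNamespace("rsae", quietly = TRUE)) {
  models <- Map(
    function(ve_eps, vu_eps) {
      rsae::makedata(ve.epsilon = ve_eps, vu.epsilon = vu_eps)
    },
    setups$ve.epsilon, setups$vu.epsilon
  )
}
```

The first row (both values 0) reproduces the default, uncontaminated call makedata().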

Value

Instance of the class saemodel.

Author(s)

Tobias Schoch

References

Stahel, W.A. and A. Welsh (1997): Approaches to robust estimation in the simplest variance components model, Journal of Statistical Planning and Inference 57, pp. 295-319.

Examples

# generate synthetic data with the default settings (balanced, no contamination)
mymodel <- makedata()

rsae documentation built on May 2, 2019, 5:50 p.m.