This function generates synthetic data with area-level variation. It was written to test several estimation methods. In addition, one may contaminate the distributions of the model errors and/or the random effects (see Details, below).
seed |
an integer, defining the |
intercept |
either a scalar as intercept of the fixed-effects model or |
beta |
scalar or vector defining the fixed-effect coefficients (default: |
n |
integer, defining the number of units per area in balanced-data setups (default: |
g |
integer, defining the number of areas (default: |
areaID |
by default |
ve |
scalar, defining the model/residual variance |
ve.contam |
scalar, defining the model variance of the outlier part in a mixture distribution (Tukey-Huber-type contamination model): e = (1-h)*N(0, ve) + h*N(0, ve.contam) |
ve.epsilon |
scalar, defining the relative number of outliers (i.e., epsilon or h in the contamination mixture distribution). Typically, it takes values between 0 and 0.5 (but it is not restricted to this interval) |
vu |
scalar, defining the (area-level) random-effect variance |
vu.contam |
scalar, defining the (area-level) random-effect variance of the outlier part in the contamination mixture distribution (cf., |
vu.epsilon |
scalar, defining the relative number of outliers in the contamination mixture distribution of the (area-level) random effects (cf., |
The function makedata generates synthetic datasets that may be used to study the behavior of different estimation methods. Let y_i denote an area-specific n_i-vector of the response variable for the areas i=1,...,g. Define an (n_i \times p)-matrix X_i of realizations from the standard normal distribution, N(0,1), and let β denote a p-vector of regression coefficients. The y_i are then drawn from the law y_i \sim N(X_iβ, v_e I_i + v_u J_i), where v_e and v_u are the variances of the model error and the random effect, respectively, and I_i and J_i denote the (n_i \times n_i) identity matrix and matrix of ones, respectively. Equivalently, y_{i,j} = x_{i,j}^T β + u_i + e_{i,j} with u_i \sim N(0, v_u) and e_{i,j} \sim N(0, v_e), for the units j=1,...,n_i in area i.
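The sampling scheme described above can be sketched in base R. This is only an illustration of the model, not the code of makedata: the function name simulate_areas, its arguments, and their default values are assumptions made for the example, and the sketch omits the intercept and any contamination.

```r
# Sketch only: simulate_areas() is a hypothetical helper, not part of the package.
# It draws y_ij = x_ij' beta + u_i + e_ij for a balanced design, which implies
# y_i ~ N(X_i beta, ve * I_i + vu * J_i) for each area i.
simulate_areas <- function(g = 20, n = 4, beta = c(1, 1), ve = 1, vu = 1, seed = 1) {
  set.seed(seed)
  p <- length(beta)
  N <- g * n
  areaID <- rep(seq_len(g), each = n)            # n units in each of g areas
  X <- matrix(rnorm(N * p), nrow = N, ncol = p)  # N(0,1) covariates
  u <- rnorm(g, mean = 0, sd = sqrt(vu))         # one random effect per area
  e <- rnorm(N, mean = 0, sd = sqrt(ve))         # unit-level model errors
  y <- as.vector(X %*% beta) + u[areaID] + e
  data.frame(y = y, X, areaID = areaID)
}

dat <- simulate_areas()
```

Adding the same u_i to every unit of area i is what produces the v_u J_i block in the covariance matrix, while the independent e_{i,j} contribute the v_e I_i part.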
In addition, we allow the distributions of the model error (residual) and of the area-level random effect to be contaminated (cf. Stahel and Welsh, 1997). Specifically, the laws of e_{i,j} and u_i are replaced by the Tukey-Huber contamination mixture:
e_{i,j} \sim (1-ε^{ve})N(0,v_e) + ε^{ve}N(0, v_e^{ε}),
u_{i} \sim (1-ε^{vu})N(0,v_u) + ε^{vu}N(0, v_u^{ε}),
where ε^{ve} and ε^{vu} regulate the degree of contamination; v_e^{ε} and v_u^{ε} define the variances of the contamination parts of the two mixture distributions.
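A draw from such a contamination mixture can be sketched in base R as follows. The helper rcontam and the parameter values in the usage lines are assumptions chosen for illustration; they are not taken from the package. The sketch also assumes a contamination proportion between 0 and 1.

```r
# Sketch only: rcontam() is a hypothetical helper, not a package function.
# Each draw comes from N(0, v.contam) with probability epsilon,
# and from N(0, v) otherwise (epsilon is assumed to lie in [0, 1]).
rcontam <- function(m, v, v.contam, epsilon) {
  outlier <- rbinom(m, size = 1, prob = epsilon) == 1
  sd <- ifelse(outlier, sqrt(v.contam), sqrt(v))
  rnorm(m, mean = 0, sd = sd)
}

set.seed(1)
e <- rcontam(10000, v = 1, v.contam = 41, epsilon = 0.1)
var(e)  # theoretical mixture variance: (1 - 0.1) * 1 + 0.1 * 41 = 5
```

The same helper would serve for the random effects u_i by passing the corresponding variances and contamination proportion; setting epsilon to 0 recovers the uncontaminated normal law.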
Four different contamination setups are possible:
no contamination (i.e., ve.epsilon=vu.epsilon=0),
contaminated model error (i.e., ve.epsilon!=0 and vu.epsilon=0),
contaminated random effect (i.e., ve.epsilon=0 and vu.epsilon!=0),
both are contaminated (i.e., ve.epsilon!=0 and vu.epsilon!=0).
An instance of the class saemodel.
Tobias Schoch
Stahel, W.A. and A. Welsh (1997): Approaches to robust estimation in the simplest variance components model, Journal of Statistical Planning and Inference 57, pp. 295-319.
# generate synthetic data
mymodel <- makedata()