| Data_MC_contamination | R Documentation |
Synthetic dataset generated from a multivariate normal distribution with
moderate correlation structure (\rho = 0.6). It contains 525 observations
and 10 variables of mixed type (continuous, categorical, binary, and weights).
The last 25 rows correspond to contaminated observations created by adding
perturbations equal to three times the standard deviation of each quantitative
variable to a subset of original units. This yields a controlled contamination
level of 5% relative to the 500 original observations (25/500). These data
follow the design in
\insertCite{boj2024robustification}{dbrobust}.
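The contamination step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used to build the dataset: the stand-in matrix `X`, the column names `x1` to `x4`, the seed, and the random choice of units are all hypothetical, and the shift is assumed to be applied in the positive direction to every quantitative column.

```r
# Minimal sketch: append contaminated copies of a subset of rows.
# All object names are hypothetical; this is not the package's own code.
set.seed(1)

n_clean <- 500   # original observations
n_out   <- 25    # contaminated observations (5% of n_clean)

# Stand-in for the quantitative part of the data (four continuous columns).
X <- matrix(rnorm(n_clean * 4), nrow = n_clean,
            dimnames = list(NULL, paste0("x", 1:4)))

# Pick the units to perturb and shift each column by 3 times its SD.
idx     <- sample(n_clean, n_out)
col_sd  <- apply(X, 2, sd)
shifted <- sweep(X[idx, , drop = FALSE], 2, 3 * col_sd, "+")

# Contaminated rows go last, so they occupy rows 501 to 525.
X_cont <- rbind(X, shifted)

dim(X_cont)        # rows and columns of the combined matrix
n_out / n_clean    # contamination level relative to the clean sample
```

Appending the perturbed copies at the end keeps the clean and contaminated parts separable by row index, which matches the layout described for this dataset.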
Data_MC_contamination
A data frame with 525 rows and 10 variables:
Continuous variable 1
Continuous variable 2
Continuous variable 3
Continuous variable 4
Categorical variable 1 (3 categories, approx. balanced)
Categorical variable 2 (3 categories, approx. balanced)
Categorical variable 3 (4 categories, uniform distribution)
Binary variable 1 (40% zeros, 60% ones)
Binary variable 2 (60% zeros, 40% ones)
Observation weights derived from the joint distribution of V5 and V8, following a proportional frequency-based scheme.
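One plausible reading of a proportional frequency-based scheme is that each observation receives a weight proportional to the size of its joint category cell. The sketch below illustrates that reading on simulated stand-ins; the vectors `V5` and `V8` here are generated for the example, and the normalization to unit sum is an assumption rather than the documented formula.

```r
# Minimal sketch of frequency-proportional weights (assumed scheme).
set.seed(2)
n  <- 500
V5 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # stand-in categorical
V8 <- rbinom(n, 1, 0.6)                                    # stand-in binary

# Joint frequency table of the two variables.
joint <- table(V5, V8)

# For every observation, the number of observations sharing its (V5, V8) cell.
cell_n <- ave(rep(1, n), V5, V8, FUN = sum)

# Weights proportional to cell frequency, normalized to sum to 1.
w <- cell_n / sum(cell_n)

joint
summary(w)
```

Under this scheme, observations in the same cell share one weight, and cells with more members carry larger weights.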
Continuous variables were drawn directly from the multivariate normal sample.
Binary and categorical variables were obtained by discretizing normal margins using percentile-based thresholds.
Contaminated observations (rows 501–525) were generated by shifting the quantitative variables of selected original cases by three standard deviations.
The weighting scheme assigns larger weights to more frequent category combinations.
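The generation and discretization steps can be illustrated with a short base R sketch. It assumes an equicorrelation structure with \rho = 0.6 and uses sample percentiles as thresholds; the dimension, seed, labels, and object names are hypothetical and chosen only for the example.

```r
# Minimal sketch: correlated normal draws, then percentile-based discretization.
set.seed(3)
n   <- 500
p   <- 3
rho <- 0.6

# Equicorrelation matrix: 1 on the diagonal, rho elsewhere (assumed structure).
Sigma <- matrix(rho, p, p)
diag(Sigma) <- 1

# Draw from N(0, Sigma): with R = chol(Sigma), t(R) %*% R equals Sigma,
# so rows of Z0 %*% R have covariance Sigma.
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Continuous variable: taken directly from one margin.
x_cont <- Z[, 1]

# Binary variable: threshold at the 40th percentile (about 40% zeros, 60% ones).
x_bin <- as.integer(Z[, 2] > quantile(Z[, 2], 0.40))

# Three-category variable: thresholds at the terciles (approximately balanced).
breaks <- quantile(Z[, 3], c(0, 1/3, 2/3, 1))
x_cat  <- cut(Z[, 3], breaks = breaks, labels = c("A", "B", "C"),
              include.lowest = TRUE)

round(cor(Z), 2)
table(x_bin)
table(x_cat)
```

Because the discrete variables are cut from correlated normal margins, they inherit dependence on the continuous variables, which is what makes the mixed-type structure non-trivial.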
\insertRef{boj2024robustification}{dbrobust}