| Data_MC_no_contamination | R Documentation |
Synthetic dataset generated from a multivariate normal distribution with
moderate correlation structure (\rho = 0.6). It contains 500 observations
and 10 variables of mixed type (continuous, categorical, binary, and weights).
No contaminated cases were added in this version, so the dataset represents
a clean scenario with 0% contamination. These data follow the design in
\insertCite{boj2024robustification}{dbrobust}.
Data_MC_no_contamination
A data frame with 500 rows and 10 variables:
Continuous variable 1
Continuous variable 2
Continuous variable 3
Continuous variable 4
Categorical variable 1 (3 categories, approx. balanced)
Categorical variable 2 (3 categories, approx. balanced)
Categorical variable 3 (4 categories, uniform distribution)
Binary variable 1 (40% zeros, 60% ones)
Binary variable 2 (60% zeros, 40% ones)
Observation weights derived from the joint distribution of V5 and V8, following a proportional frequency-based scheme.
Continuous variables were drawn directly from the multivariate normal sample.
Binary and categorical variables were obtained by discretizing normal margins using percentile-based thresholds.
Unlike other datasets in this collection, no artificial contamination was introduced here.
The weighting scheme gives larger weights to observations that fall in more frequent category combinations of V5 and V8.
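The construction described above can be sketched in base R. This is only an illustration under stated assumptions, not the script used to build the packaged data: the equicorrelation structure, the seed, the column names `V1` to `V10` (only `V5` and `V8` are named in this page), the helper `cut_by_quantile`, and the exact weight formula (cell count normalized to sum to one) are all hypothetical choices.

```r
# Hypothetical re-creation sketch; NOT the package's actual generating code.
set.seed(123)   # assumed seed, used only to make the sketch reproducible
n   <- 500      # number of observations, as documented
p   <- 9        # latent normal margins: 4 continuous + 3 categorical + 2 binary
rho <- 0.6      # correlation stated in the description

# Assumption: equicorrelation matrix (every off-diagonal entry equals rho).
Sigma <- matrix(rho, p, p)
diag(Sigma) <- 1

# Draw from N(0, Sigma) using the Cholesky factor (chol() returns R with R'R = Sigma).
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Hypothetical helper: discretize a normal margin at the given sample percentiles.
cut_by_quantile <- function(x, probs) {
  breaks <- quantile(x, probs = c(0, probs, 1), names = FALSE)
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

V5 <- factor(cut_by_quantile(Z[, 5], c(1/3, 2/3)))        # 3 roughly balanced categories
V6 <- factor(cut_by_quantile(Z[, 6], c(1/3, 2/3)))        # 3 roughly balanced categories
V7 <- factor(cut_by_quantile(Z[, 7], c(0.25, 0.5, 0.75))) # 4 equal-sized categories
V8 <- as.integer(Z[, 8] > quantile(Z[, 8], 0.40))         # 40% zeros, 60% ones
V9 <- as.integer(Z[, 9] > quantile(Z[, 9], 0.60))         # 60% zeros, 40% ones

# Assumed weighting rule: weight proportional to the frequency of the
# observation's (V5, V8) cell, so frequent combinations receive larger weights.
cell <- interaction(V5, V8, drop = TRUE)
freq <- table(cell)
w    <- as.numeric(freq[as.character(cell)])
w    <- w / sum(w)   # normalize so the weights sum to 1

dat <- data.frame(V1 = Z[, 1], V2 = Z[, 2], V3 = Z[, 3], V4 = Z[, 4],
                  V5 = V5, V6 = V6, V7 = V7, V8 = V8, V9 = V9, V10 = w)
str(dat)
```

Because the thresholds are sample percentiles, the binary proportions are matched exactly in the sample, while the correlation between the continuous columns only approximates 0.6.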
\insertRef{boj2024robustification}{dbrobust}