| dist_mixed | R Documentation |
Internal helper function that computes pairwise dissimilarities for datasets containing a mix of continuous, binary, and categorical variables using Gower's method (Gower, 1971).
dist_mixed(
  x,
  continuous_cols = NULL,
  binary_cols = NULL,
  categorical_cols = NULL,
  binary_asym = FALSE
)
x |
A data frame with rows as observations and columns as variables. |
continuous_cols |
Optional numeric indices or column names for continuous variables. |
binary_cols |
Optional numeric indices or column names for binary variables. |
categorical_cols |
Optional numeric indices or column names for categorical/multiclass variables. |
binary_asym |
Logical; if TRUE, binary variables are treated as asymmetric (only 1/1 counts as match). |
Continuous, binary, and categorical columns can be automatically detected,
or explicitly specified by the user via continuous_cols, binary_cols,
and categorical_cols.
Continuous, binary, and categorical columns are combined into a single dissimilarity measure following Gower's approach.
Continuous variables are scaled by their range.
Binary variables can be treated as symmetric (both 0/0 and 1/1 count as matches) or asymmetric (only 1/1 counts as a match; 0/0 pairs are left out of the comparison).
Categorical variables are compared using simple matching.
Missing values are ignored pairwise.
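The rules above can be sketched in a few lines of base R. This is an illustrative sketch only, not the dbrobust implementation: the function name gower_sketch is hypothetical, binary variables are assumed to be coded 0/1, and every variable gets equal weight.

```r
# Hypothetical helper for illustration; NOT the code behind dist_mixed().
gower_sketch <- function(x, binary_cols = character(0), binary_asym = FALSE) {
  n <- nrow(x)
  num <- matrix(0, n, n)  # running sum of per-variable dissimilarities
  den <- matrix(0, n, n)  # number of variables that contributed to each pair
  for (nm in names(x)) {
    v <- x[[nm]]
    w <- matrix(1, n, n)
    if (is.numeric(v) && !(nm %in% binary_cols)) {
      # Continuous: absolute difference scaled by the observed range
      rng <- diff(range(v, na.rm = TRUE))
      if (!is.finite(rng) || rng == 0) rng <- 1  # constant column: differences are 0
      d <- abs(outer(v, v, "-")) / rng
    } else {
      # Binary and categorical: simple matching (0 if equal, 1 otherwise)
      v <- as.character(v)
      d <- 1 * outer(v, v, "!=")
      if (binary_asym && nm %in% binary_cols) {
        # Asymmetric binary (assumes 0/1 coding): 0/0 pairs carry no weight
        w <- 1 - outer(v == "0", v == "0", "&")
      }
    }
    ok <- !is.na(d) & !is.na(w)  # skip pairs with a missing value
    d[!ok] <- 0
    w[!ok] <- 0
    num <- num + w * d
    den <- den + w
  }
  D <- num / den  # NaN where no variable could be compared
  diag(D) <- 0
  D
}

df <- data.frame(
  height = c(170, 160, 180),
  gender = factor(c("M", "F", "M")),
  smoker = c(1, 0, 1)
)
D <- gower_sketch(df)
round(D, 4)  # off-diagonals: D[1,2] = 0.8333, D[1,3] = 0.1667, D[2,3] = 1

# Symmetric versus asymmetric treatment of two 0/1 variables
b <- data.frame(a = c(1, 0, 0), b = c(0, 0, 1))
D_sym  <- gower_sketch(b, binary_cols = c("a", "b"))
D_asym <- gower_sketch(b, binary_cols = c("a", "b"), binary_asym = TRUE)
```

In the second data frame, observations 1 and 2 share a 0/0 pair on variable b. The symmetric version counts it as a match and gives 0.5, while the asymmetric version drops it and gives 1.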
Advantages:
Low computational cost.
Works naturally with mixed-type data.
Limitations:
Neglects potential correlations among quantitative variables.
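Sensitive to outliers: because continuous variables are scaled by their range, a single extreme value compresses all other differences on that variable.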
May overemphasize categorical differences in mixed-data settings.
A symmetric numeric matrix of pairwise dissimilarities in [0,1].
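A symmetric dissimilarity matrix of this kind can be passed to standard base-R tools after conversion with stats::as.dist(). The sketch below uses a hand-typed 3 x 3 matrix as a stand-in for the returned value (the numbers are illustrative, not output copied from dist_mixed):

```r
# Stand-in for a returned dissimilarity matrix (values typed by hand)
D <- matrix(
  c(0,     5 / 6, 1 / 6,
    5 / 6, 0,     1,
    1 / 6, 1,     0),
  nrow = 3, byrow = TRUE
)

# Basic checks on the documented properties
stopifnot(isSymmetric(D), all(D >= 0 & D <= 1), all(diag(D) == 0))

# Convert to a "dist" object and run average-linkage hierarchical clustering
dd <- stats::as.dist(D)
hc <- stats::hclust(dd, method = "average")
cl <- stats::cutree(hc, k = 2)
cl  # observations 1 and 3 fall in one cluster, observation 2 in the other
```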
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857-871.
# Small example: Compute classical Gower for a simulated data frame
df <- data.frame(
  height = c(170, 160, 180),
  gender = factor(c("M", "F", "M")),
  smoker = c(1, 0, 1)
)
# Compute Gower dissimilarities automatically detecting types
dbrobust::dist_mixed(df)
# Manual type specification: gender has two levels, so it is passed as binary
cont_cols <- "height"
cat_cols <- NULL
bin_cols <- c("gender", "smoker")
dbrobust::dist_mixed(
  x = df,
  continuous_cols = cont_cols,
  categorical_cols = cat_cols,
  binary_cols = bin_cols
)