| kcmeans | R Documentation |
Implementation of the K-Conditional-Means estimator.
kcmeans(y, X, which_is_cat = 1, K = 2)
| y | The outcome variable, a numerical vector. |
| X | A (sparse) feature matrix where one column is the categorical predictor. |
| which_is_cat | An integer indicating which column of X corresponds to the categorical predictor. |
| K | The number of support points, an integer greater than or equal to 2. |
kcmeans returns an object of S3 class kcmeans. An
object of class kcmeans is a list containing the following
components:
| cluster_map | A matrix that characterizes the estimated predictor of the residualized outcome \tilde{Y} \equiv Y - X_{2:}^\top \hat{\pi}. Each row corresponds to one value of the categorical variable: column x gives that value, mean_x the unrestricted sample mean of \tilde{Y} for that value, p_x its sample share, cluster_x its estimated cluster, and mean_xK the estimated restricted sample mean of \tilde{Y} with just K support points. |
| mean_y | The unconditional sample mean of \tilde{Y}. |
| pi | The best linear prediction coefficients of Y on X corresponding to the non-categorical predictors X_{2:}. |
| which_is_cat, K | Passthrough of the user-provided arguments; see above for details. |
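How these components fit together can be sketched in base R. The code below is a hypothetical re-implementation for illustration only (kcmeans_sketch and weighted_kmeans_1d are not part of the kcmeans API), and it rests on two assumptions: that pi comes from a least-squares regression of y on one indicator per category plus the non-categorical columns of X, and that the K clusters are the optimal weighted one-dimensional k-means partition of mean_x with weights p_x. That is the objective solved by Ckmeans.1d.dp (Wang and Song 2011), written here as a plain O(K n^2) dynamic program instead of calling that package. Under these assumptions mean_xK is the p_x-weighted mean of mean_x within each cluster, which equals the sample mean of \tilde{Y} over all observations in that cluster.

```r
# Sketch only: mirrors the documented inputs/outputs of kcmeans(), but is NOT
# the package's implementation. Assumed: pi from least squares of y on
# category indicators plus the non-categorical columns; clusters from an
# optimal weighted 1-D k-means of mean_x with weights p_x.

# Optimal weighted 1-D k-means by dynamic programming (O(K n^2)).
# Returns labels 1..K, numbered in increasing order of the cluster centers.
weighted_kmeans_1d <- function(v, w, K) {
  n <- length(v)
  stopifnot(K >= 1, n >= K, all(w > 0))
  ord <- order(v)
  v <- v[ord]
  w <- w[ord]
  # Prefix sums give the weighted within-segment SSE in O(1).
  cw <- c(0, cumsum(w))
  cwv <- c(0, cumsum(w * v))
  cwv2 <- c(0, cumsum(w * v^2))
  sse <- function(i, j) {
    sw <- cw[j + 1] - cw[i]
    swv <- cwv[j + 1] - cwv[i]
    (cwv2[j + 1] - cwv2[i]) - swv^2 / sw
  }
  D <- matrix(Inf, K, n)  # D[k, j]: best cost of the first j points in k clusters
  B <- matrix(0L, K, n)   # B[k, j]: first sorted index of the k-th cluster
  for (j in 1:n) {
    D[1, j] <- sse(1, j)
    B[1, j] <- 1L
  }
  if (K >= 2) {
    for (k in 2:K) {
      for (j in k:n) {
        for (i in k:j) {
          cost <- D[k - 1, i - 1] + sse(i, j)
          if (cost < D[k, j]) {
            D[k, j] <- cost
            B[k, j] <- i
          }
        }
      }
    }
  }
  labels <- integer(n)
  j <- n
  for (k in K:1) {        # backtrack the segment boundaries
    i <- B[k, j]
    labels[i:j] <- k
    j <- i - 1
  }
  out <- integer(n)
  out[ord] <- labels      # undo the sort
  out
}

kcmeans_sketch <- function(y, X, which_is_cat = 1, K = 2) {
  X <- as.matrix(X)
  z <- X[, which_is_cat]
  X2 <- X[, -which_is_cat, drop = FALSE]
  ytilde <- y
  pi_hat <- numeric(0)
  if (ncol(X2) > 0) {
    D <- model.matrix(~ factor(z) - 1)       # one indicator column per category
    fit <- lm.fit(cbind(D, X2), y)
    pi_hat <- unname(tail(fit$coefficients, ncol(X2)))
    ytilde <- as.vector(y - X2 %*% pi_hat)    # residualized outcome
  }
  x <- sort(unique(z))
  idx <- match(z, x)
  mean_x <- as.vector(tapply(ytilde, idx, mean))   # unrestricted means
  p_x <- tabulate(idx, nbins = length(x)) / length(z)  # sample shares
  cluster_x <- weighted_kmeans_1d(mean_x, p_x, K)
  # Restricted mean: p_x-weighted average of mean_x within each cluster.
  num <- as.vector(tapply(p_x * mean_x, cluster_x, sum))
  den <- as.vector(tapply(p_x, cluster_x, sum))
  mean_xK <- (num / den)[cluster_x]
  list(cluster_map = cbind(x = x, mean_x = mean_x, p_x = p_x,
                           cluster_x = cluster_x, mean_xK = mean_xK),
       mean_y = mean(ytilde), pi = pi_hat,
       which_is_cat = which_is_cat, K = K)
}
```

Because the p_x are sample shares, the p_x-weighted averages of both mean_x and mean_xK reproduce mean_y in this sketch, e.g. sum(fit$cluster_map[, "p_x"] * fit$cluster_map[, "mean_x"]) matches fit$mean_y up to rounding.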
Wang H and Song M (2011). "Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming." The R Journal 3(2), 29–33.
Wiemann T (2023). "Optimal Categorical Instruments." https://arxiv.org/abs/2311.17021
library(kcmeans)
# Simulate a simple dataset with n = 800 observations
set.seed(1) # for reproducibility
X <- rnorm(800) # continuous predictor
Z <- sample(1:20, 800, replace = TRUE) # categorical predictor
Z0 <- Z %% 4 # lower-dimensional latent categorical variable
y <- Z0 + X + rnorm(800) # outcome
# Compute kcmeans with four support points
kcmeans_fit <- kcmeans(y, cbind(Z, X), K = 4)
# Print the estimated support points of the categorical predictor
print(unique(kcmeans_fit$cluster_map[, "mean_xK"]))
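To attach the estimated support point to each observation, the documented x and mean_xK columns of cluster_map can be combined with base R's match(). The helper support_point_of and the toy matrix below are hypothetical and for illustration only (the numbers are made up, not kcmeans output); with the fit above, the analogous call would be support_point_of(Z, kcmeans_fit$cluster_map).

```r
# Hypothetical helper: look up the estimated support point for each value of z
support_point_of <- function(z, cluster_map) {
  pos <- match(z, cluster_map[, "x"])  # row of cluster_map for each category
  cluster_map[pos, "mean_xK"]          # NA for categories not seen in fitting
}

# Illustrative cluster_map with made-up numbers (not output of kcmeans)
toy_map <- cbind(x = c(1, 2, 3), mean_x = c(0.1, 0.3, 2.0),
                 p_x = c(0.4, 0.4, 0.2), cluster_x = c(1, 1, 2),
                 mean_xK = c(0.2, 0.2, 2.0))
support_point_of(c(3, 1, 1, 5), toy_map)  # 2.0 0.2 0.2 NA
```

Categories that do not appear in cluster_map, such as 5 here, map to NA rather than to an arbitrary support point.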