ClustImpute: K-means clustering with built-in missing data imputation

Clustering algorithm that imputes missing values on the go. The (local) imputation distribution is defined by the currently assigned cluster. The first draw is a random imputation.
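
To illustrate what a random first draw could look like, here is a minimal sketch in base R. It assumes that each missing entry is filled with a value sampled from the observed entries of the same column; the helper `random_impute` and the `toy` data frame are hypothetical and not part of ClustImpute, whose internal first draw may differ.

```r
# Hypothetical helper (not part of ClustImpute): fill each NA in a numeric
# data frame with a value sampled from the observed entries of its column.
random_impute <- function(X, seed = 1) {
  set.seed(seed)
  for (j in seq_along(X)) {
    miss <- is.na(X[[j]])
    observed <- X[[j]][!miss]
    if (any(miss) && length(observed) > 0) {
      # Sample indices rather than values to avoid sample()'s special
      # handling of length-one numeric input
      idx <- sample.int(length(observed), sum(miss), replace = TRUE)
      X[[j]][miss] <- observed[idx]
    }
  }
  X
}

toy <- data.frame(a = c(1, NA, 3, 4), b = c(NA, 2, NA, 8))
filled <- random_impute(toy)
```

In this sketch the observed entries are left untouched and only the `NA` positions change, so a later cluster-specific imputation could overwrite exactly those positions.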

ClustImpute(
  X,
  nr_cluster,
  nr_iter = 10,
  c_steps = 1,
  wf = default_wf,
  n_end = 10,
  seed_nr = 150519,
  assign_with_wf = TRUE,
  shrink_towards_global_mean = TRUE
)

`X`	Data frame with only numeric values or NAs
`nr_cluster`	Number of clusters
`nr_iter`	Iterations of procedure
`c_steps`	Number of clustering steps per iteration
`wf`	Weight function. Linear up to n_end by default. Used to shrink X towards zero or towards the global mean (default). See shrink_towards_global_mean.
`n_end`	Steps until convergence of weight function to 1
`seed_nr`	Number for set.seed()
`assign_with_wf`	Default is TRUE. If set to FALSE, the weight function is applied only in the centroid computation and ignored in the cluster assignment.
`shrink_towards_global_mean`	Default is TRUE. The weight matrix w is applied to the difference of X from the global mean m, i.e., (x-m)*w+m
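
The shrinkage formula (x-m)*w+m can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `linear_wf` and `shrink_to_mean` are hypothetical names, the global mean m is taken to be the column-wise mean, and the weight is assumed to equal the weight function's value at originally missing positions and 1 elsewhere.

```r
# Hypothetical linear weight function: grows from 0 to 1 over n_end steps
linear_wf <- function(n, n_end = 10) pmin(n / n_end, 1)

# Shrink entries of X towards the column means m via (x - m) * w + m.
# Assumption: w equals linear_wf(n) at originally missing positions, 1 elsewhere.
shrink_to_mean <- function(X, is_missing, n, n_end = 10) {
  X <- as.matrix(X)
  m <- matrix(colMeans(X), nrow(X), ncol(X), byrow = TRUE)
  w <- matrix(1, nrow(X), ncol(X))
  w[is_missing] <- linear_wf(n, n_end)
  (X - m) * w + m
}

# Toy data: the entry in row 3, column 2 is treated as an imputed value
X <- matrix(c(1, 3, 5, 2, 4, 12), ncol = 2)
is_missing <- matrix(FALSE, 3, 2)
is_missing[3, 2] <- TRUE

s0  <- shrink_to_mean(X, is_missing, n = 0)[3, 2]   # weight 0: the column mean
s5  <- shrink_to_mean(X, is_missing, n = 5)[3, 2]   # weight 0.5: halfway
s10 <- shrink_to_mean(X, is_missing, n = 10)[3, 2]  # weight 1: the imputed value
```

With a weight near 0 in early steps, imputed entries sit close to the mean and have little influence on the clustering; as the weight approaches 1 after n_end steps, they are used at full value.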

complete_data: Completed data without NAs
clusters: For each row of complete_data, the associated cluster
centroids: For each cluster, the coordinates of the centroids in tidy format
centroids_matrix: For each cluster, the coordinates of the centroids in matrix format
imp_values_mean: Mean of the imputed variables per draw
imp_values_sd: Standard deviation of the imputed variables per draw

library(ClustImpute)

# Random dataset
set.seed(739)
n <- 750 # number of points
nr_other_vars <- 2
mat <- matrix(rnorm(nr_other_vars * n), n, nr_other_vars)
me <- 4 # mean
x <- c(rnorm(n / 3, me / 2, 1), rnorm(2 * n / 3, -me / 2, 1))
y <- c(rnorm(n / 3, 0, 1), rnorm(n / 3, me, 1), rnorm(n / 3, -me, 1))
dat <- cbind(mat, x, y)
dat <- as.data.frame(scale(dat)) # scaling

# Create NAs
dat_with_miss <- miss_sim(dat,p=.1,seed_nr=120)

# Run ClustImpute
res <- ClustImpute(dat_with_miss,nr_cluster=3)

# Plot complete data set and cluster assignment
ggplot2::ggplot(res$complete_data, ggplot2::aes(x, y, color = factor(res$clusters))) +
  ggplot2::geom_point()

# View centroids
res$centroids