balanced_desagregated_kmean: balanced_desagregated_kmean
In 1Edtrujillo1/udeploy: Back-end & Front-end functions

Creates k-means labels, taking into account a dataset balanced on a unique id and a particular variable.

balanced_desagregated_kmean(
  df,
  var_model,
  ref_desgte,
  id,
  k,
  pre_cleaning = FALSE
)

`df`	dataset for which the labels are created
`var_model`	reference variable used for balancing
`ref_desgte`	reference disaggregation variable
`id`	id variable used as the reference for balancing
`k`	number of desired clusters
`pre_cleaning`	whether to clean the dataset before applying the k-means model

1. If the user sets pre_cleaning to TRUE, the function cleans the dataset in preparation for obtaining the k-means labels as a new dataset variable.
2. Standardize the variable var_model according to the levels of ref_desgte.
3. Disaggregate the dataset for all observations whose ref_desgte value has a frequency greater than 1, keeping in mind that the id variable needs to be unique.
4. Create the LABEL variable as the optimal k-means labels for the dataset.
5. Optimize the generated k labels for the disaggregated dataset based on the id variable.
6. Improve the dataset labels based on a balanced grouped sum of var_model, moving the labels that are above the mean of those sums to the lower labels.
7. Improve the dataset labels based on a balanced grouped sum of var_model, moving the labels that fall outside the range of the population mean +- 1 standard deviation, in the same way as step 6.
8. Optimize the generated k labels for the disaggregated dataset again, based on the id variable, to ensure that each id has a different label.
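
As a rough illustration of the standardization and labelling steps only, the sketch below standardizes a variable within the levels of a grouping variable and then assigns k-means labels with base R. The data frame and the column names `group` and `value` are hypothetical stand-ins for `ref_desgte` and `var_model`; this is not the package's implementation, and it leaves out the cleaning, disaggregation and rebalancing steps.

```r
# Hypothetical toy data; column names are assumptions for illustration only.
set.seed(42)
df <- data.frame(
  id    = 1:12,
  group = rep(c("a", "b", "c"), each = 4),
  value = c(10, 12, 11, 13, 100, 120, 110, 130, 1, 2, 3, 4)
)

# Standardize `value` separately within each level of `group`
# (z-score: subtract the group mean, divide by the group sd).
df$value_std <- ave(df$value, df$group,
                    FUN = function(x) (x - mean(x)) / sd(x))

# Assign k-means labels on the standardized variable.
k <- 2
fit <- kmeans(df$value_std, centers = k, nstart = 10)
df$LABEL <- factor(fit$cluster)
```

Standardizing within groups first keeps groups that sit on very different scales (here `b` versus `c`) from dominating the clustering.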

The dataset df with a new variable "LABEL", with up to k levels, that balances the whole dataset, where:

If pre_cleaning == TRUE, the dataset is cleaned before the k-means model is applied
If pre_cleaning == FALSE, the k-means model is applied to the dataset as provided
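
One way to inspect how balanced a returned LABEL variable is, assuming balance is judged by the grouped sum of the balancing variable as the steps describe, is to compare each label's sum with the mean of the sums and with the mean +- 1 standard deviation range. The data and the column name `value` (standing in for `var_model`) are hypothetical, and the sketch only flags labels; it does not reproduce the function's relabelling.

```r
# Hypothetical labelled output; names and values are illustrative assumptions.
df <- data.frame(
  LABEL = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)),
  value = c(5, 6, 7, 4, 5, 20, 25, 30, 28, 3)
)

# Grouped sum of the balancing variable per label.
label_sums <- tapply(df$value, df$LABEL, sum)

# Mean and standard deviation of the per-label sums.
mu <- mean(label_sums)
s  <- sd(label_sums)

# Labels whose sum lies above the mean of the sums.
above_mean <- names(label_sums)[label_sums > mu]

# Labels whose sum falls outside mean +- 1 standard deviation.
out_of_range <- names(label_sums)[label_sums < mu - s | label_sums > mu + s]
```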

Eduardo Trujillo

1Edtrujillo1/udeploy documentation built on July 13, 2021, 9:12 p.m.