balanced_desagregated_kmean: balanced_desagregated_kmean


Description

Create k-means labels that take into account a balanced dataset, based on a unique id and a particular variable.

Usage

balanced_desagregated_kmean(
  df,
  var_model,
  ref_desgte,
  id,
  k,
  pre_cleaning = FALSE
)

Arguments

df

dataset for which the labels are created

var_model

reference variable used for balancing

ref_desgte

reference disaggregation variable

id

id variable used as the reference for balancing

k

number of desired clusters

pre_cleaning

whether to clean the dataset before applying the k-means model (defaults to FALSE)

Details

  1. If the user sets pre_cleaning to TRUE, the function cleans the dataset in preparation for obtaining the k-means labels as a new dataset variable.

  2. Standardize the variable var_model according to the levels of ref_desgte.

  3. Disaggregate the dataset for all observations based on the variable ref_desgte with frequency greater than 1, keeping in mind that the id variable needs to be unique.

  4. Create the LABEL variable as the optimal k-means labels for the dataset.

  5. Optimize the generated k-means labels for the disaggregated dataset based on the id variable.

  6. Improve the dataset labels based on a balanced grouped sum of var_model, changing the labels that are above the mean of the previous labels to the lower labels.

  7. Improve the dataset labels based on a balanced grouped sum of var_model, changing the labels that fall outside the range of the population mean ± 1 standard deviation, in the same way as in step 6.

  8. Optimize the generated k-means labels for the disaggregated dataset again, based on the id variable, to ensure that each id has a different label.
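
The standardization and labelling steps (2 and 4) can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame, its column names (id, region, sales), the use of within-group z-scores via ave(), and a plain kmeans() call are all assumptions made for illustration.

```r
# Hypothetical toy data; column names and values are illustrative only.
set.seed(42)
toy <- data.frame(
  id     = 1:12,
  region = rep(c("north", "south"), each = 6),   # plays the role of ref_desgte
  sales  = c(10, 12, 11, 50, 52, 51,             # plays the role of var_model
             100, 120, 110, 500, 520, 510)
)

# Step 2 (sketch): z-score sales within each level of region, so that
# regions with very different scales become comparable.
toy$sales_std <- ave(toy$sales, toy$region,
                     FUN = function(x) (x - mean(x)) / sd(x))

# Step 4 (sketch): plain k-means on the standardized values.
k <- 2
fit <- kmeans(toy$sales_std, centers = k, nstart = 10)
toy$LABEL <- factor(fit$cluster)

# Cross-tabulate regions against the resulting labels.
table(toy$region, toy$LABEL)
```

Without the within-group standardization, k-means on the raw sales values would mostly separate the high south values from everything else. After standardizing, low and high observations are grouped together across both regions.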

Value

The dataset df with a newly created variable "LABEL", with levels up to k, that balances the whole dataset.
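
A quick way to inspect how balanced a labelled result is would be to compare the grouped sum of the balance variable across labels, in the spirit of step 7 in Details. The sketch below uses a hypothetical labelled data frame with a hypothetical sales column. Using the sample standard deviation from sd() for the "± 1 standard deviation" range is an assumption, not something this page specifies.

```r
# Hypothetical labelled output; values and column names are illustrative only.
labelled <- data.frame(
  sales = c(40, 35, 25, 30, 45, 25, 60, 20, 20, 90),
  LABEL = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4))
)

# Grouped sum of the balance variable per label.
group_sum <- tapply(labelled$sales, labelled$LABEL, sum)

# Flag labels whose sum lies outside mean +/- 1 standard deviation.
centre <- mean(group_sum)
spread <- sd(group_sum)            # assumption: sample standard deviation
out_of_range <- abs(group_sum - centre) > spread

group_sum                          # sums per label
names(group_sum)[out_of_range]     # labels that would need rebalancing
```

Labels flagged here are the ones a rebalancing pass would target. How observations are then moved between labels is specific to the function and is not reproduced in this sketch.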

Author(s)

Eduardo Trujillo

See Also

k-means algorithm


1Edtrujillo1/udeploy documentation built on July 13, 2021, 9:12 p.m.