improve_kmeans_labels: improve_kmeans_labels

View source: R/stats.R

Description

Optimize the generated k-means labels for a disaggregated dataset.

Usage

improve_kmeans_labels(df, id, label, k)

Arguments

df

Dataset whose labels are to be changed.

id

Id variable of the dataset, used as the reference for balancing.

label

Variable holding the k-means labels.

k

Number of desired clusters.
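
As a rough illustration of the kind of input these arguments describe, the sketch below builds a toy disaggregated dataset with base R. The column names (id, x, y, label) and the use of stats::kmeans to produce the labels are assumptions made for the example. The source does not state whether id and label are passed as quoted names or bare symbols, so the call itself is only shown as a comment.

```r
# Toy disaggregated dataset: several rows per id (hypothetical column names).
set.seed(1)
df <- data.frame(
  id = rep(1:5, each = 3),
  x  = rnorm(15),
  y  = rnorm(15)
)

k <- 3
# stats::kmeans() returns the integer cluster assignment in `cluster`.
df$label <- kmeans(df[, c("x", "y")], centers = k)$cluster

# For each id, count the rows whose label repeats an earlier row of that id.
dup_per_id <- tapply(df$label, df$id, function(l) sum(duplicated(l)))
dup_per_id

# Documented call shape (how id and label are passed is not stated in the source):
# improve_kmeans_labels(df, id, label, k)
```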

Details

  1. Split the dataset by id, testing for unique elements on label. The result is a list called df_splited with two sublists: unique and duplicated.

  2. Split the duplicated sublist by label, again testing for unique elements on label, into unique elements and duplicated elements.

  3. From the duplicated sublist, take the first row of each sublist; this is called to_modify. From the unique sublist, take a random sample of the same length as the duplicated one; this is called uniq_modify. From uniq_modify, create a sublist of the k-means labels called uniq_labels.

  4. Modify to_modify based on the labels in uniq_labels: if the label of to_modify is in uniq_labels, take a random number between 1 and k, excluding the labels in uniq_labels. Otherwise, take any label from uniq_labels.

  5. Append the modified sublist to_modify to the list of samples uniq_modify, in each sublist.

  6. Update the original list df_splited: replace the elements of the unique sublist with the uniq_modify sublist, and delete the first row of each sublist of the duplicated sublist, since it was used in step 3.

  7. Rebuild the original dataset with the modified labels.

  8. If the duplicated sublist still has duplicate elements, apply the function recursively to change the labels of those that are repeated.
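
Steps 1 and 4 can be sketched in base R as follows. This is not the package's source code: the helper names split_by_label_uniqueness and pick_label are hypothetical, id and label are assumed to be passed as column-name strings, and keeping the current label when no candidate is available is an assumption the source does not cover.

```r
# Step 1 (sketch): split by id, then separate the groups whose labels are all
# distinct from the groups that contain a repeated label.
split_by_label_uniqueness <- function(df, id, label) {
  groups <- split(df, df[[id]])
  has_dup <- vapply(groups, function(g) anyDuplicated(g[[label]]) > 0, logical(1))
  list(unique = groups[!has_dup], duplicated = groups[has_dup])
}

# Step 4 (sketch): choose a replacement label for one row of to_modify.
pick_label <- function(current, uniq_labels, k) {
  pool <- if (current %in% uniq_labels) {
    setdiff(seq_len(k), uniq_labels)  # any label in 1..k not in uniq_labels
  } else {
    uniq_labels                       # any label taken from uniq_labels
  }
  # Assumption: keep the current label when no candidate is available.
  if (length(pool) == 0) return(current)
  # Index-based sampling avoids sample()'s special case for a single number.
  pool[sample.int(length(pool), 1)]
}

df <- data.frame(id = c("a", "a", "b", "b", "b"), label = c(1, 2, 3, 3, 1))
parts <- split_by_label_uniqueness(df, "id", "label")
names(parts$unique)      # "a": labels 1 and 2 are distinct
names(parts$duplicated)  # "b": label 3 appears twice

pick_label(1, uniq_labels = c(1, 2), k = 4)  # 3 or 4
pick_label(3, uniq_labels = c(1, 2), k = 4)  # 1 or 2
```

Sampling by index rather than calling sample(pool, 1) matters when the pool holds a single label, because sample() would otherwise draw from 1 up to that number.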

Value

The disaggregated dataset df with optimized k-means labels.

Note

This function is used to improve the k-means labels based on the id variable.

Author(s)

Eduardo Trujillo


Edtrujillo1/udeploy documentation built on July 13, 2021, 9:12 p.m.