Optimize generated k-means labels for a disaggregated dataset
improve_kmeans_labels(df, id, label, k)
df | Dataset whose labels are to be changed.
id | Id variable of the dataset, used as the balance reference.
label | Variable holding the k-means labels.
k | Number of desired clusters.
1. Split the dataset by id and test, within each id, whether the elements of label are unique. The result is a list, called df_splited inside improve_kmeans_labels, with two sublists: unique and duplicated.
2. Split each entry of the duplicated sublist by label, again testing for unique elements on label, into unique elements and duplicated elements. The duplicated elements remain in the duplicated sublist from step 1. The unique elements are appended to the unique sublist from step 1, each to its corresponding entry. Both sublists now hold the correct unique and duplicated elements.
3. From the duplicated sublist, take the first row of each entry; this is called to_modify. From the unique sublist, take a random sample of the same length as the duplicated one; this is called uniq_modify. From uniq_modify, build a sublist of its k-means labels called uniq_labels.
4. Modify to_modify according to the labels in uniq_labels: if the label of to_modify is in uniq_labels, draw a random number between 1 and k that is not one of the labels in uniq_labels; otherwise, take any label from the corresponding entry of uniq_labels.
5. Append each modified to_modify to the matching entry of uniq_modify.
6. Update the original list df_splited: replace the elements of the unique sublist with uniq_modify, and delete the first row of each entry of the duplicated sublist, since that row was used in step 3.
7. Rebuild the original dataset with the modified labels.
8. If the duplicated sublist still has duplicate elements, apply the function recursively to change the labels of those that are repeated.
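The splitting in steps 1 and 2 and the relabelling rule in step 4 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper names split_by_label_uniqueness and pick_new_label are hypothetical, id and label are assumed to be column names given as strings, labels are assumed to be integers from 1 to k, and returning the current label when no candidate is left is a choice made here, not something the steps specify.

```r
# Hypothetical helper (not part of the package): steps 1 and 2.
# Splits `df` by `id`, then separates rows whose label is unique within
# their id from rows whose label is repeated within their id.
split_by_label_uniqueness <- function(df, id, label) {
  by_id <- split(df, df[[id]], drop = TRUE)
  # TRUE for every row whose label occurs more than once in the group
  is_dup <- function(g) {
    lab <- g[[label]]
    duplicated(lab) | duplicated(lab, fromLast = TRUE)
  }
  list(
    unique = lapply(by_id, function(g) g[!is_dup(g), , drop = FALSE]),
    # keep only the ids that actually have repeated labels
    duplicated = Filter(
      function(g) nrow(g) > 0,
      lapply(by_id, function(g) g[is_dup(g), , drop = FALSE])
    )
  )
}

# Hypothetical helper: the relabelling rule of step 4, read literally.
pick_new_label <- function(current, uniq_labels, k) {
  if (current %in% uniq_labels) {
    # label already taken: choose among 1..k excluding uniq_labels
    candidates <- setdiff(seq_len(k), uniq_labels)
  } else {
    # otherwise: choose any label from uniq_labels
    candidates <- uniq_labels
  }
  # Assumption: keep the current label if nothing is left to choose from
  if (length(candidates) == 0) return(current)
  candidates[sample.int(length(candidates), 1)]
}

# Toy data: id 1 carries label 1 twice, id 2 has distinct labels
toy <- data.frame(id = c(1, 1, 1, 2, 2), label = c(1, 1, 2, 3, 1))
parts <- split_by_label_uniqueness(toy, "id", "label")
```

Indexing candidates with sample.int avoids the base R pitfall where sample(x) on a single number draws from 1 to x instead of returning x.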
The disaggregated dataset df with optimized k-means labels.
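One way to inspect the returned dataset is to check whether any id still carries the same label on more than one row, which is the condition that triggers the recursive call in step 8. The helper below is a hypothetical sketch in base R, not part of the package, and it assumes id and label are column names given as strings.

```r
# Hypothetical check (not part of the package): TRUE when no id carries
# the same label on more than one row.
labels_unique_within_id <- function(df, id, label) {
  !any(duplicated(df[, c(id, label), drop = FALSE]))
}

# Toy data: the first frame repeats label 2 within id 1, the second does not
repeated <- data.frame(id = c(1, 1), label = c(2, 2))
distinct <- data.frame(id = c(1, 1, 2), label = c(1, 2, 1))
check_repeated <- labels_unique_within_id(repeated, "id", "label")
check_distinct <- labels_unique_within_id(distinct, "id", "label")
```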
This function is used to improve the k-means labels based on the id variable.
Eduardo Trujillo