Creates a k-means label for a dataset, keeping the resulting clusters balanced with respect to a unique id and a particular variable.
balanced_desagregated_kmean(
  df,
  var_model,
  ref_desgte,
  id,
  k,
  pre_cleaning = FALSE
)
df: dataset for which the labels are created.
var_model: reference variable used for balancing.
ref_desgte: reference variable used for disaggregation.
id: id variable used as the reference for balancing.
k: number of desired clusters.
pre_cleaning: whether to clean the dataset before applying the k-means model.
1. If the user sets pre_cleaning to TRUE, the function cleans the dataset in preparation for obtaining the k-means labels as a new dataset variable.
2. Standardize the variable var_model according to the levels of ref_desgte.
3. Disaggregate the dataset for all observations based on the variable ref_desgte with a frequency greater than 1, keeping in mind that the id variable needs to be unique.
4. Create the LABEL variable as the optimal k-means labels for the dataset.
5. Optimize the generated k labels for the disaggregated dataset based on the id variable.
6. Improve the dataset labels based on a balanced grouped sum of var_model, changing the labels whose grouped sum lies above the mean of the previous grouped sums to the lower labels.
7. Improve the dataset labels based on a balanced grouped sum of var_model, changing the labels that fall outside the range of the population mean ± 1 standard deviation in the same way as in step 6.
8. Optimize the generated k labels for the disaggregated dataset again based on the id variable, to ensure that each id has a different label.
The dataset df with a newly created variable "LABEL", with up to k levels, that balances the whole dataset, where:
- if pre_cleaning == TRUE, the dataset is cleaned before the k-means model is applied;
- if pre_cleaning == FALSE, the k-means model is applied to the dataset as provided.
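To inspect how balanced a LABEL column is, the grouped sum of var_model per label can be compared against the mean ± 1 standard deviation range that the rebalancing step uses. The helper balance_report below is hypothetical and only reports on the labels; it does not reassign them the way the function does, and the toy values are made up for illustration.

```r
# Hypothetical balance check on a labelled data frame; not part of the package.
# 'var' plays the role of var_model and 'label' the column named "LABEL".
balance_report <- function(df, var, label = "LABEL") {
  sums <- tapply(df[[var]], df[[label]], sum)  # grouped sum per label
  m <- mean(sums)
  s <- sd(sums)
  data.frame(
    label    = names(sums),
    sum      = as.numeric(sums),
    # TRUE when the label's sum lies within mean +/- 1 sd of all label sums
    in_range = as.vector(sums >= m - s & sums <= m + s),
    row.names = NULL
  )
}

labelled <- data.frame(
  value = c(5, 5, 4, 6, 1, 1),
  LABEL = factor(c(1, 1, 2, 2, 3, 3))
)
balance_report(labelled, "value")
```

Here label 3 has a grouped sum of 2, below the lower bound of roughly 2.71 (mean 7.33 minus a standard deviation of about 4.62), so it is the label a rebalancing pass would target.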
Eduardo Trujillo