Improve the dataset labels based on a balanced grouped sum of a particular variable.
improve_kmeans_labels_variable(
  df,
  id,
  label,
  var_model,
  split_type = c("mean_split", "range_split")
)
df | dataset whose labels will be changed.
id | dataset id variable.
label | k-means label variable.
var_model | reference variable to balance.
split_type | type of label modification, either "mean_split" or "range_split".
1. Get the grouped sum of var_model by the created k-means label variable.
2. Calculate the population mean and standard deviation of the grouped sums from step 1, to use as parameters for modifying the label.
3. Create value_check, the constant values used as the comparison reference; the user selects which ones apply through the split_type parameter.
4. With the help of the function split_lower_upper_df, obtain the labels below and above the constant values from step 3.
5. Apply the subfunction modify_labels_variable_df. This function modifies the label variable of df based on split_type:
   - If split_type == "mean_split", change the labels that are above the mean from step 1 to the lower ones, to balance the grouped sum of var_model by label.
   - If split_type == "range_split", change the labels that fall outside the range population mean ± 1 standard deviation from step 1, reassigning them in the same way as in the previous option.
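Steps 1 to 4 can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the column name value, and the helper split_labels are hypothetical, and the "population" standard deviation is assumed to divide by n rather than n - 1. The relabeling itself (step 5) is left out because the reassignment rule is not spelled out here.

```r
# Toy data (hypothetical): six ids, three k-means labels, one variable to balance.
df <- data.frame(
  id    = 1:6,
  label = c(1, 1, 2, 2, 3, 3),
  value = c(10, 12, 3, 4, 30, 35)
)

# Step 1: grouped sum of the reference variable by label.
grouped <- aggregate(value ~ label, data = df, FUN = sum)

# Step 2: population mean and standard deviation of the grouped sums
# (assumption: "population" means dividing by n, not n - 1).
pop_mean <- mean(grouped$value)
pop_sd   <- sqrt(mean((grouped$value - pop_mean)^2))

# Step 3: reference constants for each split_type.
value_check <- list(
  mean_split  = c(lower = pop_mean, upper = pop_mean),
  range_split = c(lower = pop_mean - pop_sd, upper = pop_mean + pop_sd)
)

# Step 4: labels whose grouped sum lies below or above the reference constants.
# split_labels is a hypothetical stand-in, not the package's split_lower_upper_df.
split_labels <- function(grouped, bounds) {
  list(
    lower = grouped$label[grouped$value < bounds[["lower"]]],
    upper = grouped$label[grouped$value > bounds[["upper"]]]
  )
}

mean_groups  <- split_labels(grouped, value_check$mean_split)
range_groups <- split_labels(grouped, value_check$range_split)
```

With "mean_split" every label falls on one side of the mean, so more labels are candidates for modification; with "range_split" only labels whose grouped sum lies more than one standard deviation from the mean are flagged.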
This function is used to improve the k-means labels based on a particular variable.
Eduardo Trujillo