Improve the dataset labels based on a balanced grouped sum of a particular variable.
modify_labels_variable_df(
  df,
  id,
  label,
  var_model,
  check,
  check_mean,
  check_labels,
  sum_var_model
)
df | Dataset whose labels are to be changed.
id | Dataset id variable.
label | k-means label variable.
var_model | Reference variable to balance.
check | Grouped sum returned by the improve_kmeans_labels_variable function.
check_mean | Population mean returned by the improve_kmeans_labels_variable function.
check_labels | Labels under and above the constant values, returned by the improve_kmeans_labels_variable function.
sum_var_model | Variable in check that represents the grouped sum, returned by the improve_kmeans_labels_variable function.
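As a rough illustration of what the check, check_mean and check_labels inputs might contain, the sketch below builds comparable objects in base R from a toy data frame. The toy data, the column names value and sum_value, the use of the mean of the grouped sums as the "population mean", and the list layout of check_labels are all assumptions made for illustration. The objects actually produced by improve_kmeans_labels_variable may be structured differently.

```r
# Toy data: six observations and three k-means labels (hypothetical values)
df <- data.frame(
  id    = 1:6,
  label = c(1, 1, 1, 2, 2, 3),
  value = c(10, 20, 30, 5, 5, 8)
)

# Grouped sum of the reference variable per label (plays the role of `check`)
check <- aggregate(value ~ label, data = df, FUN = sum)
names(check)[2] <- "sum_value"  # hypothetical name, plays the role of `sum_var_model`

# Mean of the grouped sums (assumed here to play the role of `check_mean`)
check_mean <- mean(check$sum_value)

# Labels whose grouped sum falls below or above the mean
# (assumed layout for `check_labels`)
check_labels <- list(
  under = check$label[check$sum_value < check_mean],
  upper = check$label[check$sum_value > check_mean]
)
```

In this toy case label 1 is the only upper, and labels 2 and 3 are the unders.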
Based on the separated labels in check_labels (step 4 of improve_kmeans_labels_variable), for the uppers, compute the difference between the observations of sum_var_model in each level of check and the mean. This difference is the total sum that has to be moved out of each upper level and into the unders.
If only one total sum has to be moved out of the uppers, return that list. Otherwise, take as many maximums, and as many minimums, as there are unders, so that each under gets a maximum and a minimum (the uppers that will pass to the unders).
Name each sublist after the unders in check_labels, so that it is clear which upper passes to which under.
Modify the dataset: for each under (maximum and minimum), change the label of the selected uppers to that under, balancing the dataset.
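The general idea of moving observations from an upper label to the unders can be sketched in base R. This is a simplified stand-in and not the package's implementation: instead of the maximum/minimum selection described above, it moves, for each under, the single upper observation whose value is closest to that under's deficit. The toy data and all variable names are hypothetical.

```r
# Toy data: label 1 is far above the mean grouped sum, labels 2 and 3 are below
df <- data.frame(
  id    = 1:6,
  label = c(1, 1, 1, 2, 2, 3),
  value = c(10, 20, 30, 5, 5, 8)
)

sums   <- tapply(df$value, df$label, sum)  # grouped sum per label
target <- mean(sums)                       # mean of the grouped sums
unders <- names(sums)[sums < target]
uppers <- names(sums)[sums > target]

# Difference between each upper's grouped sum and the mean
excess <- sums[uppers] - target

# Simplified reassignment (assumption): one observation per under,
# chosen as the upper observation closest in value to the under's deficit
balanced <- df
for (u in unders) {
  deficit    <- target - sum(balanced$value[balanced$label == as.numeric(u)])
  candidates <- which(as.character(balanced$label) %in% uppers)
  pick       <- candidates[which.min(abs(balanced$value[candidates] - deficit))]
  balanced$label[pick] <- as.numeric(u)
}

new_sums <- tapply(balanced$value, balanced$label, sum)
```

Comparing sums with new_sums shows how much closer the grouped sums are after relabeling. A fuller version would also stop drawing from an upper once its own sum drops to the mean.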
The provided dataset df with the label variable modified.
This function is a subfunction of improve_kmeans_labels_variable.
Eduardo Trujillo