improve_kmeans_labels_variable: improve_kmeans_labels_variable
In 1Edtrujillo1/udeploy: Back-end & Front-end functions

Improve the k-means labels of a dataset so that the grouped sum of a particular variable is balanced across labels.

improve_kmeans_labels_variable(
  df,
  id,
  label,
  var_model,
  split_type = c("mean_split", "range_split")
)

`df`	dataset whose labels will be changed.
`id`	dataset id variable.
`label`	k-means label variable.
`var_model`	reference variable whose grouped sum is balanced.
`split_type`	type of label modification, either "mean_split" or "range_split".

1. Compute the grouped sum of var_model by the k-means label variable.
2. Calculate the population mean and standard deviation of the grouped sums from step 1. These are the parameters used to modify the labels.
3. Create value_check, the constant reference values used for comparison, selected according to the split_type parameter.
4. Use the helper function split_lower_upper_df to obtain the labels below and above the reference values from step 3.
5. Apply the subfunction modify_labels_variable_df.
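Steps 1 to 4 can be sketched in base R on a toy dataset. This is a minimal sketch under assumptions, not the package's implementation. The data, `make_value_check()` and `split_labels()` are hypothetical: `split_labels()` only stands in for split_lower_upper_df, whose signature is not shown here. The reference values (the mean on both sides for "mean_split", mean ± 1 standard deviation for "range_split") are read from the description of split_type. The standard deviation divides by n (population), whereas `stats::sd()` divides by n - 1.

```r
# Hypothetical toy data: an id, a k-means label and the variable to balance.
df <- data.frame(
  id        = 1:6,
  label     = c(1, 1, 2, 3, 3, 4),
  var_model = c(50, 50, 40, 10, 20, 10)
)

# Step 1: grouped sum of var_model by label (one row per label).
grouped <- aggregate(var_model ~ label, data = df, FUN = sum)

# Steps 2-3 (hypothetical helper): population mean and SD of the grouped
# sums turned into lower/upper reference values according to split_type.
make_value_check <- function(sums, split_type = c("mean_split", "range_split")) {
  split_type <- match.arg(split_type)
  mu <- mean(sums)
  sigma <- sqrt(mean((sums - mu)^2))  # population SD (divides by n)
  if (split_type == "mean_split") {
    c(lower = mu, upper = mu)
  } else {
    c(lower = mu - sigma, upper = mu + sigma)
  }
}

# Step 4 (hypothetical stand-in for split_lower_upper_df): labels whose
# grouped sum is below the lower value or above the upper value.
split_labels <- function(grouped, check) {
  list(
    lower = grouped$label[grouped$var_model < check[["lower"]]],
    upper = grouped$label[grouped$var_model > check[["upper"]]]
  )
}

mean_groups  <- split_labels(grouped, make_value_check(grouped$var_model, "mean_split"))
range_groups <- split_labels(grouped, make_value_check(grouped$var_model, "range_split"))
```

With the toy grouped sums 100, 40, 30 and 10 (mean 45, population SD about 33.5), "mean_split" puts label 1 above and labels 2, 3 and 4 below. "range_split" only flags label 1 (above about 78.5) and label 4 (below about 11.5).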

This function modifies the label variable of df based on split_type:

If split_type == "mean_split", labels whose grouped sum of var_model is above the mean of the grouped sums are changed to the labels below the mean, which balances the grouped sum of var_model across labels.
If split_type == "range_split", labels whose grouped sum falls outside the range population mean ± 1 standard deviation of the grouped sums are changed in the same way as for "mean_split".
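The actual relabelling logic of modify_labels_variable_df is not documented here, so the sketch below uses a hypothetical greedy rule, `rebalance_labels()`. It is not the package's code. While an over-weight label's grouped sum is above the upper reference value, it moves that label's smallest row to the currently lightest under-weight label and never empties a label. The toy data and the reference value 45 (the mean of the grouped sums 100, 40, 30 and 10) are illustrative assumptions.

```r
# Hypothetical greedy relabelling rule (an assumption, not udeploy's
# modify_labels_variable_df).
rebalance_labels <- function(df, upper, lower, upper_value) {
  if (length(lower) == 0) return(df)  # nowhere to move rows
  for (u in upper) {
    repeat {
      rows <- which(df$label == u)
      # Stop once the label is within the reference value, or when moving
      # another row would leave the label empty.
      if (sum(df$var_model[rows]) <= upper_value || length(rows) <= 1) break
      # Current grouped sum of each under-weight label.
      lower_sums <- vapply(lower, function(l) sum(df$var_model[df$label == l]),
                           numeric(1))
      target <- lower[which.min(lower_sums)]
      # Move the smallest row of the over-weight label to the lightest label.
      smallest <- rows[which.min(df$var_model[rows])]
      df$label[smallest] <- target
    }
  }
  df
}

df <- data.frame(
  id        = 1:6,
  label     = c(1, 1, 2, 3, 3, 4),
  var_model = c(50, 50, 40, 10, 20, 10)
)
new_df <- rebalance_labels(df, upper = 1, lower = c(2, 3, 4), upper_value = 45)
tapply(new_df$var_model, new_df$label, sum)  # label sums: 50, 40, 30, 60
```

On the toy data the grouped sums move from 100, 40, 30, 10 to 50, 40, 30, 60, so the spread between labels shrinks from 90 to 30. A real implementation might use a different move order or stopping rule.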

This function is used to improve the k-means labels based on a particular variable.

Eduardo Trujillo

1Edtrujillo1/udeploy documentation built on July 13, 2021, 9:12 p.m.