modify_labels_variable_df: modify_labels_variable_df

Description

Improve the dataset labels based on a balanced grouped sum of a particular variable.

Usage

modify_labels_variable_df(
  df,
  id,
  label,
  var_model,
  check,
  check_mean,
  check_labels,
  sum_var_model
)

Arguments

df

dataset whose labels will be changed.

id

dataset id variable

label

k-means label variable

var_model

reference variable to balance

check

grouped sum from improve_kmeans_labels_variable function

check_mean

population mean from improve_kmeans_labels_variable function

check_labels

labels below (under) and above (upper) the constant value, from the improve_kmeans_labels_variable function

sum_var_model

variable in check that represents the grouped sum, from the improve_kmeans_labels_variable function
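
The sketch below illustrates what the inputs check, check_mean and check_labels might look like. It is built with base R on toy data, not with improve_kmeans_labels_variable itself. The column names (sales, sum_sales), the list names (upper, under) and the reading of "population mean" as the mean of the grouped sums are all assumptions made for illustration.

```r
# Toy data; every name and value here is an illustrative assumption.
df <- data.frame(
  id    = 1:8,
  label = c("a", "a", "a", "a", "b", "b", "c", "c"),
  sales = c(10, 20, 30, 40, 5, 15, 10, 20),
  stringsAsFactors = FALSE
)

# check: grouped sum of the reference variable (var_model) per label.
check <- aggregate(sales ~ label, data = df, FUN = sum)
names(check)[2] <- "sum_sales"          # plays the role of sum_var_model
check$label <- as.character(check$label)

# check_mean: assumed here to be the mean of the grouped sums.
check_mean <- mean(check$sum_sales)

# check_labels: labels above ("upper") and below ("under") that mean.
check_labels <- list(
  upper = check$label[check$sum_sales > check_mean],
  under = check$label[check$sum_sales < check_mean]
)
```

With this toy data the grouped sums are 100, 20 and 30, so label "a" is the only upper and labels "b" and "c" are the unders.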

Details

  1. Using the separated labels in check_labels (step 4 of improve_kmeans_labels_variable), compute for each upper label the difference between its sum_var_model value in check and the mean. This difference is the total that has to be moved out of that upper label and into the under labels.

  2. If only one total has to be moved out of the uppers, return that list. Otherwise, take as many maximums and as many minimums as there are unders, so that each under gets one maximum and one minimum (the uppers that will be passed to the unders).

  3. Name each sublist after the corresponding under label in check_labels, so that it is clear which upper observations pass to which under.

  4. Modify the dataset: for each under (max and min), change the label of the selected uppers to that under in order to balance the dataset.
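
The idea behind these steps can be sketched with a simplified greedy relabeling in base R. This is not the udeploy implementation: the toy data, the column name sales and the nearest-value selection rule are assumptions chosen only to show how moving observations from upper labels to under labels balances the grouped sums.

```r
# Illustrative only: a simple greedy rebalancing, not the package's code.
df <- data.frame(
  id    = 1:8,
  label = c("a", "a", "a", "a", "b", "b", "c", "c"),
  sales = c(10, 20, 30, 40, 5, 15, 10, 20),
  stringsAsFactors = FALSE
)

group_sums <- tapply(df$sales, df$label, sum)   # a = 100, b = 20, c = 30
target     <- mean(group_sums)                  # 50

uppers <- names(group_sums)[group_sums > target]
unders <- names(group_sums)[group_sums < target]

# Step 1: surplus that each upper label has to give up.
surplus <- group_sums[uppers] - target

# Steps 2 to 4 (simplified): for each under label, relabel the upper
# observation whose value is nearest to that label's deficit.
balanced <- df
for (u in unders) {
  deficit    <- target - sum(balanced$sales[balanced$label == u])
  candidates <- which(balanced$label %in% uppers)
  if (length(candidates) == 0 || deficit <= 0) next
  pick <- candidates[which.min(abs(balanced$sales[candidates] - deficit))]
  balanced$label[pick] <- u
}

# Grouped sums after relabeling.
new_sums <- tapply(balanced$sales, balanced$label, sum)
```

In this toy case one observation moves from "a" to "b" and one from "a" to "c", after which every label sums to the target of 50. Real data will rarely balance exactly, which is presumably why the function works with a maximum and a minimum per under label.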

Value

The provided dataset df with the label variable modified.

Note

This function is a helper (subfunction) of improve_kmeans_labels_variable.

Author(s)

Eduardo Trujillo

See Also

nearest values


1Edtrujillo1/udeploy documentation built on July 13, 2021, 9:12 p.m.