partition_and_normalize: Partition and Normalize

View source: R/partition_and_normalize.R

partition_and_normalize    R Documentation

Partition and Normalize

Description

Splits the input data into training and test sets and normalizes the outputs relative to the best performance achieved on each instance. The user can bypass the partitioning step by supplying the test set directly through the parameters x.test and y.test.
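The exact normalization formula is not stated on this page. As an illustration only, the sketch below assumes one plausible scheme: each output is expressed as a ratio to the best value in its row, so the best algorithm for an instance gets 1 and the others get values in (0, 1]. The helper normalize_by_best is hypothetical (it is not part of ASML), and the sketch assumes strictly positive KPI values.

```r
# Hypothetical helper, NOT the ASML implementation: scale each row of y by the
# best value in that row. Assumes all KPI values are strictly positive.
normalize_by_best <- function(y, better_smaller = TRUE) {
  y_mat <- as.matrix(y)
  if (better_smaller) {
    best <- apply(y_mat, 1, min)  # best = smallest KPI per instance
    out <- best / y_mat           # recycling is column-wise, so row i uses best[i]
  } else {
    best <- apply(y_mat, 1, max)  # best = largest KPI per instance
    out <- y_mat / best
  }
  as.data.frame(out)
}

# Toy outputs: two instances (rows), two algorithms (columns)
y_toy <- data.frame(alg1 = c(2, 10), alg2 = c(4, 5))

norm_small <- normalize_by_best(y_toy, better_smaller = TRUE)
norm_large <- normalize_by_best(y_toy, better_smaller = FALSE)
```

Under this assumed scheme, the best algorithm on each instance always maps to 1 regardless of the direction chosen with better_smaller, which makes instances with very different KPI scales comparable.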

Usage

partition_and_normalize(
  x,
  y,
  x.test = NULL,
  y.test = NULL,
  family_column = NULL,
  split_by_family = FALSE,
  test_size = 0.3,
  better_smaller = TRUE
)

Arguments

x

dataframe with the instances (rows) and their features (columns). It may also include a column with the family data.

y

dataframe with the instances (rows) and the corresponding output (KPI) for each algorithm (columns).

x.test

dataframe with the test features. It may also include a column with the family data. If NULL, the algorithm will split x into training and test sets.

y.test

dataframe with the test outputs. If NULL, the algorithm will split y into training and test sets.

family_column

column number of x where the family of each instance is indicated. If given, additional options for the training and test set splitting and for the graphics are enabled.

split_by_family

boolean indicating whether the split should preserve family proportions. Only used when x.test and y.test are NULL. This option requires family_column to be different from NULL.

test_size

float with the proportion of instances assigned to the test set. It must be a value between 0 and 1. Only needed when x.test and y.test are NULL.

better_smaller

boolean that indicates whether the output (KPI) is better if smaller (TRUE) or larger (FALSE).
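To illustrate what split_by_family and test_size describe, here is a minimal base-R sketch of a split that keeps family proportions. It is not the ASML implementation: the helper stratified_test_idx, the fixed seed, and the toy data are assumptions made for the example.

```r
# Hypothetical helper (not part of ASML): choose test-row indices so that each
# family contributes roughly `test_size` of its instances.
stratified_test_idx <- function(families, test_size = 0.3, seed = 1) {
  set.seed(seed)  # fixed seed so the sketch is reproducible
  rows_by_family <- split(seq_along(families), families)
  test_idx <- lapply(rows_by_family, function(rows) {
    n_test <- round(length(rows) * test_size)
    # Index through sample.int to avoid sample()'s special case for scalars
    rows[sample.int(length(rows), n_test)]
  })
  sort(unlist(test_idx, use.names = FALSE))
}

# Toy data: the family label sits in column 1, as with family_column = 1
x <- data.frame(
  family = rep(c("A", "B"), times = c(10, 20)),
  feature1 = seq_len(30)
)
y <- data.frame(alg1 = seq_len(30) + 0.5, alg2 = rev(seq_len(30)))

test_idx <- stratified_test_idx(x$family, test_size = 0.3)
x.train <- x[-test_idx, ]
x.test  <- x[test_idx, ]
y.train <- y[-test_idx, ]
y.test  <- y[test_idx, ]

table(x.test$family)  # 3 instances of family A and 6 of family B
```

Sampling within each family, rather than over all rows at once, is what keeps the 1:2 ratio between families A and B the same in the training and test sets.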

Value

A list of class as_data containing:

  • x.train A data frame with the training features.

  • y.train A data frame with the training output.

  • x.test A data frame with the test features.

  • y.test A data frame with the test output.

  • y.train.original A vector with the original training output (without normalizing).

  • y.test.original A vector with the original test output (without normalizing).

  • families.train A data frame with the families of the training data.

  • families.test A data frame with the families of the test data.

Examples

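library(ASML)

# Load the example dataset shipped with the package
data(branching)

# 70/30 split that keeps family proportions; families are in column 1 of x
data_obj <- partition_and_normalize(branching$x, branching$y, test_size = 0.3,
                                    family_column = 1, split_by_family = TRUE)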


ASML documentation built on April 3, 2025, 8:47 p.m.