finalize: Functions to finalize data-specific parameter ranges

Description

These functions take a parameter object and modify the unknown parts of ranges based on a data set and simple heuristics.

Usage

finalize(object, ...)

## S3 method for class 'list'
finalize(object, x, force = TRUE, ...)

## S3 method for class 'param'
finalize(object, x, force = TRUE, ...)

## S3 method for class 'parameters'
finalize(object, x, force = TRUE, ...)

## S3 method for class 'logical'
finalize(object, x, force = TRUE, ...)

## Default S3 method:
finalize(object, x, force = TRUE, ...)

get_p(object, x, log_vals = FALSE, ...)

get_log_p(object, x, ...)

get_n_frac(object, x, log_vals = FALSE, frac = 1/3, ...)

get_n_frac_range(object, x, log_vals = FALSE, frac = c(1/10, 5/10), ...)

get_n(object, x, log_vals = FALSE, ...)

get_rbf_range(object, x, seed = sample.int(10^5, 1), ...)

get_batch_sizes(object, x, frac = c(1/10, 1/3), ...)

Arguments

object

A param object or a list of param objects.

...

Other arguments to pass to the underlying parameter finalizer functions. For example, for get_rbf_range(), the dots are passed along to kernlab::sigest().

x

The predictor data. In some cases (see below) this should only include numeric data.

force

A single logical: should the ranges be updated even if the parameter object is already complete?

log_vals

A logical: should the ranges be set on the log10 scale?

frac

A double for the fraction of the data to be used for the upper bound. For get_n_frac_range() and get_batch_sizes(), a vector of two fractional values is required.

seed

An integer to control the randomness of the calculations.

Details

finalize() runs the embedded finalizer function contained in the param object (object$finalize) and returns the updated version. The finalization function is one of the get_*() helpers.
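As a minimal sketch of this behavior, the code below inspects the embedded finalizer of mtry() and then finalizes it against a small predictor set. The choice of mtcars and the use of has_unknowns() and range_get() for inspection are illustrative assumptions, not part of the original examples. The last lines show force = FALSE leaving an already complete parameter untouched.

```r
library(dials)

# Predictors only: mtcars without the outcome column (10 columns)
x <- mtcars[, -1]

p <- mtry()
is.function(p$finalize)  # the embedded finalizer is one of the get_*() helpers
has_unknowns(p)          # the upper bound is still unknown

p_final <- finalize(p, x)
has_unknowns(p_final)
range_get(p_final)

# rbf_sigma() has a complete default range; with force = FALSE,
# finalize() returns it without updating the range
s <- rbf_sigma()
s_same <- finalize(s, x, force = FALSE)
identical(range_get(s), range_get(s_same))
```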

The get_*() helper functions are designed to be used with the pipe. Each one takes a parameter object and returns an updated copy of it; the original object is not modified.

get_p() and get_log_p() set the upper value of the range to be the number of columns in the data (on the natural and log10 scale, respectively).

get_n() and get_n_frac() set the upper value to be the number of rows in the data or a fraction of the total number of rows.
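The column- and row-based helpers above can be called directly, which makes the resulting bounds easy to check. This is a small sketch under stated assumptions: mtry() is used only as a convenient parameter with an unknown upper bound (pairing it with the row-based helpers is for illustration), and frac = 1/4 is an arbitrary choice.

```r
library(dials)

# 32 rows and 10 predictor columns
x <- mtcars[, -1]

by_cols    <- get_p(mtry(), x)                   # upper bound from ncol(x)
by_rows    <- get_n(mtry(), x)                   # upper bound from nrow(x)
by_quarter <- get_n_frac(mtry(), x, frac = 1/4)  # upper bound from a fraction of nrow(x)

range_get(by_cols)$upper
range_get(by_rows)$upper
range_get(by_quarter)$upper
```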

get_rbf_range() sets both bounds based on the heuristic defined in kernlab::sigest(). It requires that all columns in x be numeric.
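A brief sketch of get_rbf_range() follows. It assumes the kernlab package is installed and uses an arbitrary seed of 123; because the estimate involves random sampling, fixing the seed is what makes two calls agree.

```r
library(dials)

# All columns must be numeric for kernlab::sigest()
x <- mtcars[, -1]

s1 <- get_rbf_range(rbf_sigma(), x, seed = 123)
s2 <- get_rbf_range(rbf_sigma(), x, seed = 123)

# Bounds on the transformed (log10) scale
range_get(s1, original = FALSE)

# The same seed gives the same range
identical(range_get(s1), range_get(s2))
```

Leaving seed at its default draws a new seed on every call, so repeated calls may return slightly different ranges.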

Value

An updated param object or a list of updated param objects depending on what is provided in object.
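To illustrate the two return shapes, the sketch below finalizes a single parameter and then a list of parameters. The particular parameters (mtry() and num_terms()) and the mtcars predictors are illustrative choices.

```r
library(dials)

x <- mtcars[, -1]

# A single param object in, a single param object out
one <- finalize(mtry(), x)
inherits(one, "param")

# A list of param objects in, a list of the same length out
many <- finalize(list(mtry(), num_terms()), x)
length(many)
vapply(many, inherits, logical(1), what = "param")
```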

Examples


library(dials)
library(dplyr)
car_pred <- select(mtcars, -mpg)

# Needs an upper bound
mtry()
finalize(mtry(), car_pred)

# Nothing to do here since no unknowns
penalty()
finalize(penalty(), car_pred)

library(kernlab)
library(tibble)
library(purrr)

params <-
  tribble(
    ~parameter, ~object,
    "mtry", mtry(),
    "num_terms", num_terms(),
    "rbf_sigma", rbf_sigma()
  )
params

# Note that `rbf_sigma()` has a default range that does not need to be
# finalized but will be changed if used in the function:
complete_params <-
  params %>%
  mutate(object = map(object, finalize, car_pred))
complete_params

params %>%
  dplyr::filter(parameter == "rbf_sigma") %>%
  pull(object)
complete_params %>%
  dplyr::filter(parameter == "rbf_sigma") %>%
  pull(object)


dials documentation built on Sept. 11, 2024, 8:25 p.m.