BaselearnerPSpline: Non-parametric B or P-spline base learner

BaselearnerPSpline R Documentation

Non-parametric B or P-spline base learner

Description

BaselearnerPSpline creates a spline base learner object. The object calculates the B-spline basis functions and, in the case of P-splines, also the penalty. Instead of defining the penalty term directly, consider restricting the flexibility by setting the degrees of freedom.

Arguments

data_source

(InMemoryData)
Data object which contains the raw data (see ?InMemoryData).

blearner_type

(character(1))
Type of the base learner (if not specified, blearner_type = "spline" is used). The unique id of the base learner is defined by appending blearner_type to the feature name: paste0(data_source$getIdentifier(), "_", blearner_type).

degree

(integer(1))
Degree of the piecewise polynomial (default degree = 3 for cubic splines).

n_knots

(integer(1))
Number of inner knots (default n_knots = 20). The inner knots are expanded by degree - 1 additional knots at each side to prevent unstable behavior on the edges.

penalty

(numeric(1))
Penalty term for P-splines (default penalty = 2). Set to zero for B-splines.

differences

(integer(1))
The number of differences that are penalized. A higher value leads to smoother curves.

df

(numeric(1))
Degrees of freedom of the base learner(s).

bin_root

(integer(1))
The binning root, which reduces the data to n^(1/bin_root) data points (default bin_root = 1, which means no binning is applied). A value of bin_root = 2 is suggested for the best approximation error (cf. Wood et al. (2017), "Generalized additive models for gigadata: modeling the UK black smoke network daily data").
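
As a rough illustration of the differences and bin_root arguments above, the following base R sketch builds a difference penalty matrix and computes the binned sample size. It assumes the standard P-spline construction (a penalty of the form t(D) %*% D on the basis coefficients); whether compboost builds its penalty in exactly this way is not confirmed by this page. The value p = 6 is hypothetical, and no compboost functions are called.

```r
# Illustration only: base R, no compboost calls.

# (1) Difference penalty for p basis coefficients (standard P-spline
#     construction; an assumption, not taken from the compboost source).
p = 6            # hypothetical number of basis functions
differences = 2  # number of differences that are penalized
D = diff(diag(p), differences = differences)  # (p - differences) x p matrix
K = crossprod(D)                              # p x p penalty matrix t(D) %*% D

dim(D)
dim(K)

# With differences = 2, a linear coefficient sequence is not penalized:
lin_pen = max(abs(D %*% seq_len(p)))
lin_pen

# (2) Number of data points kept by binning: n^(1 / bin_root).
n = 100
bin_root = 2
n_binned = n^(1 / bin_root)
n_binned
```

Raising differences enlarges the set of coefficient sequences that go unpenalized (constants for 1, linear sequences for 2, and so on), which is one way to read the statement that a higher value leads to smoother curves.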

Format

S4 object.

Usage

BaselearnerPSpline$new(data_source, list(degree, n_knots, penalty, differences, df, bin_root))
BaselearnerPSpline$new(data_source, blearner_type, list(degree, n_knots, penalty, differences, df, bin_root))

Fields

This class doesn't contain public fields.

Methods

  • $summarizeFactory(): () -> ()

  • $transformData(newdata): list(InMemoryData) -> matrix()

  • $getMeta(): () -> list()

Inherited methods from Baselearner

  • $getData(): () -> matrix()

  • $getDF(): () -> integer()

  • $getPenalty(): () -> numeric()

  • $getPenaltyMat(): () -> matrix()

  • $getFeatureName(): () -> character()

  • $getModelName(): () -> character()

  • $getBaselearnerId(): () -> character()

Details

The data matrix is stored as a transposed sparse matrix for performance reasons. The member function $getData() accounts for that, while $transformData() returns the raw data matrix as a p x n matrix.
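
To make the two orientations concrete, here is a minimal sketch that uses splines::bs() from the splines package shipped with R as a stand-in for the basis. It is not compboost's implementation, and the values of n and df are arbitrary. It shows an n x p design matrix, its p x n transpose, and why a sparse representation pays off: each observation touches at most degree + 1 basis functions.

```r
# Stand-in illustration with splines::bs(); not the compboost basis itself.
library(splines)

n = 50
x = seq(0, 10, length.out = n)

# Cubic B-spline basis with 8 columns (intercept = TRUE keeps all of them).
B = bs(x, df = 8, degree = 3, intercept = TRUE)

dim(B)     # n x p: one row per observation
dim(t(B))  # p x n: the transposed orientation described above

# Each observation has at most degree + 1 = 4 non-zero basis values,
# so most entries of the design matrix are zero.
nonzero_per_obs = rowSums(B != 0)
max(nonzero_per_obs)
```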

Examples

# Sample data:
x = runif(100, 0, 10)
y = sin(x) + rnorm(100, 0, 0.2)
dat = data.frame(x, y)

# S4 wrapper

# Create new data object, a matrix is required as input:
data_mat = cbind(x)
data_source = InMemoryData$new(data_mat, "my_data_name")

# Create new spline base learner factories:
bl_sp_df2 = BaselearnerPSpline$new(data_source,
  list(n_knots = 10, df = 2, bin_root = 2))
bl_sp_df5 = BaselearnerPSpline$new(data_source,
  list(n_knots = 15, df = 5))

# Get the transformed data:
dim(bl_sp_df2$getData())
dim(bl_sp_df5$getData())

# Summarize factory:
bl_sp_df2$summarizeFactory()

# Get full meta data such as penalty term or matrix as well as knots:
str(bl_sp_df2$getMeta())
bl_sp_df2$getPenalty()
bl_sp_df5$getPenalty() # The penalty here is smaller due to more flexibility

# Transform "new data":
newdata = list(InMemoryData$new(cbind(rnorm(5)), "my_data_name"))
bl_sp_df2$transformData(newdata)
bl_sp_df5$transformData(newdata)

# R6 wrapper

cboost_df2 = Compboost$new(dat, "y")
cboost_df2$addBaselearner("x", "sp", BaselearnerPSpline,
  n_knots = 10, df = 2, bin_root = 2)
cboost_df2$train(200, 0)

cboost_df5 = Compboost$new(dat, "y")
cboost_df5$addBaselearner("x", "sp", BaselearnerPSpline,
  n_knots = 15, df = 5)
cboost_df5$train(200, 0)

# Access base learner directly from the API (n = sqrt(100) = 10 with binning):
str(cboost_df2$baselearner_list$x_sp$factory$getData())
str(cboost_df5$baselearner_list$x_sp$factory$getData())

gg_df2 = plotPEUni(cboost_df2, "x")
gg_df5 = plotPEUni(cboost_df5, "x")

library(ggplot2)
library(patchwork)

(gg_df2 | gg_df5) &
  geom_point(data = dat, aes(x = x, y = y - c(cboost_df2$offset)), alpha = 0.2)

schalkdaniel/compboost documentation built on April 15, 2023, 9:03 p.m.