BaselearnerPSpline: Non-parametric B or P-spline base learner
In schalkdaniel/compboost: Efficient Component-Wise Boosting Implementation

BaselearnerPSpline

R Documentation

Description

BaselearnerPSpline creates a spline base learner object. The object calculates the B-spline basis functions and, in the case of P-splines, also the penalty. Instead of defining the penalty term directly, consider restricting the flexibility by setting the degrees of freedom.
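
The relation between the penalty and the degrees of freedom can be illustrated with base R. The following is a minimal sketch, not compboost's internal implementation: it assumes equidistant knots, a difference penalty of order 2, and the usual definition of the degrees of freedom as the trace of the hat matrix. The helper names `dfFromPenalty` and `penaltyFromDf` are hypothetical and not part of the package.

```r
library(splines)  # ships with R

set.seed(1)
x = runif(100, 0, 10)

degree = 3        # cubic splines
n_knots = 10      # inner knots
differences = 2   # order of the difference penalty

# Equidistant knots: boundaries plus inner knots, extended by `degree` knots
# on each side (one common convention, assumed here for illustration).
inner = seq(min(x), max(x), length.out = n_knots + 2)
h = inner[2] - inner[1]
knots = c(min(x) - (degree:1) * h, inner, max(x) + (1:degree) * h)

# B-spline design matrix (n x p) and difference penalty matrix (p x p):
X = splineDesign(knots, x, ord = degree + 1, outer.ok = TRUE)
D = diff(diag(ncol(X)), differences = differences)
K = crossprod(D)
XtX = crossprod(X)

# Degrees of freedom for a given penalty: trace of (X'X + penalty * K)^-1 X'X.
dfFromPenalty = function(penalty) {
  sum(diag(solve(XtX + penalty * K, XtX)))
}

# Invert the relation numerically (root search on the log scale):
penaltyFromDf = function(df) {
  root = uniroot(function(lp) dfFromPenalty(exp(lp)) - df,
    interval = c(-10, 20), tol = 1e-10)
  exp(root$root)
}

pen5 = penaltyFromDf(5)
pen8 = penaltyFromDf(8)

dfFromPenalty(pen5)  # close to 5
pen5 > pen8          # fewer degrees of freedom require a larger penalty
```

Specifying `df` makes base learners with different numbers of knots or different features comparable in their flexibility, which is harder to achieve by choosing `penalty` by hand.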

Arguments

`data_source`	(InMemoryData) Data object which contains the raw data (see `?InMemoryData`).
`blearner_type`	(`character(1)`) Type of the base learner (if not specified, `blearner_type = "spline"` is used). The unique id of the base learner is defined by appending `blearner_type` to the feature name: `paste0(data_source$getIdentifier(), "_", blearner_type)`.
`degree`	(`integer(1)`) Degree of the piecewise polynomial (default `degree = 3` for cubic splines).
`n_knots`	(`integer(1)`) Number of inner knots (default `n_knots = 20`). The inner knots are expanded by `degree - 1` additional knots at each side to prevent unstable behavior on the edges.
`penalty`	(`numeric(1)`) Penalty term for P-splines (default `penalty = 2`). Set to zero for B-splines.
`differences`	(`integer(1)`) The number of differences that are penalized. A higher value leads to smoother curves.
`df`	(`numeric(1)`) Degrees of freedom of the base learner(s).
`bin_root`	(`integer(1)`) The binning root to reduce the data to `n^(1/bin_root)` data points (default `bin_root = 1`, which means no binning is applied). A value of `bin_root = 2` is suggested for the best approximation error (cf. Wood et al. (2017), Generalized additive models for gigadata: modeling the UK black smoke network daily data).
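
The idea behind `bin_root` can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's code: it assumes an equidistant grid of `round(n^(1/bin_root))` points between the minimum and maximum of the feature, and maps each observation to its nearest grid point. The helper `binVector` is hypothetical.

```r
# Hypothetical helper, not part of compboost:
binVector = function(x, bin_root = 2) {
  # Number of representative points, n^(1/bin_root), rounded to an integer:
  n_bins = as.integer(round(length(x)^(1 / bin_root)))
  grid = seq(min(x), max(x), length.out = n_bins)
  step = grid[2] - grid[1]
  # Index of the nearest grid point for each observation:
  idx = as.integer(round((x - min(x)) / step)) + 1L
  list(grid = grid, idx = idx, step = step)
}

set.seed(1)
x = runif(100, 0, 10)
bins = binVector(x, bin_root = 2)

length(bins$grid)                   # 10 grid points instead of 100 observations
max(abs(x - bins$grid[bins$idx]))   # at most half the grid spacing
```

With such a mapping, the spline basis only has to be evaluated at the grid points, and the index vector links each observation back to its grid point.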

Format

S4 object.

Usage

BaselearnerPSpline$new(data_source, list(degree, n_knots, penalty, differences, df, bin_root))
BaselearnerPSpline$new(data_source, blearner_type, list(degree, n_knots, penalty, differences, df, bin_root))

Fields

This class doesn't contain public fields.

Methods

$summarizeFactory(): () -> ()
$transformData(newdata): list(InMemoryData) -> matrix()
$getMeta(): () -> list()

Inherited methods from Baselearner

$getData(): () -> matrix()
$getDF(): () -> integer()
$getPenalty(): () -> numeric()
$getPenaltyMat(): () -> matrix()
$getFeatureName(): () -> character()
$getModelName(): () -> character()
$getBaselearnerId(): () -> character()

Details

The data matrix is instantiated as a transposed sparse matrix for performance reasons. The member function $getData() accounts for that, while $transformData() returns the raw data matrix as a p x n matrix.
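
Why sparse storage pays off for spline bases can be checked with base R alone. The sketch below is not compboost code and assumes a hand-picked equidistant knot vector. It builds a dense B-spline design matrix with `splines::splineDesign()` and shows that each observation touches at most `degree + 1` basis functions, so most entries are zero. Its transpose has the p x n layout mentioned above.

```r
library(splines)  # ships with R

set.seed(1)
x = runif(100, 0, 10)
degree = 3

# Equidistant knots covering the data range [0, 10], with three additional
# knots on each side (assumed layout for illustration):
knots = seq(-3, 13, by = 1)

X = splineDesign(knots, x, ord = degree + 1)  # n x p design matrix
Xt = t(X)                                     # p x n (transposed) layout

dim(X)
dim(Xt)

# Each row of X has at most degree + 1 non-zero entries:
max(rowSums(X != 0))

# Share of non-zero entries in the whole matrix:
mean(X != 0)
```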

Examples

# Sample data:
x = runif(100, 0, 10)
y = sin(x) + rnorm(100, 0, 0.2)
dat = data.frame(x, y)

# S4 wrapper

# Create new data object, a matrix is required as input:
data_mat = cbind(x)
data_source = InMemoryData$new(data_mat, "my_data_name")

# Create new spline base learner factories:
bl_sp_df2 = BaselearnerPSpline$new(data_source,
  list(n_knots = 10, df = 2, bin_root = 2))
bl_sp_df5 = BaselearnerPSpline$new(data_source,
  list(n_knots = 15, df = 5))

# Get the transformed data:
dim(bl_sp_df2$getData())
dim(bl_sp_df5$getData())

# Summarize factory:
bl_sp_df2$summarizeFactory()

# Get full meta data such as penalty term or matrix as well as knots:
str(bl_sp_df2$getMeta())
bl_sp_df2$getPenalty()
bl_sp_df5$getPenalty() # The penalty here is smaller due to more flexibility

# Transform "new data":
newdata = list(InMemoryData$new(cbind(rnorm(5)), "my_data_name"))
bl_sp_df2$transformData(newdata)
bl_sp_df5$transformData(newdata)

# R6 wrapper

cboost_df2 = Compboost$new(dat, "y")
cboost_df2$addBaselearner("x", "sp", BaselearnerPSpline,
  n_knots = 10, df = 2, bin_root = 2)
cboost_df2$train(200, 0)

cboost_df5 = Compboost$new(dat, "y")
cboost_df5$addBaselearner("x", "sp", BaselearnerPSpline,
  n_knots = 15, df = 5)
cboost_df5$train(200, 0)

# Access base learner directly from the API (n = sqrt(100) = 10 with binning):
str(cboost_df2$baselearner_list$x_sp$factory$getData())
str(cboost_df5$baselearner_list$x_sp$factory$getData())

gg_df2 = plotPEUni(cboost_df2, "x")
gg_df5 = plotPEUni(cboost_df5, "x")

library(ggplot2)
library(patchwork)

(gg_df2 | gg_df5) &
  geom_point(data = dat, aes(x = x, y = y - c(cboost_df2$offset)), alpha = 0.2)

schalkdaniel/compboost documentation built on April 15, 2023, 9:03 p.m.