process_input: Process Input Arguments for lgspline

View source: R/process_input.R

process_input    R Documentation

Description

Parses formula and data arguments, performs factor encoding, resolves variable roles (spline, linear-with-interactions, linear-without-interactions), constructs exclusion patterns, and validates inputs. Called internally by lgspline.

Users may call this function directly to inspect how their formula and data are interpreted before fitting.
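As a rough illustration of the formula-parsing step described above, the following base R sketch shows one way spl() terms could be told apart from ordinary linear and interaction terms. It relies only on terms() with its specials argument. It is not the parsing code lgspline uses, and the names spline_terms and other_terms are illustrative assumptions.

```r
# Illustrative sketch only: base R formula inspection, not lgspline internals.
f <- conc ~ spl(Time) + Time * Dose

# Flag spl() as a "special" so terms() records where it appears.
tt <- terms(f, specials = "spl")

# Row names of the factors matrix are the variables as written in the formula,
# response included.
vars <- rownames(attr(tt, "factors"))

# Variables wrapped in spl(), and every expanded term label.
spline_terms <- vars[attr(tt, "specials")$spl]
all_terms    <- attr(tt, "term.labels")
other_terms  <- setdiff(all_terms, spline_terms)

print(spline_terms)   # terms written inside spl()
print(other_terms)    # remaining main effects and interactions
print(all.vars(f))    # raw variable names, response first
```

Because terms() never evaluates spl(), the sketch runs without lgspline attached and without any data.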

Usage

process_input(
  predictors = NULL,
  y = NULL,
  formula = NULL,
  response = NULL,
  data = NULL,
  weights = NULL,
  observation_weights = NULL,
  family = gaussian(),
  K = NULL,
  custom_knots = NULL,
  auto_encode_factors = TRUE,
  include_2way_interactions = TRUE,
  include_3way_interactions = TRUE,
  just_linear_with_interactions = NULL,
  just_linear_without_interactions = NULL,
  exclude_interactions_for = NULL,
  exclude_these_expansions = NULL,
  offset = c(),
  no_intercept = FALSE,
  do_not_cluster_on_these = c(),
  include_quartic_terms = NULL,
  cluster_args = c(custom_centers = NA, nstart = 10),
  include_warnings = TRUE,
  dummy_fit = FALSE,
  include_constrain_second_deriv = TRUE,
  standardize_response = TRUE,
  ...
)

Arguments

predictors

Default: NULL. Formula or numeric matrix/data frame of predictor variables.

y

Default: NULL. Numeric response vector.

formula

Default: NULL. Optional formula object; an alias for predictors when the model is specified as a formula.

response

Default: NULL. Alternative name for y.

data

Default: NULL. Data frame for formula interface.

weights

Default: NULL. Alias for observation_weights.

observation_weights

Default: NULL. Numeric observation weight vector.

family

Default: gaussian(). GLM family object.

K

Default: NULL. Number of interior knots.

custom_knots

Default: NULL. Custom knot matrix.

auto_encode_factors

Default: TRUE. Logical; automatically one-hot encode factor and character columns when using the formula interface.

include_2way_interactions

Default: TRUE. Logical.

include_3way_interactions

Default: TRUE. Logical.

just_linear_with_interactions

Default: NULL. Integer vector or character vector of column names.

just_linear_without_interactions

Default: NULL. Integer vector or character vector of column names.

exclude_interactions_for

Default: NULL. Integer vector or character vector of column names.

exclude_these_expansions

Default: NULL. Character vector of expansion names to exclude.

offset

Default: c(). Vector of column indices or names to include as offsets.

no_intercept

Default: FALSE. Logical; remove intercept.

do_not_cluster_on_these

Default: c(). Vector of column indices or names to exclude from clustering.

include_quartic_terms

Default: NULL. Logical or NULL.

cluster_args

Default: c(custom_centers = NA, nstart = 10). Named vector of clustering arguments.

include_warnings

Default: TRUE. Logical.

dummy_fit

Default: FALSE. Logical; run the full preprocessing path but stop short of fitting nonzero coefficients.

include_constrain_second_deriv

Default: TRUE. Logical.

standardize_response

Default: TRUE. Logical.

...

Additional arguments (checked for deprecated names).
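Several arguments above (just_linear_with_interactions, just_linear_without_interactions, exclude_interactions_for, offset, do_not_cluster_on_these) accept either column indices or column names, and are returned as column indices. A minimal sketch of that kind of normalization in base R follows. The helper resolve_cols is hypothetical and is not an lgspline function; the package's actual validation and error messages may differ.

```r
# Hypothetical helper (not part of lgspline): turn a name-or-index
# specification into integer column positions.
resolve_cols <- function(spec, col_names) {
  # NULL or empty input means "nothing selected".
  if (is.null(spec) || length(spec) == 0) return(NULL)

  if (is.character(spec)) {
    # Look each name up in the original column names.
    idx <- match(spec, col_names)
    if (anyNA(idx)) {
      stop("Unknown column name(s): ",
           paste(spec[is.na(idx)], collapse = ", "))
    }
    return(idx)
  }

  # Numeric input is treated as positions and range-checked.
  idx <- as.integer(spec)
  if (any(idx < 1L | idx > length(col_names))) {
    stop("Column index out of range")
  }
  idx
}

# Assumed column names, for illustration only.
og_cols <- c("Time", "Dose", "Weight")
by_name  <- resolve_cols(c("Dose", "Weight"), og_cols)
by_index <- resolve_cols(2, og_cols)
nothing  <- resolve_cols(NULL, og_cols)
```

Resolving names to positions once, up front, lets later steps index the predictor matrix positionally even after its column names are stripped.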

Value

A named list containing:

predictors

Numeric matrix of predictor variables with column names stripped for positional indexing.

y

Numeric response vector.

og_cols

Character vector of original predictor column names, or NULL if none were available.

replace_colnames

Logical; TRUE if og_cols is available and column renaming should be applied post-fit.

just_linear_with_interactions

Integer vector of column indices, or NULL.

just_linear_without_interactions

Integer vector of column indices, or NULL.

exclude_interactions_for

Integer vector of column indices, or NULL.

exclude_these_expansions

Character vector of positional-notation expansion names, or NULL.

offset

Integer vector of column indices, or c().

no_intercept

Logical.

do_not_cluster_on_these

Numeric vector of column indices, or c().

observation_weights

Numeric vector or NULL.

K

Integer or NULL, possibly updated by cluster_args or all-linear detection.

include_3way_interactions

Logical, possibly updated by formula parsing.

include_quartic_terms

Logical or NULL, possibly updated by number of predictors.

data

Data frame, possibly with factor columns one-hot encoded.

include_constrain_second_deriv

Logical, possibly set FALSE when no numeric predictors remain.

factor_groups

Named list mapping original factor column names to integer vectors of their one-hot indicator column positions within the predictor matrix. Used by lgspline.fit to impose sum-to-zero constraints on encoded factor levels. NULL when no factors were encoded, or when the formula interface was not used.
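To make the shape of factor_groups concrete, here is a small base R sketch that one-hot encodes factor and character columns and records which predictor columns belong to each original factor. The helper encode_factors and the indicator naming scheme (column name pasted to level) are assumptions for illustration. The encoding lgspline performs may differ in naming, level ordering, or handling of missing values.

```r
# Hypothetical helper (not part of lgspline): one-hot encode factor and
# character columns, tracking the indicator positions per original column.
encode_factors <- function(df) {
  pieces <- list()
  groups <- list()
  pos <- 0L  # number of predictor columns produced so far

  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col) || is.character(col)) {
      f  <- factor(col)
      lv <- levels(f)
      # One 0/1 indicator column per level (all levels kept).
      ind <- outer(as.integer(f), seq_along(lv), "==") * 1
      colnames(ind) <- paste0(nm, lv)
      groups[[nm]] <- pos + seq_along(lv)
      pieces[[length(pieces) + 1L]] <- ind
      pos <- pos + length(lv)
    } else {
      # Numeric columns pass through as a single column.
      pieces[[length(pieces) + 1L]] <-
        matrix(as.numeric(col), ncol = 1, dimnames = list(NULL, nm))
      pos <- pos + 1L
    }
  }

  list(
    predictors    = do.call(cbind, pieces),
    factor_groups = if (length(groups)) groups else NULL
  )
}

# Toy data, invented for this illustration.
toy <- data.frame(x = c(0.5, 1.2, 2.0, 3.1),
                  grp = factor(c("a", "b", "a", "c")))
enc <- encode_factors(toy)
print(colnames(enc$predictors))
print(enc$factor_groups)
```

In this sketch the indicators for each factor sum to one in every row, which is the kind of redundancy a sum-to-zero constraint on the corresponding coefficients resolves.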

See Also

lgspline for the main fitting interface.

Examples

## Not run: 
# Theoph ships with base R's datasets package
data("Theoph")
df <- Theoph[, c("Time", "Dose", "conc", "Subject")]

# Preprocess only; no model is fit here
processed <- process_input(
  predictors = conc ~ spl(Time) + Time*Dose,
  data = df,
  auto_encode_factors = TRUE,
  include_warnings = TRUE
)

# Inspect how the formula and data were interpreted
str(processed$predictors)
processed$og_cols
processed$just_linear_without_interactions
processed$factor_groups

## End(Not run)


lgspline documentation built on May 8, 2026, 5:07 p.m.