knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width=6, fig.height=4) # Assuming necessary multivarious functions are loaded # e.g., via devtools::load_all() or library(multivarious) library(multivarious) # Implicit dependencies like glmnet or pls might be needed depending on method used.
When you already have a known set of basis functions—such as Fourier components, wavelets, splines, or principal components from a reference dataset—the goal isn't to discover a latent space, but rather to express new data in terms of that existing basis. The regress()
function facilitates this by wrapping several multi-output linear models within the bi_projector
API.
This integration means you automatically inherit standard bi_projector
methods for tasks like:
project()
: Map new data from the original space into the basis coefficients.inverse_projection()
: Map basis coefficients back to the original data space.reconstruct()
/ reconstruct_new()
: Reconstruct data using all or a subset of basis components.coef()
: Retrieve the basis coefficients.truncate()
: Keep only a subset of basis components.This vignette demonstrates a typical workflow using an orthonormal Fourier basis, but the regress()
function works equally well with arbitrary, potentially non-orthogonal, basis dictionaries.
First, let's define our basis. We'll use sines and cosines.
set.seed(42) n <- 128 # Number of observations (e.g., signals) p <- 32 # Original variables per observation (e.g., time points) k <- 20 # Number of basis functions (<= p often, <= n for lm) ## Create toy data: smooth signals + noise t <- seq(0, 1, length.out = p) Y <- replicate(n, 3*sin(2*pi*3*t) + 2*cos(2*pi*5*t) ) + matrix(rnorm(n*p, sd = 0.3), p, n) # Note: Y is p x n here ## Orthonormal Fourier basis (columns = basis functions) # We create k/2 sine and k/2 cosine terms, plus an intercept freqs <- 1:(k %/% 2) # Integer division for number of frequencies B <- cbind(rep(1, p), # Intercept column do.call(cbind, lapply(freqs, function(f) sin(2*pi*f*t))), do.call(cbind, lapply(freqs, function(f) cos(2*pi*f*t)))) colnames(B) <- c("Intercept", paste0("sin", freqs), paste0("cos", freqs)) # Make columns orthonormal (length 1, orthogonal to each other) B <- scale(B, center = FALSE, scale = sqrt(colSums(B^2))) cat(paste("Dimensions: Y is", nrow(Y), "x", ncol(Y), ", Basis B is", nrow(B), "x", ncol(B), "\n")) # We want coefficients C (k x n) such that Y ≈ B %*% C.
Now, we use regress()
to find the coefficients (C) that best represent each signal in (Y) using the basis (B).
library(multivarious) # Fit using standard linear models (lm) # Note: Y is p x n, B is p x k. regress expects data where rows are features. # The internal model (e.g., lm.fit) might expect samples in rows, # but regress() handles the orientation based on the X and Y arguments. fit <- regress(X = B, # Predictors = basis functions (p x k) Y = Y, # Response = signals (p x n) method = "lm", intercept = FALSE) # Basis B already includes an intercept column # The result is a bi_projector object print(fit) ## Conceptual mapping to bi_projector slots: # fit$v : Coefficients (k x n) - Basis coefficients for each observation/signal. # fit$s : Design Matrix (p x k) - The (potentially centered/scaled) basis matrix B. # Stored for reconstruction.
The bi_projector
structure provides a consistent way to access the core components: the basis (fit$s
) and the coefficients (fit$v
).
With the fitted bi_projector
, projecting and reconstructing is straightforward.
## Get the coefficients for the first 3 signals coef_matrix_first3 <- fit$v[, 1:3] # k x 3 matrix cat("First few coefficients for signal 1:\n") print(head(coef_matrix_first3[, 1])) ## Reconstruct the original fitted data (Y = B %*% C) # Should be near-perfect for lm if k >= rank(Y) Y_hat <- reconstruct(fit) # Returns p x n matrix max_reconstruction_error <- max(abs(Y_hat - Y)) cat("\nMaximum reconstruction error for fitted data:", format(max_reconstruction_error, digits=3), "\n") stopifnot(max_reconstruction_error < 1e-10) ## Project a *new* signal onto the basis # Create a new signal (using the same underlying pattern + noise) Y_new_signal <- 3*sin(2*pi*3*t) + 2*cos(2*pi*5*t) + rnorm(p, sd=0.3) Y_new_matrix <- matrix(Y_new_signal, ncol = 1) # Needs to be p x 1 # project() finds the coefficients for Y_new using the basis B # Equivalent to C_new = B^+ %*% Y_new, where B^+ is pseudo-inverse coef_new <- project(fit, Y_new_matrix) # Returns k x 1 matrix cat("\nCoefficients for new signal:\n") print(head(coef_new[, 1])) # Reconstruct the new signal from its coefficients # Equivalent to Y_new_recon = B %*% C_new Y_new_recon <- reconstruct(fit, coefs = coef_new) # Returns p x 1 matrix # Note: Because our basis B was constructed to be orthonormal, # projection simplifies to a dot product: coef_new = t(B) %*% Y_new_matrix # But the project/reconstruct API handles non-orthogonal bases correctly.
If the basis is ill-conditioned or you need feature selection/shrinkage, simply change the method
argument. regress()
wraps common regularized models.
# Ridge regression (requires glmnet) fit_ridge <- regress(X = B, Y = Y, method = "mridge", lambda = 0.01, intercept = FALSE) # Elastic Net (requires glmnet) fit_enet <- regress(X = B, Y = Y, method = "enet", alpha = 0.5, lambda = 0.02, intercept = FALSE) # Partial Least Squares (requires pls package) - useful if k > p or multicollinearity fit_pls <- regress(X = B, Y = Y, method = "pls", ncomp = 15, intercept = FALSE) # All these return bi_projector objects, so downstream code using # project(), reconstruct(), coef() etc. remains the same.
The bi_projector
interface allows for flexible manipulation:
# Truncate: Keep only the first 5 basis functions (Intercept + 2 sine + 2 cosine) fit5 <- truncate(fit, ncomp = 5) cat("Dimensions after truncating to 5 components:", "Basis (s):", paste(dim(fit5$s), collapse="x"), ", Coefs (v):", paste(dim(fit5$v), collapse="x"), "\n") Y_hat5 <- reconstruct(fit5) # Reconstruct using only first 5 basis functions # Partial inverse projection: Map only a subset of coefficients back # e.g., reconstruct using only components 2 through 6 (skip intercept) # Note: partial_inverse_projection is not a standard bi_projector method, # this might require manual slicing of the basis matrix B (fit$s) or coefs (fit$v). # Manual reconstruction example for components 2:6 coef_subset <- fit$v[2:6, , drop=FALSE] # k_sub x n basis_subset <- fit$s[, 2:6, drop=FALSE] # p x k_sub Y_lowHat <- basis_subset %*% coef_subset # p x n reconstruction # Variable usage helpers (Conceptual - actual functions might differ) # `variables_used(fit)` could show which basis functions have non-zero coefficients (esp. for 'enet'). # `vars_for_component(fit, k)` isn't directly applicable here as components are the basis functions themselves.
The core idea is to represent the (p \times n) data matrix (Y) as a product of the (p \times k) basis matrix (stored in s
) and the (k \times n) coefficient matrix (stored in v
):
[ \underbrace{Y}{p\times n} \approx \underbrace{s}{p\times k} \underbrace{v}_{k\times n} ]
regress()
estimates (v) (coefficients) based on the chosen method:
| method
| Solver Used (Conceptual) | Regularisation | Key Reference |
|----------|--------------------------|----------------------|----------------------|
| "lm" | QR decomposition (lm.fit
)| None | Classical OLS |
| "mridge" | glmnet
(alpha=0) | Ridge ((\lambda ||\beta||_2^2)) | Hoerl & Kennard 1970 |
| "enet" | glmnet
| Elastic Net ((\alpha)-mix) | Zou & Hastie 2005 |
| "pls" | pls::plsr
| Latent PLS factors | Wold 1984 |
The resulting object stores:
v
: The estimated coefficient matrix ((k \times n)).
s
: The basis/design matrix (B) ((p \times k)), possibly centered or scaled by the underlying solver.
* sdev
, center
: Potentially stores scaling/centering info related to (s).
All other bi_projector
methods (project
, reconstruct
, inverse_projection
) are derived from these core matrices.
This section contains internal consistency checks, primarily for package development and CI testing.
The code chunk below only runs if the environment variable _MULTIVARIOUS_DEV_COVERAGE
is set.
# This chunk only runs if _MULTIVARIOUS_DEV_COVERAGE is non-empty message("Running internal consistency checks for regress()...") tryCatch({ stopifnot( # Check reconstruction fidelity for lm max(abs(reconstruct(fit) - Y)) < 1e-10, # Check dimensions of inverse projection matrix (n x p) # inverse_projection maps coefficients (k x n) back to data (p x n) # The matrix itself maps k -> p implicitly. Let's check coef matrix dims. nrow(fit$v) == ncol(B), # k rows ncol(fit$v) == ncol(Y) # n columns # Add checks for other methods if evaluated ) message("Regress internal checks passed.") }, error = function(e) { warning("Regress internal checks failed: ", e$message) })
regress()
provides a convenient way to fit multiple linear models simultaneously when expressing data (Y) in a known basis (B).bi_projector
, giving immediate access to projection, reconstruction, truncation, and coefficient extraction.lm
), Ridge (mridge
), Elastic Net (enet
), and PLS (pls
) out-of-the-box.%>>%
) or cross-validation helpers.Happy re-representing!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.