    knitr::opts_chunk$set(
      collapse = TRUE,
      comment = "#>"
    )
    library(multivarious)
Multivariate data analysis often involves reducing dimensionality or transforming data using techniques such as Principal Component Analysis (PCA), Partial Least Squares (PLS), Contrastive PCA (cPCA), the Nyström approximation for Kernel PCA, or a representation in a specific basis (e.g., Fourier, splines). While each method has its own mathematical underpinnings, they share common operational needs: fitting a model, accessing its key components (such as scores and loadings), projecting new data into the latent space, and reconstructing data from that space.
Handling these tasks consistently across different algorithms can lead to repetitive code and complex workflows. The multivarious package aims to simplify this by providing a unified interface centered around the concept of a bi_projector.
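bi_projector: A Two-Way Map

The bi_projector class is the cornerstone of multivarious. It represents a linear transformation (or an approximation thereof) that maps in two directions: samples (rows) are mapped to scores in the latent space, and variables (columns) are mapped into the component space.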
Think of it as encapsulating the core results of a dimensionality reduction technique (like the U, S, V components of an SVD, or the scores and loadings of PCA/PLS) along with any necessary pre-processing information.
Crucially, many functions within multivarious (e.g., pca(), pls(), cPCAplus(), nystrom_approx(), regress()) return objects that inherit from bi_projector.
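To make the idea of bundling decomposition results with pre-processing information concrete, here is a minimal base-R sketch that assumes an SVD-based PCA with centering. The list `proj` and its field names (`center`, `v`, `d`, `scores`) are hypothetical stand-ins chosen for illustration; they are not the package's internal layout.

```r
# Hypothetical stand-in for a bi_projector: SVD factors plus pre-processing info.
X  <- as.matrix(iris[, 1:4])
mu <- colMeans(X)                  # pre-processing: training column means
Xc <- sweep(X, 2, mu)              # centered data
sv <- svd(Xc)                      # Xc = U %*% diag(d) %*% t(V)

proj <- list(
  center = mu,                     # needed to treat new data like the training data
  v      = sv$v,                   # loadings (variables x components)
  d      = sv$d,                   # singular values
  scores = sv$u %*% diag(sv$d)     # scores (samples x components), i.e. U S
)

# The scores are the centered data expressed in the loading basis.
scores_match <- all.equal(proj$scores, Xc %*% proj$v, check.attributes = FALSE)
print(scores_match)
```

Keeping the training means next to the loadings is what lets a fitted model apply the same centering to data it has not seen before.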
bi_projector

Because different methods return a bi_projector, you can perform common tasks using a consistent set of verbs:

- scores(model): Get the scores (latent space representation) of the training data.
- coef(model) or loadings(model): Get the loadings or coefficients mapping variables to components.
- project(model, newdata): Project new samples (rows of newdata) into the latent space defined by the model.
- reconstruct(model, ...): Reconstruct an approximation of the original data from the latent space (either from training scores or provided new scores/coefficients).
- truncate(model, ncomp): Reduce the number of components kept in the model.
- summary(model): Get a concise summary of the model dimensions.

This consistent API simplifies writing generic analysis code and makes it easier to swap between different dimensionality reduction methods.
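As a rough illustration of what the projecting, reconstructing, and truncating verbs compute for a centered, SVD-based model, the base-R sketch below defines plain functions on a simple list. The names `fit_svd`, `project_rows`, `reconstruct_rows`, and `truncate_comps` are hypothetical; this is not how multivarious implements its generics.

```r
# Hypothetical helpers mirroring project(), reconstruct(), and truncate()
# for a centered, SVD-based model. Not the package's implementation.
fit_svd <- function(X, ncomp) {
  mu <- colMeans(X)
  sv <- svd(sweep(X, 2, mu), nu = ncomp, nv = ncomp)
  list(center = mu,
       v      = sv$v,
       scores = sv$u %*% diag(sv$d[seq_len(ncomp)], ncomp))
}

project_rows <- function(m, newdata) {
  # Center with the training means, then map onto the loadings.
  sweep(newdata, 2, m$center) %*% m$v
}

reconstruct_rows <- function(m, scores = m$scores) {
  # Map back to data space, then undo the centering.
  sweep(scores %*% t(m$v), 2, m$center, "+")
}

truncate_comps <- function(m, ncomp) {
  keep <- seq_len(ncomp)
  list(center = m$center,
       v      = m$v[, keep, drop = FALSE],
       scores = m$scores[, keep, drop = FALSE])
}

X  <- as.matrix(iris[, 1:4])
m3 <- fit_svd(X, ncomp = 3)
m2 <- truncate_comps(m3, 2)

dim(project_rows(m2, X[1:5, ]))   # 5 samples in a 2-component space
dim(reconstruct_rows(m2))         # back in the original 150 x 4 data space
```

Because every helper takes the model as its first argument and reads only a few shared fields, any method that can fill those fields could reuse the same downstream code, which is the point of a common interface.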
Let's demonstrate a typical workflow using PCA on the classic iris dataset.
    # Load the iris dataset and select the numeric columns
    data(iris)
    X <- as.matrix(iris[, 1:4])

    # 1. Define a pre-processor (center the data)
    preproc <- center()

    # 2. Fit PCA using svd_wrapper, keeping 3 components
    #    The pre-processor is applied internally.
    fit <- pca(X, ncomp = 3, preproc = preproc)

    # The result 'fit' is a bi_projector
    print(fit)

    # 3. Access results
    iris_scores   <- scores(fit)    # Scores of the centered training data (150 x 3)
    iris_loadings <- loadings(fit)  # Loadings (4 x 3)
    cat("\nDimensions of Scores:", dim(iris_scores), "\n")
    cat("Dimensions of Loadings:", dim(iris_loadings), "\n")

    # 4. Project new data
    #    Create some new iris-like samples (5 samples, 4 variables).
    #    rnorm() recycles the 4 means and sds, so filling by row keeps each
    #    column aligned with its own variable.
    set.seed(123)
    new_iris_data <- matrix(rnorm(5 * 4, mean = colMeans(X), sd = apply(X, 2, sd)),
                            nrow = 5, byrow = TRUE)

    #    Project the new data into the PCA space defined by 'fit'.
    #    Pre-processing (centering using training data means) is applied automatically.
    projected_new_scores <- project(fit, new_iris_data)
    cat("\nDimensions of Projected New Data Scores:", dim(projected_new_scores), "\n")
    print(head(projected_new_scores))

    # 5. Reconstruct an approximation of the original data from the scores
    #    All training samples are reconstructed; only the first 5 rows are printed.
    reconstructed_X_approx <- reconstruct(fit, comp = 1:3)  # uses scores(fit) by default
    cat("\nReconstructed Approximation of Original Data (first 5 rows):\n")
    print(head(reconstructed_X_approx, 5))
    print(head(X, 5))  # Original data for comparison
This example shows how fitting (pca), accessing results (scores, loadings), and applying the model to new data (project) follow a consistent pattern, regardless of whether the underlying method was PCA, PLS, or another technique returning a bi_projector.
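Step 5 of the workflow rebuilds the data from three of the four possible components, so the result is an approximation rather than an exact copy. The sketch below illustrates how the reconstruction error shrinks as components are added. It uses `stats::prcomp` as a stand-in for the fitted model, so it runs without multivarious; the variable `recon_error` is an illustrative name.

```r
# Reconstruction error as a function of the number of components kept,
# using stats::prcomp as a stand-in for a fitted PCA model.
X  <- as.matrix(iris[, 1:4])
pc <- prcomp(X, center = TRUE, scale. = FALSE)

recon_error <- sapply(1:4, function(k) {
  V    <- pc$rotation[, 1:k, drop = FALSE]
  Xhat <- sweep(pc$x[, 1:k, drop = FALSE] %*% t(V), 2, pc$center, "+")
  sqrt(mean((X - Xhat)^2))          # root-mean-square error over all cells
})

# The error decreases with k and is numerically zero once all 4 components are kept.
print(signif(recon_error, 3))
```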
multivarious Ecosystem

The unified bi_projector interface enables several features within the package:

- Pre-processing that is stored with the model and applied automatically to new data (see vignette("PreProcessing")).
- Composition: chaining multiple bi_projector steps together (e.g., pre-processing → PCA → rotation) into a single composite projector (see vignette("Composing_Projectors")).
- Cross-validation built on the bi_projector structure (see vignette("CrossValidation")).

Projecting new variables (project_vars)

While project() operates on new samples (rows), the bi_projector also supports projecting new variables (columns) into the component space defined by the model's scores (the U vectors in SVD terms). This is done using project_vars().
    # Using the 'fit' object from the PCA example above

    # Create a new variable (column) with the same number of samples as the original data
    set.seed(456)
    new_variable <- rnorm(nrow(X))

    # Project this new variable into the component space defined by the PCA scores (fit$s).
    # The result shows how the new variable relates to the principal components.
    projected_variable_loadings <- project_vars(fit, new_variable)
    cat("\nProjection of new variable onto components:", projected_variable_loadings, "\n")
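One way to read this operation, under an SVD view, is as follows: if the centered data satisfy Xc = U S V^T, then V = Xc^T U S^-1, so a new column x can be mapped to a loading-like vector x^T U S^-1. The base-R sketch below follows that reasoning. The helper `project_var_sketch` is hypothetical, and the scaling that project_vars() actually applies is an assumption here and may differ.

```r
# Sketch of projecting a variable (column) into component space under an SVD view.
# Assumption: scaling by 1 / singular value; project_vars() may scale differently.
X  <- as.matrix(iris[, 1:4])
Xc <- sweep(X, 2, colMeans(X))
sv <- svd(Xc, nu = 3, nv = 3)
U  <- sv$u                         # left singular vectors (150 x 3)
d  <- sv$d[1:3]                    # matching singular values

project_var_sketch <- function(x) {
  drop(crossprod(x, U) %*% diag(1 / d, length(d)))
}

# Sanity check: an existing training column recovers its own row of loadings.
recovered <- project_var_sketch(Xc[, 1])
print(all.equal(recovered, sv$v[1, ]))

# A new, unrelated variable measured on the same 150 samples.
set.seed(456)
new_loadings <- project_var_sketch(rnorm(nrow(X)))
print(new_loadings)
```

The sanity check is the useful part of the design: a column the model was fitted on should land exactly on its own loadings, which confirms that the mapping is consistent with the decomposition.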
The multivarious package provides a consistent and extensible framework for common dimensionality reduction and related linear transformation tasks. By leveraging the bi_projector class, it offers a unified API for fitting models, projecting new data, reconstruction, and accessing key model components. This simplifies workflows, promotes code reuse, and facilitates integration with pre-processing, model composition, and cross-validation tools within the package ecosystem.