knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) library(dplyr) library(ggplot2) # Assuming necessary multiblock functions are loaded
Assume you trained a dimensionality-reduction model (PCA, PLS ...) on p variables but, at prediction time,
You still want the latent scores in the same component space, so downstream models, dashboards, alarms, ... keep running.
That's exactly what
partial_project(model, new_data_subset, colind = which.columns)
does:
new_data_subset
(n × q) ─► project into latent space (n × k)
with q ≤ p. If the loading vectors are orthonormal this is a simple dot product; otherwise a ridge-regularised least-squares solve is used.
set.seed(1) n <- 100 p <- 8 X <- matrix(rnorm(n * p), n, p) # Fit a centred 3-component PCA (via SVD) # Assuming bi_projector, prep, center are available Xc <- scale(X, center = TRUE, scale = FALSE) svd_res <- svd(Xc, nu = 0, nv = 3) pca <- bi_projector( # <- multiblock helper v = svd_res$v, s = Xc %*% svd_res$v, sdev = svd_res$d[1:3] / sqrt(n-1), # Correct scaling for sdev preproc = prep(center()) )
scores_full <- project(pca, X) # n × 3 head(round(scores_full, 2))
Suppose columns 7 and 8 are unavailable for a new batch.
X_miss <- X[, 1:6] # keep only first 6 columns col_subset <- 1:6 # their positions in the **original** X scores_part <- partial_project(pca, X_miss, colind = col_subset) # How close are the results? plot_df <- tibble( full = scores_full[,1], part = scores_part[,1] ) ggplot(plot_df, aes(full, part)) + geom_point() + geom_abline(col = "red") + coord_equal() + labs(title = "Component 1: full vs. partial projection") + theme_minimal()
Even with two variables missing, the ridge LS step recovers latent scores that lie almost on the 1:1 line.
If you expect many rows with the same subset of features, create a specialised projector once and reuse it:
# Assuming partial_projector is available pca_1to6 <- partial_projector(pca, 1:6) # keeps a reference + cache # project 1000 new observations that only have the first 6 vars new_batch <- matrix(rnorm(1000 * 6), 1000, 6) scores_fast <- project(pca_1to6, new_batch) dim(scores_fast) # 1000 × 3
Internally, partial_projector()
stores the mapping
v[1:6, ]
and a pre-computed inverse, so calls to project()
are
as cheap as a matrix multiplication.
For multiblock fits (created with multiblock_projector()
),
you can instead write
# Assuming mb is a multiblock_projector and data_blockB is the data for block B # project_block(mb, data_blockB, block = "B") # Or block = 2
which is just a wrapper around partial_project()
using the block's
column indices.
Partial projection is handy even when all measurements exist:
project_block()
.Assume columns 1–5 (instead of 50 for brevity) of X
form our ROI.
roi_cols <- 1:5 # pretend these are the ROI voxels X_roi <- X[, roi_cols] # same matrix from Section 2 roi_scores <- partial_project(pca, X_roi, colind = roi_cols) # Compare component 1 from full vs ROI df_roi <- tibble( full = scores_full[,1], roi = roi_scores[,1] ) ggplot(df_roi, aes(full, roi)) + geom_point(alpha = .6) + geom_abline(col = "red") + coord_equal() + labs(title = "Component 1 scores: full data vs ROI") + theme_minimal()
Interpretation: If the two sets of scores align tightly, the ROI variables are driving this component. A strong deviation would reveal that other variables dominate the global pattern.
# imagine `mb_pca` is a multiblock_biprojector with 2 blocks: # Block 1 = questionnaire (Q1–Q30) # Block 2 = reaction-time curves (RT1–RT120) # Assume data_subject7_block2 contains only the reaction time data for subject 7 # subject_7_scores <- project_block(mb_pca, # new_data = data_subject7_block2, # block = 2) # only RT variables # cat("Subject 7, component scores derived *solely* from reaction times:\n") # print(round(subject_7_scores, 2))
You can now overlay these scores on a map built from all subjects' global scores to see whether subject 7's behavioural profile is consistent with their psychometrics, or an outlier when viewed from this angle alone.
partial_project()
| Scenario | What you pass | Typical call |
|-------------------------------------|-----------------------------------|---------------------------------------------|
| Sensor outage / missing features | matrix with observed cols only | partial_project(mod, X_obs, colind = idx)
|
| Region of interest (ROI) | ROI columns of the data | partial_project(mod, X[, ROI], ROI)
|
| Block-specific latent scores | full block matrix | project_block(mb, blkData, block = b)
|
| "What-if": vary a single variable set | varied cols + zeros elsewhere | partial_project()
with matching colind
|
The component space stays identical throughout, so downstream analytics, classifiers, or control charts continue to work with no re-training.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.