compute_dG_u_dlambda_xy: Derivative of Constrained Penalized Coefficients with Respect...

View source: R/HelperFunctions.R

compute_dG_u_dlambda_xyR Documentation

Derivative of Constrained Penalized Coefficients with Respect to Lambda

Description

Computes d(\mathbf{U}\mathbf{G}\mathbf{X}^{T}\mathbf{y})/d\lambda, the sensitivity of the constrained coefficient estimates \tilde{\boldsymbol{\beta}} = \mathbf{U}\mathbf{G}\mathbf{X}^T\mathbf{y} to changes in the smoothing parameter \lambda.

Usage

compute_dG_u_dlambda_xy(
  AGAInv_AGXy,
  AGAInv,
  G,
  A,
  dG_dlambda,
  nc,
  nca,
  K,
  Xy,
  Ghalf,
  dGhalf,
  GhalfXy_temp,
  parallel,
  cl,
  chunk_size,
  num_chunks,
  rem_chunks,
  parallel_qr = FALSE
)

Arguments

AGAInv_AGXy

Product of (\mathbf{A}^{T}\mathbf{G}\mathbf{A})^{-1} and \mathbf{A}^{T}\mathbf{G}\mathbf{X}^{T}\mathbf{y}

AGAInv

Inverse of \mathbf{A}^{T}\mathbf{G}\mathbf{A}

G

List of \mathbf{G} matrices

A

Constraint matrix \mathbf{A}

dG_dlambda

List of d\mathbf{G}/d\lambda matrices

nc

Number of columns

nca

Number of constraint columns

K

Number of partitions minus 1 (K)

Xy

List of \mathbf{X}^{T}\mathbf{y} products

Ghalf

List of \mathbf{G}^{1/2} matrices

dGhalf

List of d\mathbf{G}^{1/2}/d\lambda matrices

GhalfXy_temp

Temporary storage for \mathbf{G}^{1/2}\mathbf{X}^{T}\mathbf{y}

parallel

Use parallel processing

cl

Cluster object

chunk_size

Size of parallel chunks

num_chunks

Number of chunks

rem_chunks

Remaining chunks

Details

This function is called during GCV/penalty optimization and computes how the constrained coefficient vector changes as the penalty weight varies. Two implementations are provided depending on problem size.

Derivation

The constrained estimate is \tilde{\boldsymbol{\beta}} = \mathbf{U}\hat{\boldsymbol{\beta}} where \hat{\boldsymbol{\beta}} = \mathbf{G}\mathbf{X}^T\mathbf{y} and \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^T\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^T. Both \mathbf{G} and \mathbf{U} depend on \lambda through \mathbf{G} = (\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Lambda})^{-1}.

Differentiating the product \mathbf{U}\mathbf{G}\mathbf{X}^T\mathbf{y} requires the chain rule applied to three \lambda-dependent components:

\frac{d}{d\lambda}(\mathbf{U}\mathbf{G}\mathbf{X}^T\mathbf{y})

= \frac{d\mathbf{U}}{d\lambda}\hat{\boldsymbol{\beta}} + \mathbf{U}\frac{d\hat{\boldsymbol{\beta}}}{d\lambda}

The unconstrained part is straightforward: d\hat{\boldsymbol{\beta}}/d\lambda = (d\mathbf{G}/d\lambda)\mathbf{X}^T\mathbf{y}, computed partition-wise since \mathbf{G} is block-diagonal.

The constraint projection derivative is more involved. Writing \mathbf{C} = (\mathbf{A}^T\mathbf{G}\mathbf{A})^{-1} and expanding d\mathbf{U}/d\lambda gives three terms (after applying the product rule and the identity d\mathbf{C}/d\lambda = -\mathbf{C}(d(\mathbf{A}^T\mathbf{G}\mathbf{A})/d\lambda)\mathbf{C}):

term2a

(d\mathbf{G}/d\lambda)\mathbf{A}\mathbf{C}\mathbf{A}^T\hat{\boldsymbol{\beta}} direct effect of d\mathbf{G}/d\lambda on the projection

term2b

\mathbf{G}\mathbf{A}\mathbf{C}(d(\mathbf{A}^T\mathbf{G}\mathbf{A})/d\lambda)\mathbf{C}\mathbf{A}^T\hat{\boldsymbol{\beta}} effect through the change in (\mathbf{A}^T\mathbf{G}\mathbf{A})^{-1}

term2c

\mathbf{G}\mathbf{A}\mathbf{C}\mathbf{A}^T(d\hat{\boldsymbol{\beta}}/d\lambda) projection of the unconstrained derivative back through the constraint

The full derivative is d\hat{\boldsymbol{\beta}}/d\lambda minus the sum of these three correction terms.

Large problem path (K >= 10, nc > 4)

Computes the three correction terms explicitly using the block structure of \mathbf{G} and d\mathbf{G}/d\lambda. The intermediate quantity d(\mathbf{A}^T\mathbf{G}\mathbf{A})/d\lambda = \mathbf{A}^T(d\mathbf{G}/d\lambda)\mathbf{A} is accumulated partition-wise. Shared vectors \mathbf{A}\mathbf{C}\mathbf{A}^T\hat{\boldsymbol{\beta}} and related products are precomputed once and reused across partitions. Parallelism is over chunks of partitions.

Small problem path

Reformulates the derivative using the matrix square root \mathbf{G}^{1/2} and its derivative d\mathbf{G}^{1/2}/d\lambda. The constraint \mathbf{A}^T\boldsymbol{\beta} = 0 is imposed via least squares projection: the residuals from regressing \mathbf{G}^{1/2}\mathbf{X}^T\mathbf{y} onto \mathbf{G}^{1/2}\mathbf{A} give the constrained component, and differentiating this projection with respect to \lambda yields the derivative. Uses .lm.fit for speed with a stabilizing rescaling factor to prevent numerical issues when the constraint matrix is poorly scaled.

Value

P \times 1 vector of derivatives d\tilde{\boldsymbol{\beta}}/d\lambda


lgspline documentation built on May 8, 2026, 5:07 p.m.