compute_trace_H: Effective degrees of freedom via trace of the hat matrix

View source: R/HelperFunctions.R

compute_trace_HR Documentation

Effective degrees of freedom via trace of the hat matrix

Description

Computes \mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top \mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) without forming the N \times N hat matrix, by reducing the problem to P \times P and r \times r operations.

Usage

compute_trace_H(
  G,
  Lambda,
  A,
  AGAInv,
  nc,
  nca,
  K,
  parallel,
  cl,
  chunk_size,
  num_chunks,
  rem_chunks,
  unique_penalty_per_partition,
  L_partition_list
)

Arguments

G

List of K+1 matrices \mathbf{G}_k, each p \times p.

Lambda

Base penalty matrix \boldsymbol{\Lambda}, p \times p, shared across all partitions. When unique_penalty_per_partition = FALSE, this is the full per-block penalty. When TRUE, the effective penalty for partition k is \boldsymbol{\Lambda} + \mathbf{L}_k.

A

Constraint matrix, P \times r.

AGAInv

Precomputed (\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}, r \times r.

nc

Number of columns per partition block (p).

nca

Number of constraint columns (r).

K

Number of interior knots (so there are K+1 partitions).

parallel

Logical; use parallel processing.

cl

Cluster object for parallel computation.

chunk_size

Size of parallel chunks.

num_chunks

Number of full-size chunks.

rem_chunks

Number of remaining partitions in the final chunk.

unique_penalty_per_partition

Logical; if TRUE, each partition receives an additional penalty from L_partition_list.

L_partition_list

List of partition-specific penalty matrices \mathbf{L}_k, each p \times p. Only used when unique_penalty_per_partition = TRUE; ignored otherwise.

Details

Derivation

Let \mathbf{G} = (\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} + \boldsymbol{\Lambda})^{-1} and \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top. By the cyclic property of the trace:

\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = \mathrm{tr}(\mathbf{U}\mathbf{G}\,\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X})

Substituting \mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} = \mathbf{G}^{-1} - \boldsymbol{\Lambda}:

= \mathrm{tr}(\mathbf{U}\mathbf{G}(\mathbf{G}^{-1} - \boldsymbol{\Lambda})) = \mathrm{tr}(\mathbf{U}) - \mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda})

Since \mathbf{U} is idempotent, \mathrm{tr}(\mathbf{U}) = \mathrm{rank}(\mathbf{U}) = P - r where r = \mathrm{rank}(\mathbf{A}). For the second term, expand \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top:

\mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda}) = \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) - \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)

using the cyclic property on the second term. Both \mathbf{G} and \boldsymbol{\Lambda} are block-diagonal with K+1 blocks of dimension p \times p, so:

\mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) = \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k)

and \mathbf{G}\boldsymbol{\Lambda}\mathbf{G} is also block-diagonal with blocks \mathbf{G}_k\boldsymbol{\Lambda}_k\mathbf{G}_k.

Combining:

\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = (P - r) - \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k) + \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)

The correction term \mathrm{tr}((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}) is an r \times r trace and captures the degrees of freedom recovered because the constraints pin certain linear combinations of penalized coefficients, removing them from estimation. When \boldsymbol{\Lambda} = \mathbf{0} (no penalty), both \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) and the correction vanish, giving \mathrm{edf} = P - r. When \mathbf{U} = \mathbf{I} (no constraints), r = 0 and the correction vanishes, giving \mathrm{edf} = P - \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}).

The result is invariant to the correlation structure \mathbf{V} and GLM weights \mathbf{W}, which enter only through \mathbf{G}.

Partition-specific penalties

When unique_penalty_per_partition = TRUE, each partition k has an additional penalty component from L_partition_list. In the notation of the paper, this is an added \lambda_{l,k}\boldsymbol{\Lambda}_{l,k}, so the effective per-partition penalty becomes \boldsymbol{\Lambda}_k = \boldsymbol{\Lambda} + \lambda_{l,k}\boldsymbol{\Lambda}_{l,k}. All block-wise traces and products use \boldsymbol{\Lambda}_k in place of the shared \boldsymbol{\Lambda}.

Computational cost

The partition-wise traces \mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k) cost O(p^2) each and are parallelizable. The correction term requires forming \mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A} (block-diagonal times sparse, O((K+1)p^2 r)) and a single r \times r trace (O(r^2)). The total cost is O((K+1)p^2 + r^2), compared to O(N^2) or O(NP) for forming and tracing the full hat matrix.

Value

Scalar effective degrees of freedom, clamped to [0, \, (K+1)p].


lgspline documentation built on May 8, 2026, 5:07 p.m.