compute_trace_H: Effective degrees of freedom via trace of the hat matrix
In lgspline: Lagrangian Multiplier Smoothing Splines for Smooth Function Estimation

compute_trace_H

R Documentation

Effective degrees of freedom via trace of the hat matrix

Description

Computes \mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top \mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) without forming the N \times N hat matrix, by reducing the problem to P \times P and r \times r operations.

Usage

compute_trace_H(
  G,
  Lambda,
  A,
  AGAInv,
  nc,
  nca,
  K,
  parallel,
  cl,
  chunk_size,
  num_chunks,
  rem_chunks,
  unique_penalty_per_partition,
  L_partition_list
)

Arguments

`G`	List of `K+1` matrices `\mathbf{G}_k`, each `p \times p`.
`Lambda`	Base penalty matrix `\boldsymbol{\Lambda}`, `p \times p`, shared across all partitions. When `unique_penalty_per_partition = FALSE`, this is the full per-block penalty. When `TRUE`, the effective penalty for partition `k` is `\boldsymbol{\Lambda} + \mathbf{L}_k`.
`A`	Constraint matrix, `P \times r`.
`AGAInv`	Precomputed `(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}`, `r \times r`.
`nc`	Number of columns per partition block (`p`).
`nca`	Number of constraint columns (`r`).
`K`	Number of interior knots (so there are `K+1` partitions).
`parallel`	Logical; use parallel processing.
`cl`	Cluster object for parallel computation.
`chunk_size`	Size of parallel chunks.
`num_chunks`	Number of full-size chunks.
`rem_chunks`	Number of remaining partitions in the final chunk.
`unique_penalty_per_partition`	Logical; if `TRUE`, each partition receives an additional penalty from `L_partition_list`.
`L_partition_list`	List of partition-specific penalty matrices `\mathbf{L}_k`, each `p \times p`. Only used when `unique_penalty_per_partition = TRUE`; ignored otherwise.

Details

Derivation

Let \mathbf{G} = (\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} + \boldsymbol{\Lambda})^{-1} and \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top. By the cyclic property of the trace:

\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = \mathrm{tr}(\mathbf{U}\mathbf{G}\,\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X})

Substituting \mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} = \mathbf{G}^{-1} - \boldsymbol{\Lambda}:

= \mathrm{tr}(\mathbf{U}\mathbf{G}(\mathbf{G}^{-1} - \boldsymbol{\Lambda})) = \mathrm{tr}(\mathbf{U}) - \mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda})

Since \mathbf{U} is idempotent, \mathrm{tr}(\mathbf{U}) = \mathrm{rank}(\mathbf{U}) = P - r where r = \mathrm{rank}(\mathbf{A}). For the second term, expand \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top:

\mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda}) = \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) - \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)

using the cyclic property on the second term. Both \mathbf{G} and \boldsymbol{\Lambda} are block-diagonal with K+1 blocks of dimension p \times p, so:

\mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) = \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k)

and \mathbf{G}\boldsymbol{\Lambda}\mathbf{G} is also block-diagonal with blocks \mathbf{G}_k\boldsymbol{\Lambda}_k\mathbf{G}_k.

Combining:

\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = (P - r) - \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k) + \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)

The correction term \mathrm{tr}((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}) is an r \times r trace and captures the degrees of freedom recovered because the constraints pin certain linear combinations of penalized coefficients, removing them from estimation. When \boldsymbol{\Lambda} = \mathbf{0} (no penalty), both \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) and the correction vanish, giving \mathrm{edf} = P - r. When \mathbf{U} = \mathbf{I} (no constraints), r = 0 and the correction vanishes, giving \mathrm{edf} = P - \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}).

The result is invariant to the correlation structure \mathbf{V} and GLM weights \mathbf{W}, which enter only through \mathbf{G}.

Partition-specific penalties

When unique_penalty_per_partition = TRUE, each partition k has an additional penalty component from L_partition_list. In the notation of the paper, this is an added \lambda_{l,k}\boldsymbol{\Lambda}_{l,k}, so the effective per-partition penalty becomes \boldsymbol{\Lambda}_k = \boldsymbol{\Lambda} + \lambda_{l,k}\boldsymbol{\Lambda}_{l,k}. All block-wise traces and products use \boldsymbol{\Lambda}_k in place of the shared \boldsymbol{\Lambda}.

Computational cost

The partition-wise traces \mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k) cost O(p^2) each and are parallelizable. The correction term requires forming \mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A} (block-diagonal times sparse, O((K+1)p^2 r)) and a single r \times r trace (O(r^2)). The total cost is O((K+1)p^2 + r^2), compared to O(N^2) or O(NP) for forming and tracing the full hat matrix.