View source: R/HelperFunctions.R
| compute_trace_H | R Documentation |
Computes \mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top
\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) without forming
the N \times N hat matrix, by reducing the problem to
P \times P and r \times r operations.
compute_trace_H(
G,
Lambda,
A,
AGAInv,
nc,
nca,
K,
parallel,
cl,
chunk_size,
num_chunks,
rem_chunks,
unique_penalty_per_partition,
L_partition_list
)
G |
List of |
Lambda |
Base penalty matrix |
A |
Constraint matrix, |
AGAInv |
Precomputed |
nc |
Number of columns per partition block ( |
nca |
Number of constraint columns ( |
K |
Number of interior knots (so there are |
parallel |
Logical; use parallel processing. |
cl |
Cluster object for parallel computation. |
chunk_size |
Size of parallel chunks. |
num_chunks |
Number of full-size chunks. |
rem_chunks |
Number of remaining partitions in the final chunk. |
unique_penalty_per_partition |
Logical; if |
L_partition_list |
List of partition-specific penalty matrices
|
Let \mathbf{G} = (\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} + \boldsymbol{\Lambda})^{-1}
and \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top.
By the cyclic property of the trace:
\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = \mathrm{tr}(\mathbf{U}\mathbf{G}\,\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X})
Substituting \mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}\mathbf{X} = \mathbf{G}^{-1} - \boldsymbol{\Lambda}:
= \mathrm{tr}(\mathbf{U}\mathbf{G}(\mathbf{G}^{-1} - \boldsymbol{\Lambda})) = \mathrm{tr}(\mathbf{U}) - \mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda})
Since \mathbf{U} is idempotent,
\mathrm{tr}(\mathbf{U}) = \mathrm{rank}(\mathbf{U}) = P - r
where r = \mathrm{rank}(\mathbf{A}). For the second term,
expand \mathbf{U} = \mathbf{I} - \mathbf{G}\mathbf{A}(\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top:
\mathrm{tr}(\mathbf{U}\mathbf{G}\boldsymbol{\Lambda}) = \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) - \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)
using the cyclic property on the second term. Both \mathbf{G}
and \boldsymbol{\Lambda} are block-diagonal with K+1
blocks of dimension p \times p, so:
\mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) = \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k)
and \mathbf{G}\boldsymbol{\Lambda}\mathbf{G} is also block-diagonal
with blocks \mathbf{G}_k\boldsymbol{\Lambda}_k\mathbf{G}_k.
Combining:
\mathrm{tr}(\mathbf{X}\mathbf{U}\mathbf{G}\mathbf{X}^\top\mathbf{W}^{1/2}\mathbf{V}^{-1}\mathbf{W}^{1/2}) = (P - r) - \sum_{k=1}^{K+1}\mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k) + \mathrm{tr}\!\left((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}\right)
The correction term
\mathrm{tr}((\mathbf{A}^\top\mathbf{G}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A})
is an r \times r trace and captures the degrees of freedom
recovered because the constraints pin certain linear combinations of
penalized coefficients, removing them from estimation. When
\boldsymbol{\Lambda} = \mathbf{0} (no penalty), both
\mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}) and the correction
vanish, giving \mathrm{edf} = P - r. When \mathbf{U} = \mathbf{I}
(no constraints), r = 0 and the correction vanishes, giving
\mathrm{edf} = P - \mathrm{tr}(\mathbf{G}\boldsymbol{\Lambda}).
The result is invariant to the correlation structure \mathbf{V}
and GLM weights \mathbf{W}, which enter only through
\mathbf{G}.
When unique_penalty_per_partition = TRUE, each partition k
has an additional penalty component from L_partition_list. In the
notation of the paper, this is an added
\lambda_{l,k}\boldsymbol{\Lambda}_{l,k}, so the effective
per-partition penalty becomes
\boldsymbol{\Lambda}_k = \boldsymbol{\Lambda} +
\lambda_{l,k}\boldsymbol{\Lambda}_{l,k}.
All block-wise traces and products use \boldsymbol{\Lambda}_k
in place of the shared \boldsymbol{\Lambda}.
The partition-wise traces \mathrm{tr}(\mathbf{G}_k\boldsymbol{\Lambda}_k)
cost O(p^2) each and are parallelizable. The correction term
requires forming \mathbf{G}\boldsymbol{\Lambda}\mathbf{G}\mathbf{A}
(block-diagonal times sparse, O((K+1)p^2 r)) and a single
r \times r trace (O(r^2)). The total cost is
O((K+1)p^2 + r^2), compared to O(N^2) or O(NP)
for forming and tracing the full hat matrix.
Scalar effective degrees of freedom, clamped to [0, \, (K+1)p].
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.