determine_factors | R Documentation
This function selects the optimal number of factors for a local principal component analysis (PCA) model of asset returns. For each candidate number of factors it computes a BIC-type information criterion (IC) that combines the sum of squared residuals (SSR) from the PCA reconstruction with a penalty term that increases with the number of factors (Su and Wang, 2017). The optimal number of factors is the one that minimizes the IC. The procedure is available both as a stand-alone function and as a method of the 'TVMVP' R6 class.
determine_factors(returns, max_m, bandwidth = silverman(returns))
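| Argument | Description |
| --- | --- |
| returns | A numeric matrix of asset returns with dimensions T × p (T time points in rows, p assets in columns). |
| max_m | Integer. The maximum number of factors to consider. |
| bandwidth | Numeric. Kernel bandwidth for the local PCA. Defaults to Silverman's rule of thumb, silverman(returns). |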
Two usage styles:
# Function interface
determine_factors(returns, max_m = 5)

# R6 method interface
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()
In the method form, 'max_m' and 'bandwidth' default to the values stored in the object when they are omitted. The results are cached in the object and can be retrieved with get_optimal_m() and get_IC_values().
For each candidate number of factors m = 1, \dots, max_m, the function:

1. Performs a local PCA on the returns at each time point r = 1, \dots, T using m factors.
2. Computes a reconstruction of the returns and the corresponding residuals:

   \text{Residual}_r = R_r - F_r \Lambda_r,

   where R_r is the vector of asset returns at time r, and F_r and \Lambda_r are the local factors and loadings, respectively.
3. Computes the average sum of squared residuals (SSR) over all p assets and T time points:

   V(m) = \frac{1}{pT} \sum_{r=1}^{T} \| \text{Residual}_r \|^2.

4. Adds a penalty term that increases with m, where h denotes the bandwidth:

   \text{Penalty}(m) = m \times \frac{p + Th}{pTh} \log\left(\frac{pTh}{p + Th}\right).
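The residual and SSR steps above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the Epanechnikov kernel, the kernel argument (t - r)/(T h), the uncentred weighted second-moment matrix, and the normalisation \Lambda'\Lambda / p = I are assumptions taken from the usual local PCA setup, and `epan()` and `local_V()` are hypothetical helper names. The code uses the column-vector convention, so the fitted value `Lambda %*% F_r` plays the role of F_r \Lambda_r in the row-vector notation above.

```r
# Epanechnikov kernel (assumption: the package may use a different kernel)
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Hypothetical helper: average SSR V(m) from a local PCA with m factors.
# returns: T x p matrix (rows = time points, columns = assets); h: bandwidth.
local_V <- function(returns, m, h) {
  T_obs <- nrow(returns)
  p <- ncol(returns)
  ssr <- 0
  for (r in seq_len(T_obs)) {
    # Kernel weights over time, centred at time point r
    w <- epan((seq_len(T_obs) - r) / (T_obs * h))
    # Weighted (uncentred) p x p second-moment matrix; sqrt(w) scales each row
    Xw <- returns * sqrt(w)
    S <- crossprod(Xw) / sum(w)
    # Local loadings: top-m eigenvectors scaled so that Lambda'Lambda / p = I
    ev <- eigen(S, symmetric = TRUE)
    Lambda <- ev$vectors[, seq_len(m), drop = FALSE] * sqrt(p)
    # Local factors at time r: least-squares projection of R_r on the loadings
    F_r <- drop(crossprod(Lambda, returns[r, ])) / p
    # Residual_r = R_r minus its reconstruction
    resid <- returns[r, ] - drop(Lambda %*% F_r)
    ssr <- ssr + sum(resid^2)
  }
  ssr / (p * T_obs)  # V(m) = SSR / (pT)
}

# Usage: V(m) for m = 1..5 on simulated returns, with a hypothetical h = 0.2
set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
V <- sapply(1:5, function(m) local_V(returns, m, h = 0.2))
```

Because each time point reuses one eigendecomposition, the projection subspaces are nested in m, so V(m) can only fall as m grows; the penalty is what keeps the criterion from always choosing max_m.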
The information criterion is defined as:
\text{IC}(m) = \log\big(V(m)\big) + \text{Penalty}(m).
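To see how strongly the penalty acts, take p = 30 assets and T = 100 time points (the dimensions of the returns matrix in the examples) and, writing h for the bandwidth, a hypothetical h = 0.2 chosen only for illustration (it is not the Silverman default):

```latex
Th = 20, \qquad p + Th = 50, \qquad pTh = 600
\frac{p + Th}{pTh}\log\left(\frac{pTh}{p + Th}\right) = \frac{50}{600}\log\left(\frac{600}{50}\right) = \frac{\log 12}{12} \approx 0.2071
\text{IC}(m) \approx \log\big(V(m)\big) + 0.2071\,m
```

Under these values, an extra factor lowers the IC only if it reduces V(m) by more than a factor of about e^{-0.2071} \approx 0.813, that is, by roughly 19%.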
The optimal number of factors is then chosen as the value of m that minimizes \text{IC}(m).
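The selection rule can be sketched as follows: given V(1), \dots, V(max_m), add the penalty and take the minimiser with which.min(). This is a minimal sketch, not the package's code; `select_m_from_V()` is a hypothetical name, the V values are made up for illustration, and the bandwidth h = 0.2 is an assumed value. The returned list mirrors the documented return value (optimal_m, IC_values).

```r
# Hypothetical helper (not part of TVMVP): IC values and minimiser from V(m).
select_m_from_V <- function(V, p, T_obs, h) {
  m <- seq_along(V)
  # (p + T h) / (p T h): the rate that scales the per-factor penalty
  rate <- (p + T_obs * h) / (p * T_obs * h)
  penalty <- m * rate * log(1 / rate)  # log(1 / rate) = log(pTh / (p + Th))
  ic <- log(V) + penalty
  list(optimal_m = which.min(ic), IC_values = ic)
}

# Illustrative V(m) values only (not output from determine_factors())
V <- c(0.90, 0.60, 0.55, 0.54, 0.535)
res <- select_m_from_V(V, p = 30, T_obs = 100, h = 0.2)
res$optimal_m  # 2
```

Note that which.min() returns the first minimiser if two candidates tie.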
A list with:
optimal_m: Integer. The optimal number of factors.
IC_values: Numeric vector of IC values for each candidate m.
Su, L., & Wang, X. (2017). On time-varying factor models: Estimation and testing. Journal of Econometrics, 198(1), 84–101.
library(TVMVP)
set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
# Function usage
result <- determine_factors(returns, max_m = 5)
print(result$optimal_m)
print(result$IC_values)
# R6 usage
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()