determine_factors: Determine the Optimal Number of Factors via an Information...

View source: R/pca_function.R

determine_factors    R Documentation

Determine the Optimal Number of Factors via an Information Criterion

Description

This function selects the optimal number of factors for a local principal component analysis (PCA) model of asset returns. For each candidate number of factors, it computes a BIC-type information criterion (IC) from the sum of squared residuals (SSR) of the PCA reconstruction plus a penalty term that increases with the number of factors. The number of factors that minimizes the IC is selected. The procedure is available either as a stand-alone function or as a method of the 'TVMVP' R6 class.

Usage

determine_factors(returns, max_m, bandwidth = silverman(returns))

Arguments

returns

A numeric matrix of asset returns with dimensions T \times p.

max_m

Integer. The maximum number of factors to consider.

bandwidth

Numeric. Kernel bandwidth for the local PCA. Defaults to silverman(returns), i.e. Silverman's rule of thumb.

Details

Two usage styles:

# Function interface
determine_factors(returns, max_m = 5)

# R6 method interface
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()

When using the method form, if 'max_m' or 'bandwidth' are omitted, they default to values stored in the object. Results are cached and retrievable via class methods.

For each candidate number of factors m (from 1 to max_m), the function:

  1. Performs a local PCA on the returns at each time point r = 1,\dots,T using m factors.

  2. Computes a reconstruction of the returns and the corresponding residuals:

    \text{Residual}_r = R_r - F_r \Lambda_r,

    where R_r is the return at time r, and F_r and \Lambda_r are the local factors and loadings, respectively.

  3. Computes the average sum of squared residuals (SSR) as:

    V(m) = \frac{1}{pT} \sum_{r=1}^{T} \| \text{Residual}_r \|^2.

  4. Adds a penalty term that increases with m:

    \text{Penalty}(m) = m \times \frac{p + T \times \text{bandwidth}}{pT \times \text{bandwidth}} \log\left(\frac{pT \times \text{bandwidth}}{p + T \times \text{bandwidth}}\right).

  5. Forms the information criterion as:

    \text{IC}(m) = \log\big(V(m)\big) + \text{Penalty}(m).

The optimal number of factors is then chosen as the value of m that minimizes \text{IC}(m).
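
The steps above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper names (epanechnikov, local_ssr, penalty_per_factor, ic_sketch) are hypothetical, the Epanechnikov kernel is an assumed choice, and the local PCA is simplified to an eigen-decomposition of a kernel-weighted second-moment matrix with the factor at time r obtained by projecting R_r onto the local loadings. Only the penalty and IC formulas are taken directly from the text.

```r
# Assumed kernel; the package may use a different one.
epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Steps 1-3 (simplified): V(m), the average squared residual over all r.
local_ssr <- function(returns, m, bandwidth) {
  T_obs <- nrow(returns)
  p <- ncol(returns)
  tau <- seq_len(T_obs) / T_obs
  ssr <- 0
  for (r in seq_len(T_obs)) {
    # Kernel weights centred at time r, normalised to sum to one.
    w <- epanechnikov((tau - tau[r]) / bandwidth)
    w <- w / sum(w)
    # Kernel-weighted second-moment matrix around time r (p x p).
    S_r <- crossprod(returns * sqrt(w))
    # Local loadings: top-m eigenvectors; local factor: projection of R_r.
    Lambda_r <- eigen(S_r, symmetric = TRUE)$vectors[, seq_len(m), drop = FALSE]
    R_r <- returns[r, , drop = FALSE]
    F_r <- R_r %*% Lambda_r
    resid_r <- R_r - F_r %*% t(Lambda_r)
    ssr <- ssr + sum(resid_r^2)
  }
  ssr / (p * T_obs)
}

# Step 4: penalty for a single factor; Penalty(m) is m times this value.
penalty_per_factor <- function(p, T_obs, bandwidth) {
  Th <- T_obs * bandwidth
  (p + Th) / (p * Th) * log((p * Th) / (p + Th))
}

# Step 5 and the final selection: IC(m) = log(V(m)) + Penalty(m).
ic_sketch <- function(returns, max_m, bandwidth) {
  pen <- penalty_per_factor(ncol(returns), nrow(returns), bandwidth)
  ic <- vapply(seq_len(max_m), function(m) {
    log(local_ssr(returns, m, bandwidth)) + m * pen
  }, numeric(1))
  list(optimal_m = which.min(ic), IC_values = ic)
}

set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
res <- ic_sketch(returns, max_m = 5, bandwidth = 0.2)
res$optimal_m
res$IC_values
```

For T = 100, p = 30 and a bandwidth of 0.2, the per-factor penalty is (30 + 20) / (30 × 20) × log(600 / 50) ≈ 0.207, so each additional factor must lower log(V(m)) by more than that amount to be selected.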

Value

A list with:

  • optimal_m: Integer. The optimal number of factors.

  • IC_values: Numeric vector of IC values for each candidate m.
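
As a rough illustration of how the two returned elements relate, the sketch below tabulates IC values against candidate m and recovers the minimizer. The result list is a hypothetical stand-in with made-up numbers, not output from the package, and it assumes that IC_values[i] corresponds to m = i.

```r
# Hypothetical result list with made-up IC values, for illustration only.
result <- list(optimal_m = 2L, IC_values = c(0.41, 0.35, 0.39, 0.47, 0.58))

# Assumption: IC_values[i] corresponds to m = i, for m = 1, ..., max_m.
ic_table <- data.frame(m = seq_along(result$IC_values), IC = result$IC_values)

# The minimizer of the IC should coincide with optimal_m.
best <- ic_table$m[which.min(ic_table$IC)]
stopifnot(best == result$optimal_m)

# Gap between the best and second-best IC, a rough gauge of how clear the choice is.
ic_gap <- sort(ic_table$IC)[2] - min(ic_table$IC)
```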

References

Su, L., & Wang, X. (2017). On time-varying factor models: Estimation and testing. Journal of Econometrics, 198(1), 84–101.

Examples

set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)

# Function usage
result <- determine_factors(returns, max_m = 5)
print(result$optimal_m)
print(result$IC_values)

# R6 usage
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()


TVMVP documentation built on June 28, 2025, 1:08 a.m.