Getting Started
In l1rotation: Identify Loading Vectors under Sparsity in Factor Models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.width = 5, 
  fig.height=4
)

Introduction {#intro}

library(l1rotation)

The l1rotation package offers functionality to simplify the loading matrix in factor models. It can find the most sparse rotation of the loading matrix using the l1-rotation criterion of Freyaldenhoven (2025).

Under the assumption of sparsity in the true loading matrix, it also solves the problem of rotational indeterminacy inherent to factor models. That is, suppose the data follows a factor model:

$$ X = F \Lambda^{*'} + e $$ where

$X$ is a $T \times n$ data matrix, where there are $T$ rows and $n$ variables, or columns
$F$ is a $T \times r$ matrix of $r$ factors that the data is decomposed into
$\Lambda^{*T}$ is an $r \times n$ matrix of true loadings
$e$ is a $T \times n$ error matrix.

Then, the assumption of sparsity in the loading matrix solves the problem of rotational indeterminacy inherent to factor models: $\Lambda^{*}$ will be the most sparse rotation and can be identified using the l1-rotation criterion Freyaldenhoven (2025).

Quick start {#start}

We will use the example_data data that ships with the package to show its basic functionality. This data is a matrix containing numeric information with $n = 224$, $T = 207$. In general, data.frames, tibbles, or other data types can also be used with l1rotation functions, as long as all columns are numeric. Note that the package cannot handle missing values in the data matrix.

To start, let's look at the first seven columns of the example data:

head(example_data[,1:7])

We assume that the number of underlying factors can be learned from the data (e.g., following the procedure in Bai and Ng (2002) or Ahn and Horenstein (2013)). Note that this package does not include functionality to calculate the number of factors - we simply take the number of factors as a user input.

For the example_data we will use two factors. With just the data, $X$, and the number of factors, $r$, we can start estimating the loadings with local_factors(). This function estimates $\Lambda^{*}$ and provides helpful diagnostics and figures.

Below is an example using example_data (note that estimation can also be run in parallel with a selected number of cores, n_cores):

set.seed(916)

lf <- local_factors(
  X = example_data, r = 2,
  parallel = FALSE, n_cores = NULL # Runs non-parallel by default
)

In the estimation, the only required arguments for local_factors() are the data to be decomposed and the number of factors. We use the principal components estimator as the initial estimate of the loadings, $\Lambda^0$, which can be accessed via the initial_loadings item of the output. The function also computes a quick diagnostic to check whether local factors are present in the data which is given in the output item has_local_factors.

Additionally, there are several rotation diagnostics accessible via rotation_diagnostics: (1) The rotation matrix, $R$, that when multiplied by $\Lambda^0$ produces $\hat{\Lambda}^*$, (2) the value of the l1 norm for each vector, and (3) the frequency with which the minimization problem converges at each of the estimated loading vectors.

lf$rotation_diagnostics

For a visual interpretation of this rotation, we provide tile plots contrasting the initial estimate $\Lambda^0$, pc_plot and the rotated estimate, $\hat{\Lambda}^*$, rotated_plot.

lf$pc_plot

In the initial principal component estimate, recall that each factor is simply a principal component. Of the 207 variables in $X$, the first factor loads most negatively along the variables between 90 and 120, and slightly negatively almost everywhere else. Along the second factor, there are positive loadings between variables 122 and 207 and slightly negative loadings elsewhere. However, it may be difficult to interpret the relationship between variables and factors when all loadings are nonzero.

lf$rotated_plot

The second estimate is a rotated version of the loading matrix optimized for sparsity using the l1-rotation criterion. This estimate is easier to interpret as most variables between 90-207 load negatively on the first factor, variables 0-120 load negatively on the second factor, and all other loadings are close to zero.

Refining the details {#details}

l1rotation supplies two additional functions, find_local_factors() and test_local_factors() which provide additional functionality to support the main local_factors() function.

`find_local_factors()` {#flf}

This function takes the same inputs as local_factors(), X and r, and has an additional argument, initial_loadings, that allows the user to specify any orthonormal basis of the loadings rather than defaulting to the principal component estimator. Alternative initial estimates may include Maximum Likelihood based estimation or Sparse Orthogonal Factor Regression (Uematsu et al. (2019)), for example.

`test_local_factors()` {#tlf}

This function tests for the presence of local factors given a sparse basis of the loading space. It takes as input X and r, and an additional optional argument, loadings, that allows the user to specify the loading matrix that is to be tested. This argument is set to NULL by default, which estimates $\hat{\Lambda}^*$ (by maximizing the l1-rotation criterion) and tests it for local factors.

To construct this diagnostic for a given loading matrix estimate, $\hat{\Lambda}$, we find the column with the largest number of entries smaller than some threshold $h_n$:

$$ \mathcal{L(\hat{\Lambda})} = \max_k\left(\sum_{i=1}^n 1{\hat{|\lambda}_{ik}| < h_n }\right) $$ We can then check whether the number of these "small" loadings is larger than $\gamma n$

$$ \texttt{has_local_factors} = 1{\mathcal{L}(\hat{\Lambda}) \geq \gamma n }. $$

Returning to our lf results, we can take a look at the value of has_local_factors.

lf$has_local_factors

This value is the result of test_local_factors(). To verify, we can call test_local_factors() on two different estimates: the principal components estimate, initial_loadings, and the l1rotation estimate, rotated_loadings.

# Check for local factors in PC estimate...
test_pc_estimate <- test_local_factors(X = example_data, r = 2, loadings = lf$initial_loadings)

# And rotated estimate
test_rot_estimate <- test_local_factors(X = example_data, r = 2, loadings = lf$rotated_loadings)

test_pc_estimate$has_local_factors
test_rot_estimate$has_local_factors

Thus, no local factors are detected using the principal components estimate, while the rotated estimate reveals two local factors. We can also visualize the number of small loadings using the small_loadings_plot element in our `lf' results:

round_hn <- round(test_rot_estimate$h_n, digits = 3)

lf$small_loadings_plot + 
  ggplot2::labs(
    title = 'Number of "small" loadings per factor', 
    caption = paste('"Small" is defined as loadings less than', round_hn)
  )