tpca: Tailors the choice of principal components for change...
In Tveten/tpca: A function for tailoring PCA to change detection



tpca tailors the choice of which principal components to keep when the aim is to detect changepoints in the mean vector or covariance matrix. The principal axes onto which the data are projected are chosen based on a normal-state covariance matrix and a distribution over relevant change scenarios.


tpca(cov_mat, change_distr = "full_uniform",
  divergence = "normal_hellinger", cutoff = 0.99,
  max_axes = ncol(cov_mat), n_sim = 10^3)

`cov_mat`	A covariance matrix, i.e., a symmetric, positive definite numeric matrix.
`change_distr`	A string or a change distribution object. A string selects one of the already implemented distributions listed below. Custom distributions can be specified with the `set_uniform_cd` function.
`divergence`	A string specifying which divergence to use when measuring the sensitivity of each principal axis to a change. Available options: 'normal_hellinger', 'normal_KL' and 'normal_bhat'.
`cutoff`	A numeric between 0 and 1 governing how many principal axes to retain.
`max_axes`	An integer giving the maximum number of axes to return, regardless of the cutoff.
`n_sim`	An integer specifying the number of simulation runs.
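The option names suggest the Hellinger distance, the Kullback-Leibler divergence and the Bhattacharyya distance. As a point of reference, the textbook closed forms of these three quantities for two univariate normal distributions can be written in base R as below. The function names are hypothetical, and whether tpca uses exactly these forms (for example the Hellinger distance rather than its square) is an assumption.

```r
# Textbook closed forms for two univariate normals N(mu1, sd1^2) and N(mu2, sd2^2).
# Hypothetical helpers; tpca's exact definitions are assumed, not confirmed.
hellinger_normal <- function(mu1, sd1, mu2, sd2) {
  v <- sd1^2 + sd2^2
  # Bhattacharyya coefficient; the Hellinger distance is sqrt(1 - coefficient)
  bc <- sqrt(2 * sd1 * sd2 / v) * exp(-(mu1 - mu2)^2 / (4 * v))
  sqrt(pmax(0, 1 - bc))  # pmax guards against tiny negative rounding errors
}

kl_normal <- function(mu1, sd1, mu2, sd2) {
  # Kullback-Leibler divergence KL(N1 || N2); note that it is not symmetric
  log(sd2 / sd1) + (sd1^2 + (mu1 - mu2)^2) / (2 * sd2^2) - 0.5
}

bhattacharyya_normal <- function(mu1, sd1, mu2, sd2) {
  v <- sd1^2 + sd2^2
  (mu1 - mu2)^2 / (4 * v) + 0.5 * log(v / (2 * sd1 * sd2))
}

# A unit mean shift with unchanged variance
hellinger_normal(0, 1, 1, 1)
kl_normal(0, 1, 1, 1)             # 0.5
bhattacharyya_normal(0, 1, 1, 1)  # 0.125
```

The three are linked: the squared Hellinger distance equals one minus exp(-D_B), where D_B is the Bhattacharyya distance.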

This method simulates changes to a distribution and measures the sensitivity of each principal axis to each change using a statistical divergence. In each simulation run, the most sensitive axis is recorded; across runs, this estimates the probability that each axis is the most sensitive one over the range of changes specified by the change distribution.
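The simulation idea described above can be sketched in base R for the special case of mean changes only. This is not the package's implementation: the helper simulate_most_sensitive is hypothetical, its change distribution (a uniform mean shift on a randomly chosen subset of dimensions) is a simplified stand-in for set_uniform_cd, and the divergence is the Hellinger distance between univariate normals with equal variance.

```r
# Hypothetical sketch, not the tpca implementation: estimate how often each
# principal axis of cov_mat is the most sensitive one to a random mean change.
simulate_most_sensitive <- function(cov_mat, n_sim = 1000, mean_int = c(-1, 1)) {
  d <- ncol(cov_mat)
  eig <- eigen(cov_mat, symmetric = TRUE)
  axes <- eig$vectors        # principal axes as columns
  sds <- sqrt(eig$values)    # standard deviation along each axis before the change
  most_sensitive <- integer(n_sim)
  for (i in seq_len(n_sim)) {
    k <- sample.int(d, 1)                        # sparsity: number of changed dimensions
    affected <- sample.int(d, k)                 # affected dimensions, chosen at random
    delta <- numeric(d)
    delta[affected] <- runif(k, mean_int[1], mean_int[2])
    shift <- drop(crossprod(axes, delta))        # mean change along each axis
    # Hellinger distance between N(0, sd^2) and N(shift, sd^2), per axis
    h <- sqrt(1 - exp(-shift^2 / (8 * sds^2)))
    most_sensitive[i] <- which.max(h)
  }
  tabulate(most_sensitive, nbins = d) / n_sim    # proportion of runs each axis was most sensitive
}

# Usage: an equicorrelated 4 x 4 covariance matrix
set.seed(1)
S <- matrix(0.5, 4, 4)
diag(S) <- 1
p <- simulate_most_sensitive(S, n_sim = 500)
round(p, 3)
```

Because a mean change leaves the variance along each axis unchanged, the divergence in this sketch depends only on the shift relative to the axis's standard deviation, so low-variance axes tend to come out as the most sensitive.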

Custom change distributions can be built with the function set_uniform_cd. All components of the distribution are uniform, but the probability (importance) of each type of change can be specified, along with the sparsity of the change and the ranges of the sizes and directions of the changes. In each simulation run, once a change sparsity has been drawn, the dimensions affected by the change are always chosen at random. The more information that is supplied about which changes are of interest, the better, but also the less general, the choice of axes will be.
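To illustrate the drawing order described above (change type, then sparsity, then randomly chosen dimensions, then uniform sizes), here is a minimal base-R sketch of drawing one change scenario. The helper draw_change is hypothetical and not part of tpca; its argument names mirror those shown for set_uniform_cd, but the default intervals, the treatment of sd changes as multiplicative factors and the single common correlation are assumptions made for illustration.

```r
# Hypothetical helper, not part of tpca: draw one change scenario from a
# uniform change distribution. Interval defaults are illustrative only.
draw_change <- function(data_dim, prob = c(1, 1, 1) / 3,
                        sparsities = 1:data_dim,
                        mean_int = c(-1, 1), sd_int = c(1 / 2, 2),
                        cor_int = c(0, 1)) {
  # Change type drawn with probabilities prob = c(mean, sd, cor)
  type <- sample(c("mean", "sd", "cor"), 1, prob = prob)
  # Sparsity drawn uniformly from `sparsities` (guarding against sample()'s
  # 1:n behaviour when only one value is given)
  k <- if (length(sparsities) == 1) sparsities else sample(sparsities, 1)
  # Assumption: a correlation change needs at least two dimensions
  if (type == "cor") k <- max(k, 2)
  # The affected dimensions are always chosen at random
  dims <- sort(sample.int(data_dim, k))
  sizes <- switch(type,
    mean = runif(k, mean_int[1], mean_int[2]),  # shifts in the mean
    sd   = runif(k, sd_int[1], sd_int[2]),      # multiplicative sd factors (assumption)
    cor  = runif(1, cor_int[1], cor_int[2])     # one common correlation (assumption)
  )
  list(type = type, sparsity = k, dims = dims, sizes = sizes)
}

# Usage: a semisparse, mean-only scenario in 10 dimensions
set.seed(2)
ch <- draw_change(10, prob = c(1, 0, 0), sparsities = 2:5)
str(ch)
```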

Built-in choices for change distributions are implemented as calls to set_uniform_cd:

"full_uniform" (default): set_uniform_cd(data_dim)
"full_uniform_equal": set_uniform_cd(data_dim, change_equal = TRUE)
"full_uniform_large": set_uniform_cd(data_dim, mean_int = c(-3, 3), sd_int = c(4^(-1), 4), cor_int = c(0, 0.5))
"full_uniform_small": set_uniform_cd(data_dim, mean_int = c(-0.5, 0.5), sd_int = c(1.5^(-1), 1.5), cor_int = c(0.5, 1))
"semisparse_uniform": set_uniform_cd(data_dim, sparsities = 2:round(data_dim / 2))
"mean_only": set_uniform_cd(data_dim, prob = c(1, 0, 0))
"semisparse_mean_only": set_uniform_cd(data_dim, prob = c(1, 0, 0), sparsities = 2:round(data_dim / 2))
"sd_only": set_uniform_cd(data_dim, prob = c(0, 1, 0))
"semisparse_sd_only": set_uniform_cd(data_dim, prob = c(0, 1, 0), sparsities = 2:round(data_dim / 2))
"cor_only": set_uniform_cd(data_dim, prob = c(0, 0, 1))
"semisparse_cor_only": set_uniform_cd(data_dim, prob = c(0, 0, 1), sparsities = 2:round(data_dim / 2))

See the references for more information.

tpca returns an S3 object of class "tpca". This is a list with the following components:

axes: A matrix with the chosen principal axes as rows, ordered in decreasing order of sensitivity.
which_axes: A vector indicating which principal axes were chosen, in decreasing order of sensitivity.
prop_axes_max: A vector with the proportion of simulations each axis was the most sensitive one.
divergence_sim: A matrix containing all the simulated values of the divergence along each principal axis. It is of dimension data_dim x n_sim.
change_type: A character vector indicating the type of change for each iteration of the simulation.
change_sparsity: A numeric vector indicating the sparsity of the change for each iteration of the simulation.
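The relationship between divergence_sim and the summary components can be sketched as follows, assuming divergence_sim is laid out as documented (one row per axis, one column per simulation run). The helper summarise_divergences is hypothetical: it orders all axes by prop_axes_max and does not reproduce how tpca applies cutoff and max_axes to select a subset, since that rule is not described here.

```r
# Hypothetical sketch: derive prop_axes_max, which_axes and axes from a
# data_dim x n_sim divergence matrix. axes_all holds all principal axes as
# columns (e.g. eigen(cov_mat)$vectors). No cutoff/max_axes selection is done.
summarise_divergences <- function(divergence_sim, axes_all) {
  data_dim <- nrow(divergence_sim)
  n_sim <- ncol(divergence_sim)
  most <- apply(divergence_sim, 2, which.max)              # most sensitive axis per run
  prop_axes_max <- tabulate(most, nbins = data_dim) / n_sim
  which_axes <- order(prop_axes_max, decreasing = TRUE)     # decreasing sensitivity
  list(axes = t(axes_all[, which_axes, drop = FALSE]),      # chosen axes as rows
       which_axes = which_axes,
       prop_axes_max = prop_axes_max)
}

# Usage with a toy 3 x 4 divergence matrix (3 axes, 4 simulation runs)
div <- matrix(c(0.1, 0.5, 0.2,
                0.3, 0.1, 0.4,
                0.2, 0.6, 0.1,
                0.9, 0.2, 0.3), nrow = 3)
res <- summarise_divergences(div, axes_all = diag(3))
res$prop_axes_max   # 0.25 0.50 0.25
res$which_axes[1]   # 2
```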