bfpca: Binary functional principal components analysis

View source: R/bfpca.R

Description

Function used in the FPCA step for registering binary functional data; it is called by register_fpca when family = "binomial". It uses a variational EM algorithm to estimate the scores and principal components for binary functional data.
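The variational EM builds on the Jaakkola and Jordan (1997) lower bound on the logistic likelihood, listed under References. The sketch below shows that bound in base R. It illustrates the technique and is not registr's internal code; the helpers jj_lambda() and jj_log_sigmoid_bound() are hypothetical. In the standard formulation the bound is quadratic in the linear predictor, which keeps the posterior of the scores Gaussian, and the variational parameter xi is updated as xi^2 = E[x^2].

```r
# Sketch of the Jaakkola-Jordan (1997) lower bound on log(sigmoid(x)).
# jj_lambda() and jj_log_sigmoid_bound() are hypothetical helpers,
# not functions exported by registr.
jj_lambda <- function(xi) {
  # lambda(xi) = tanh(xi / 2) / (4 * xi), with limit 1/8 as xi -> 0
  ifelse(abs(xi) < 1e-8, 1 / 8, tanh(xi / 2) / (4 * xi))
}

jj_log_sigmoid_bound <- function(x, xi) {
  # Quadratic-in-x lower bound, tight at x = xi and x = -xi
  plogis(xi, log.p = TRUE) + (x - xi) / 2 - jj_lambda(xi) * (x^2 - xi^2)
}

x     <- seq(-4, 4, by = 0.5)
xi    <- 1.5
exact <- plogis(x, log.p = TRUE)      # log(sigmoid(x))
bound <- jj_log_sigmoid_bound(x, xi)
print(all(bound <= exact + 1e-12))    # TRUE: the bound never exceeds the exact value
```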

The number of functional principal components (FPCs) can either be specified directly (argument npc) or chosen based on the explained share of variance (argument npc_varExplained). In the latter case, the number of FPCs is determined before the main estimation step by running the FPCA once with npc = 20 (and correspondingly Kt = 20). The overall variance in the data Y is thereby approximated by the variance represented by the FPC basis with 20 FPCs, and the number of FPCs is chosen so that their explained share of this variance reaches npc_varExplained.
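The selection rule can be sketched as follows, assuming the eigenvalues of the initial 20-FPC run are available (their sum plays the role of evalues_sum) and that the smallest number of FPCs whose cumulative share reaches the threshold is kept. The helper choose_npc() and the toy eigenvalues are hypothetical, and whether registr compares with >= or > is an assumption.

```r
# Hypothetical helper: smallest number of FPCs whose cumulative share
# of the total variance reaches `share` (the >= comparison is assumed)
choose_npc <- function(evalues, share) {
  stopifnot(share > 0, share <= 1)
  cum_share <- cumsum(evalues) / sum(evalues)   # sum(evalues) ~ evalues_sum
  which(cum_share >= share)[1]
}

# Toy eigenvalues standing in for an initial run with 20 FPCs
evalues_20 <- 2^-(0:19)
choose_npc(evalues_20, 0.9)   # returns 4
```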

Usage

bfpca(
  Y,
  npc = NULL,
  npc_varExplained = NULL,
  Kt = 8,
  maxiter = 50,
  t_min = NULL,
  t_max = NULL,
  print.iter = FALSE,
  row_obj = NULL,
  seed = 1988,
  periodic = FALSE,
  error_thresh = 1e-04,
  verbose = 1,
  subsample = TRUE,
  ...
)

Arguments

Y

Data frame in long format with variables id, value and index. For binary functional data, value contains the 0/1 observations.

npc

The number of functional principal components (FPCs) to estimate. Either npc or npc_varExplained has to be specified.

npc_varExplained

Share of the overall variance, between 0 and 1, that the FPCs should explain. Used to choose the number of FPCs when npc is not specified directly.

Kt

Number of B-spline basis functions used to estimate mean functions and functional principal components. Default is 8. If npc_varExplained is used, Kt is set to 20.

maxiter

Maximum number of iterations to perform for the EM algorithm. Default is 50.

t_min

Minimum value to be evaluated on the time domain.

t_max

Maximum value to be evaluated on the time domain.

print.iter

If TRUE, prints the current error and iteration number. Defaults to FALSE.

row_obj

If NULL, the function cleans the data and calculates row indices. Keep this NULL if you are using the standalone register function.

seed

Set seed for reproducibility. Defaults to 1988.

periodic

If TRUE, uses periodic B-spline basis functions. Default is FALSE.

error_thresh

Error threshold to end iterations. Defaults to 0.0001.

verbose

Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1.

subsample

If TRUE and the data has more than 10 million rows, the id values are subsampled to estimate the mean coefficients. Defaults to TRUE.

...

Additional arguments passed to or from other functions.
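As an illustration of the input expected by Y and of a basis with Kt functions, the following base R sketch builds a toy long-format data frame of binary curves and a cubic B-spline basis with splines::bs. The names Y_toy, t_grid and n_id are hypothetical, and registr may construct its basis differently (for example with different knot placement, or a periodic basis when periodic = TRUE).

```r
library(splines)  # ships with base R

set.seed(1988)
n_id   <- 3                              # toy number of subjects
t_grid <- seq(0, 1, length.out = 50)     # common observation grid

# Long-format input: one row per (id, index) pair with a binary value
Y_toy <- data.frame(
  id    = rep(seq_len(n_id), each = length(t_grid)),
  index = rep(t_grid, times = n_id),
  value = rbinom(n_id * length(t_grid), size = 1, prob = 0.5)
)

# Cubic B-spline basis with Kt columns evaluated on the grid
Kt    <- 8
basis <- bs(t_grid, df = Kt, intercept = TRUE)
dim(basis)   # 50 rows, Kt = 8 columns
```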

Value

An object of class fpca containing:

fpca_type

Indicates that the FPCA was performed with the 'variationEM' approach, in contrast to the two-step approach of registr::gfpca_twoStep.

t_vec

Time vector over which the mean mu and the functional principal components efunctions were evaluated.

knots

Knots of the B-spline basis, used to rebuild alpha.

efunctions

D × npc matrix of estimated FPC basis functions, where D is the length of t_vec.

evalues

Estimated variance of the FPC scores.

evalues_sum

Approximation of the overall variance in Y, based on an initial run of the FPCA with npc = 20. NULL if npc_varExplained was not specified.

npc

Number of FPCs.

scores

I × npc matrix of estimated FPC scores, where I is the number of subjects.

alpha

Estimated population-level mean.

mu

Estimated population-level mean. Same value as alpha, but included for compatibility with the refund.shiny package.

subject_coefs

B-spline basis coefficients used to construct subject-specific means. For use in the registr() function.

Yhat

FPC approximation of subject-specific means, before applying the response function.

Y

The observed data.

family

The string "binomial", included for compatibility with the refund.shiny package.

error

Vector containing the error at each iteration of the algorithm.
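To show how alpha, efunctions and scores fit together, the sketch below rebuilds latent subject-specific curves on t_vec as alpha plus efunctions times each subject's scores, then applies the logistic response function. It uses a made-up list named fit with the same shapes as the components above, not actual bfpca output; the names fit, latent and probs are hypothetical, and registr's own Yhat may be evaluated at the observed index values rather than on t_vec.

```r
set.seed(1)
D <- 20; I <- 4; npc <- 2
t_vec <- seq(0, 1, length.out = D)

# Toy stand-in for a fitted object (made-up values, not bfpca output)
fit <- list(
  alpha      = matrix(sin(2 * pi * t_vec), ncol = 1),  # D x 1 mean
  efunctions = qr.Q(qr(cbind(1, t_vec))),              # D x npc, orthonormal columns
  scores     = matrix(rnorm(I * npc), nrow = I)        # I x npc
)

# Latent (logit-scale) subject-specific curves on t_vec: D x I;
# alpha is recycled down each column
latent <- as.vector(fit$alpha) + fit$efunctions %*% t(fit$scores)

# Applying the logistic response function gives probabilities
probs <- plogis(latent)
```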

Author(s)

Julia Wrobel julia.wrobel@cuanschutz.edu, Jeff Goldsmith ajg2202@cumc.columbia.edu, Alexander Bauer alexander.bauer@stat.uni-muenchen.de

References

Jaakkola, T. S. and Jordan, M. I. (1997). A variational approach to Bayesian logistic regression models and their extensions. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics.

Tipping, M. E. (1999). Probabilistic visualisation of high-dimensional binary data. Advances in Neural Information Processing Systems, 592–598.

Examples

library(registr)

Y = simulate_functional_data()$Y

# estimate 2 FPCs
bfpca_obj = bfpca(Y, npc = 2, print.iter = TRUE, maxiter = 25)
plot(bfpca_obj)

# estimate npc adaptively, to explain 90% of the overall variation
bfpca_obj2 = bfpca(Y, npc_varExplained = 0.9, print.iter = TRUE, maxiter = 30)
plot(bfpca_obj2)


registr documentation built on Oct. 3, 2022, 1:05 a.m.