fregression: Approximate low-rank processes from sparse longitudinal...

fregression    R Documentation

Approximate low-rank processes from sparse longitudinal observations

Description

The method approximates individual trajectories from sparse, noisy observations. Suppose that we measure the progression of a certain measurement over time, at irregular timepoints, for multiple subjects. We want to approximate the progression process for each individual.

Usage

fregression(
  formula,
  data,
  covariates = NULL,
  bins = 51,
  method = c("fimpute", "fpcs", "mean", "pg"),
  lambda = c(0),
  maxIter = 1e+05,
  lambda.reg = 0,
  d = 7,
  K = NULL,
  K.reg = NULL,
  thresh = 1e-05,
  final = "soft",
  fold = 5,
  cv.ratio = 0.05,
  projection = "separate",
  verbose = 0,
  scale.covariates = TRUE,
  basis.type = "splines",
  lr = 1
)

Arguments

formula

formula describing the linear relation between processes and indicating the time and grouping variables. See Details.

data

data in the long format.

bins

number of bins for matrix representation of the data

method

algorithm used to find the model parameters: "fpcs" for functional principal components, "mean" for mean impute, "fimpute" for functional impute, "pg" for proximal gradient

lambda

lambdas for SVD regularization in functional impute

lambda.reg

lambdas for SVD regularization in regression

d

dimensionality of the basis

K

upper bound of dimensionality for SVD regularization

K.reg

upper bound of dimensionality for regression

thresh

threshold for convergence in functional impute

final

should the final model use "hard" or "soft" impute after choosing the optimal lambda

fold

how many folds in cross-validation

projection

"joint" or "separate" (default). If multiple regressors are available project them jointly or separately

Details

For a subject i, we observe Y^{i}(t), X_1^{i}(t), ..., X_p^{i}(t) at irregular, subject-specific timepoints t \in \{t_1, ..., t_p\}, where 0 < t_j < T. We can bin the time interval [0, T] and represent each individual as a vector of fixed length with missing values. Let Y, X_1, ..., X_p be the resulting matrices, one per process. Columns correspond to timepoints and rows to subjects.
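The binning step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the data frame `long`, the window end `T_max`, and the nearest-grid-point rule are hypothetical choices made for this example.

```r
# Hypothetical long-format data: one row per (subject, time) observation.
long <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  time = c(0.0, 0.4, 1.0, 0.1, 0.9, 0.5),
  Y    = c(1.2, 1.5, 2.1, 0.8, 1.9, 1.1)
)

bins  <- 5   # number of grid points on [0, T]
T_max <- 1   # assumed end of the observation window

# Map each observation time to the nearest of `bins` equally spaced grid points.
bin_idx <- round(long$time / T_max * (bins - 1)) + 1

# Rows are subjects, columns are time bins; unobserved cells stay NA.
# If two observations of one subject fall in the same bin, the later one wins.
ids <- sort(unique(long$id))
Y <- matrix(NA_real_, nrow = length(ids), ncol = bins,
            dimnames = list(as.character(ids), NULL))
Y[cbind(match(long$id, ids), bin_idx)] <- long$Y

print(Y)
```

With six observations spread over a 3 x 5 grid, most cells of `Y` remain missing, which is the sparsity the methods below are designed to handle.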

There are multiple methods for approximating the process Y. We can:

  • regress Y on X_1, X_2, ..., X_p using sparse functional regression

  • project each subject into a latent space and impute Y, X_1, X_2, ..., X_p simultaneously

  • use only information from Y, via functional PCA or functional impute.
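To give a feel for imputation by SVD regularization, here is a minimal base R sketch of a generic soft-thresholded SVD imputation loop on a binned matrix with missing cells. It is an assumption-laden illustration rather than the algorithm implemented in fregression: it omits any basis representation, and the helper names `soft_svd` and `soft_impute` are hypothetical.

```r
# Soft-thresholded SVD: shrink each singular value by lambda, flooring at zero.
soft_svd <- function(M, lambda) {
  s <- svd(M)
  d <- pmax(s$d - lambda, 0)
  s$u %*% (d * t(s$v))   # d * t(v) scales row i of t(v) by d[i]
}

# Repeatedly fill the missing cells with the current low-rank estimate
# and re-fit, until the relative change drops below `thresh`.
soft_impute <- function(Y, lambda, thresh = 1e-5, max_iter = 1000) {
  observed <- !is.na(Y)
  fit <- matrix(0, nrow(Y), ncol(Y))
  for (iter in seq_len(max_iter)) {
    filled <- fit
    filled[observed] <- Y[observed]
    new_fit <- soft_svd(filled, lambda)
    change <- sum((new_fit - fit)^2) / max(sum(fit^2), 1e-12)
    fit <- new_fit
    if (change < thresh) break
  }
  fit
}

# Hypothetical data: a rank-1 matrix with a quarter of its cells hidden.
set.seed(1)
truth <- outer(rnorm(20), rnorm(8))
Y <- truth
Y[sample(length(Y), 40)] <- NA
fit <- soft_impute(Y, lambda = 0.1)
```

Larger values of `lambda` shrink more singular values to zero and therefore give a lower-rank, smoother fit; a very large `lambda` returns the zero matrix.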

The function fregression is an interface for fitting models in all three scenarios. Suppose data is a data matrix in the long format, i.e. a matrix with p + 3 columns, where data[,1] is the subject ID, data[,2] is time, data[,3] is the observed value of Y, and the remaining columns are the covariates X1, ..., Xp. Each row corresponds to one observation for one subject.

There are three possible formulas:

  • Y ~ time + X1 + X2 | subjectID executes functional regression

  • Y + X1 + X2 ~ time | subjectID executes dimensionality reduction

  • Y ~ time | subjectID executes functional impute or functional PCA, depending on the value of the method argument
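The structure of these three formulas can be inspected with base R alone. The sketch below is illustrative only and does not claim to reproduce how fregression parses its formula; `split_formula` is a hypothetical helper. It relies on the fact that `|` binds more tightly than `~`, so the right-hand side of each formula is a single call to `|`.

```r
f_reg <- Y ~ time + X1 + X2 | subjectID   # functional regression
f_dim <- Y + X1 + X2 ~ time | subjectID   # dimensionality reduction
f_imp <- Y ~ time | subjectID             # functional impute or functional PCA

# Split a formula into response names, right-hand-side names,
# and the grouping variable that follows `|`.
split_formula <- function(f) {
  rhs <- f[[3]]   # call to `|`: left part = terms, right part = grouping
  stopifnot(identical(rhs[[1]], as.name("|")))
  list(
    response = all.vars(f[[2]]),
    terms    = all.vars(rhs[[2]]),
    group    = all.vars(rhs[[3]])
  )
}

str(split_formula(f_reg))
str(split_formula(f_dim))
str(split_formula(f_imp))
```

The three cases differ only in where the covariates appear: on the right of `~` for regression, on the left for joint dimensionality reduction, and nowhere when only Y is used.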

Value

Returns a list with the following elements:

  • fit: fitted matrix Y

  • meta: results of cross-validation

  • u, d, v: SVD of the underlying processes, if the functional impute method has been chosen

In the case of a multidimensional SVD and simultaneous approximation of Y, X1, X2, ..., Xp, $fit is a list of models for Y, X1, X2, ..., Xp.
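As a reminder of how SVD components relate to a low-rank fit, the following base R sketch builds a rank-K approximation from u, d and v. It assumes the components follow the convention of base R's svd() (rows of u are subjects, rows of v are time bins); whether the elements returned by fregression have exactly this shape is an assumption, and the matrix `M` is hypothetical.

```r
set.seed(2)
# Hypothetical complete matrix: 10 subjects (rows) by 6 time bins (columns).
M <- matrix(rnorm(60), nrow = 10, ncol = 6)

s <- svd(M)
K <- 2                          # number of components to keep

u <- s$u[, 1:K, drop = FALSE]   # per-subject scores, 10 x K
d <- s$d[1:K]                   # singular values, length K
v <- s$v[, 1:K, drop = FALSE]   # time-domain components, 6 x K

# Rank-K reconstruction: rows are subjects, columns are time bins.
fit_K <- u %*% (d * t(v))
dim(fit_K)
```

Under this convention, the columns of `u` act as low-dimensional summaries of each subject's trajectory.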

References

James, Gareth M., Trevor J. Hastie, and Catherine A. Sugar. Principal component models for sparse functional data. Biometrika 87.3 (2000): 587-602.

Kidzinski, Lukasz, and Trevor J. Hastie. Modeling longitudinal data using matrix completion. Under review (2021).

Examples

# SIMULATE DATA
simulation = fsimulate(seed = 1)
data = simulation$data
ftrue = simulation$ftrue
K = simulation$params$K

# MEAN IMPUTE AND FUNCTIONAL PCA, USING Y ONLY
model.mean = fregression(Y ~ time | id, data,
                         method = "mean")
model.fpca = fregression(Y ~ time | id, data,
                         lambda = 0, K = c(3,4,5), thresh = 1e-7, method = "fpcs")

# FUNCTIONAL IMPUTE OVER A GRID OF LAMBDAS
lambdas = c(2,3,4,5,6,8,10,12,15,20)
model.fimp = fregression(Y ~ time | id, data,
                         lambda = lambdas, thresh = 1e-5, final = "hard")

# DIMENSIONALITY REDUCTION: APPROXIMATE Y, X1, X2 SIMULTANEOUSLY
# `covariates` is not created above and must already exist in the workspace
model.fcmp = fregression(Y + X1 + X2 ~ time | id, data, covariates,
                         lambda = lambdas, K = K, final = "hard")

# FUNCTIONAL REGRESSION OF Y ON THE LATENT COMPONENTS U1, U2
model.freg = fregression(Y ~ U1 + U2 + time | id, data, model.fcmp$u,
                         lambda = lambdas, thresh = 1e-5,
                         lambda.reg = 0.1, method = "fpcs", K = K)

kidzik/fcomplete documentation built on Aug. 24, 2023, 5:44 a.m.