misfit: MISFIT (Multiple Imputation for Sparsely-sampled Functions at...

View source: R/misfit.R

Description

Performs MISFIT for either linear (family="gaussian") or logistic (family="binomial") regression.

Usage

misfit(
  dat,
  grid,
  nimps = 10,
  J = NULL,
  pve = 0.95,
  family = "gaussian",
  link = NULL,
  impute_type = "Multiple",
  cond.y = T,
  seed = NULL,
  user_params = NULL,
  use_fcr = TRUE,
  k = -1,
  fcr.args = list(use_bam = T, niter = 1),
  face.args = list(knots = 12, lower = -3, pve = 0.95)
)

Arguments

dat

A data frame with n rows (where N is the number of subjects, each with m_i observations, so that ∑_{i=1}^N m_i = n), expected to have either 3 or 4 columns. If cond.y is TRUE, it should include 4 columns, with variables 'X', 'y', 'subj', and 'argvals'. If cond.y is FALSE, only 3 columns are needed (no 'y' variable is used).
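As an illustration of the expected long format, the following sketch builds a small data frame with the four columns named above. The values are simulated placeholders, and the object names (m_i, y_subj, dat_example, dat_no_y) are hypothetical and not part of the package.

```r
# Hypothetical toy data: N = 3 subjects with m_i = 2, 3, 1 observations
set.seed(1)
m_i  <- c(2, 3, 1)
N    <- length(m_i)
subj <- rep(seq_len(N), times = m_i)  # subject id, repeated once per observation
n    <- sum(m_i)                      # total number of rows

y_subj <- rnorm(N)                    # one scalar response per subject

dat_example <- data.frame(
  X       = rnorm(n),                 # observed (noisy) function values
  y       = y_subj[subj],             # response repeated across a subject's rows
  subj    = subj,
  argvals = runif(n)                  # observation times in [0, 1]
)

# With cond.y = FALSE the 'y' column is not needed
dat_no_y <- dat_example[, c("X", "subj", "argvals")]
```

Each subject contributes m_i rows, so the response is constant within a subject, which mirrors how the example at the end of this page builds its data frame.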

grid

A length M vector of the unique desired grid points on which to evaluate the function.

nimps

An integer specifying the number of imputations to generate when impute_type is "Multiple". Defaults to 10.

J

An integer specifying the number of FPCs to include in the regression model. By default (NULL), J will be chosen as the minimum number of FPCs required to explain a given percentage of variance.

pve

The desired proportion of variance (between 0 and 1) to be explained by the FPCs. Only used if J is not supplied. Defaults to 0.95.
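The selection rule described for J and pve can be sketched as follows. This illustrates the "smallest J whose cumulative variance share reaches pve" rule only; it is not the package's internal code, and the eigenvalues in lam are made up for the example.

```r
# Hypothetical eigenvalues of an estimated covariance, in decreasing order
lam <- c(5, 2.5, 1.5, 0.6, 0.3, 0.1)
pve <- 0.95

# Cumulative proportion of variance explained by the first j FPCs
cum_pve <- cumsum(lam) / sum(lam)

# Smallest j whose cumulative proportion reaches pve
J <- which(cum_pve >= pve)[1]
```

With these made-up eigenvalues the first three components explain 90% of the variance and the first four explain 96%, so J is 4.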

family

A string indicating the family of the response variable. Currently only "gaussian" (linear regression) and "binomial" (logistic regression) are supported.

impute_type

A string indicating whether to use mean or multiple imputation. Only accepts "Mean" or "Multiple". Defaults to "Multiple".

cond.y

A boolean indicating whether to condition on the response variable when imputing. Defaults to TRUE.

seed

An integer used to specify the seed. Optional, but useful for making results reproducible in the Multiple Imputation step.

user_params

An optional list of user-defined imputation parameters. Currently, the user must provide either all necessary imputation parameters, or none. See 'Details'.

use_fcr

A boolean indicating whether to use fcr or FPCA when estimating the necessary imputation parameters. TRUE indicates fcr, FALSE indicates FPCA (the PACE approach). Defaults to TRUE. See 'Details' for more discussion.

k

Dimension of the smooth terms used in fcr. The default in the function signature is -1.

fcr.args

A list of arguments which can be passed to fcr (for the estimation of imputation parameters). Defaults to use_bam = TRUE and niter = 1. The list must not contain the formula, which is constructed within misfit. See fcr for more details.

face.args

A list of arguments to be passed to the underlying function face.sparse. Currently defaults to knots = 12, lower = -3, and pve = 0.95. See face.sparse for more details.
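In R, a list supplied for fcr.args or face.args replaces the default list as a whole, unless the function merges the two internally (which is not documented here). One way to change a single setting while keeping the others is to start from the defaults shown in 'Usage' and override entries with modifyList. The following is a sketch on the caller's side; the object names and override values are arbitrary.

```r
# Defaults as shown in the Usage section
fcr_defaults  <- list(use_bam = TRUE, niter = 1)
face_defaults <- list(knots = 12, lower = -3, pve = 0.95)

# Override single entries; untouched entries keep their default values
my_fcr_args  <- modifyList(fcr_defaults,  list(niter = 2))
my_face_args <- modifyList(face_defaults, list(knots = 20))

# fcr.args must not carry a formula, since misfit constructs it
stopifnot(!("formula" %in% names(my_fcr_args)))

# These lists would then be passed as
#   misfit(dat, grid, fcr.args = my_fcr_args, face.args = my_face_args)
```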

Details

When using the user_params argument, the user must supply a list containing the following elements.

Linear Regression:

Logistic Regression:

By default, use_fcr is TRUE, meaning that fcr is used to estimate imputation parameters. Using FPCA (i.e. use_fcr = FALSE) is roughly 10 times faster, at least for small to moderate data sets. For a single call to the function this difference is not meaningful, as both methods complete in under a minute, but it becomes significant when performing simulations. More testing is needed to determine which method more accurately estimates the imputation parameters. See 'References' below for details on the methods used in fcr and FPCA.

References

Leroux, A., Xiao, L., Crainiceanu, C., & Checkley, W. (2018). Dynamic prediction in functional concurrent regression with an application to child growth. Statistics in Medicine, 37(8), 1376-1388.

Yao, F., Müller, H.-G., & Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470), 577-590.

Examples

## Not run: 

###################################################################
#------- Example Using MISFIT for a Linear SoF Model -------------#
###################################################################

library(MASS)       # provides mvrnorm
library(sparsefreg) # provides misfit

set.seed(123)

## Data generation
M <- 100 # grid size
N <- 400 # sample size
m <- 2 # observations per subject
J <- 5 # number of FPCs to use
nimps <- 10 # number of imputations
var_eps <- 1 # variance of model error
var_delt <- 0.5 # variance of measurement error
grid <- seq(from=0,to=1,length.out = M)
mux <- rep(0,M)
Cx_f <- function(t, s, sig2 = 1, rho = 0.5) { # Matern covariance function with nu = 5/2
  d <- abs(outer(t, s, "-")) # pairwise distances |t - s|
  sig2 * (1 + sqrt(5)*d/rho + 5*d^2/(3*rho^2)) * exp(-sqrt(5)*d/rho)
}
Cx <- Cx_f(grid,grid)
eig <- eigen(Cx, symmetric = TRUE) # decompose once and reuse
lam <- eig$values/M
phi <- eig$vectors*sqrt(M)

beta <- 10*(sin(2*pi*grid)+1)
alpha <- 0

X_s <- mvrnorm(N,mux,Cx)
X_comp <- X_s + rnorm(N*M,sd = sqrt(var_delt))
Xi <- (X_s-mux)%*%phi/M
eps <- rnorm(N,0,sd = sqrt(var_eps))
y <- c(alpha + X_s%*%beta/M + eps)

X_mat<-matrix(nrow=N,ncol=m)
T_mat<-matrix(nrow=N,ncol=m)
ind_obs<-matrix(nrow=N,ncol=m)

for(i in 1:N){
 ind_obs[i,]<-sort(sample(1:M,m,replace=FALSE))
 X_mat[i,]<-X_comp[i,ind_obs[i,]]
 T_mat[i,]<-grid[ind_obs[i,]]
}

## Force the first subject to be observed at both endpoints of the grid
spt<-1
ind_obs[spt,1] = 1; ind_obs[spt,m] = M
X_mat[spt,]<-X_comp[spt,ind_obs[spt,]]
T_mat[spt,]<-grid[ind_obs[spt,]]

## Create data frame for observed data
obsdf <- data.frame("X" = c(t(X_mat)),"argvals" = c(t(T_mat)),
                   "y" = rep(y,each = m),"subj" = rep(1:N,each = m))

misfit_out <- misfit(obsdf,grid = grid,nimps = nimps,J = J)


## End(Not run)

justin-petrovich/sparsefreg documentation built on Aug. 20, 2020, 9:04 p.m.