Performs MISFIT for either linear (family="gaussian") or logistic (family="binomial") regression.
dat |
A data frame with n rows (where N is the number of subjects,
each with m_i observations, so that ∑_{i=1}^N m_i = n)
expected to have either 3 or 4. If |
grid |
A length M vector of the unique desired grid points on which to evaluate the function. |
nimps |
An integer specifying the number of desired imputations, if |
J |
An integer specifying the number of FPCs to include in the regression model. By default (NULL), J will be chosen as the minimum number of FPCs required to explain a given percentage of variance. |
pve |
The desired percentage of variance to be explained by the FPCs.
Only used if |
family |
A string indicating the family of the response variable. Currently only "gaussian" (linear regression) and "binomial" (logistic regression) are supported. |
impute_type |
A string indicating whether to use mean or multiple imputation. Only accepts "Mean" or "Multiple". Defaults to "Multiple". |
cond.y |
A boolean indicating whether to condition on the response variable when imputing. Defaults to TRUE. |
seed |
An integer used to specify the seed. Optional, but useful for making results reproducible in the Multiple Imputation step. |
user_params |
An optional list of user-defined imputation parameters. Currently, the user must provide either all necessary imputation parameters, or none. See 'Details'. |
use_fcr |
A boolean indicating whether to use |
k |
Dimension of the smooth terms used in |
fcr.args |
A list of arguments which can be passed to |
face.args |
A list of arguments to be passed to the underlying function |
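As a rough illustration of how J could follow from pve when J is left as NULL, the sketch below picks the smallest number of leading eigenvalues whose cumulative share of the total variance reaches the target. The helper choose_J and the eigenvalues are hypothetical, and pve is assumed here to be a proportion between 0 and 1; the rule the package actually applies may differ.

```r
# Hypothetical helper (not part of the package): smallest J whose leading
# eigenvalues explain at least a proportion `pve` of the total variance.
choose_J <- function(lam, pve = 0.95) {
  lam <- lam[lam > 0]                  # ignore non-positive eigenvalues
  cum_pve <- cumsum(lam) / sum(lam)    # cumulative proportion of variance
  which(cum_pve >= pve)[1]             # first index reaching the target
}

lam <- c(4, 2, 1, 0.5, 0.25, 0.25)     # illustrative eigenvalues (sum = 8)
J <- choose_J(lam, pve = 0.85)
J
```

With these illustrative eigenvalues the cumulative proportions are 0.5, 0.75, 0.875, ..., so three components are the fewest that reach 85%.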
When using the user_params argument, the user must supply a list containing the following elements.
Linear Regression:
'Cx': An M × M matrix representing the covariance function of X(t), evaluated on grid. Should not be missing any values.
'mux': A length M numeric vector representing the mean function of X(t), evaluated on grid. Should not be missing any values.
'var_delt': A single numeric value representing the variance of δ, the measurement error associated with X(t).
'muy': A single numeric value representing the mean of Y.
'lam': A numeric vector of length at least J, representing the eigenvalues of C_X(t,s), the covariance function of X(t).
'phi': A matrix with M rows and at least J columns, representing the eigenfunctions of C_X(t,s) (one per column) evaluated on grid. Should not be missing any values.
'Cxy': A numeric vector of length M, representing the cross-covariance C_{XY}(t) evaluated on grid. Should not be missing any values.
'var_y': A single numeric value representing the variance of Y.
Logistic Regression:
'Cx': An M × M matrix representing the covariance function of X(t), evaluated on grid. Should not be missing any values.
'mu0': A length M numeric vector representing the mean function of X(t)|Y = 1, evaluated on grid. Should not be missing any values.
'mu1': A length M numeric vector representing the mean function of X(t)|Y = 0, evaluated on grid. Should not be missing any values.
'var_delt': A single numeric value representing the variance of δ, the measurement error associated with X(t).
'lam': A numeric vector of length at least J, representing the eigenvalues of C_X(t,s), the covariance function of X(t).
'phi': A matrix with M rows and at least J columns, representing the eigenfunctions of C_X(t,s) (one per column) evaluated on grid. Should not be missing any values.
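To make the expected shapes concrete, here is a minimal sketch of a user_params list for the linear case. It assumes a scalar-on-function model Y = alpha + ∫ X(t) beta(t) dt + eps on an equally spaced grid over [0, 1], so integrals are approximated by sums divided by M. The covariance function, beta, alpha and var_eps are illustrative choices rather than package defaults, and the expressions for muy, Cxy and var_y follow from that assumed model.

```r
M    <- 50                                  # grid size (illustrative)
J    <- 3                                   # number of FPCs to keep
grid <- seq(0, 1, length.out = M)

# Illustrative "known" model quantities (assumptions, not package defaults)
mux      <- rep(0, M)                           # mean function of X(t)
Cx       <- exp(-abs(outer(grid, grid, "-")))   # exponential covariance of X(t)
beta     <- sin(2 * pi * grid)                  # coefficient function
alpha    <- 0                                   # intercept
var_eps  <- 1                                   # model error variance
var_delt <- 0.5                                 # measurement error variance

# Eigen-decomposition, rescaled so that sum(phi[, j]^2) / M = 1
eig <- eigen(Cx, symmetric = TRUE)
lam <- eig$values / M
phi <- eig$vectors * sqrt(M)

user_params <- list(
  Cx       = Cx,
  mux      = mux,
  var_delt = var_delt,
  muy      = alpha + sum(mux * beta) / M,                  # E[Y]
  lam      = lam[1:J],
  phi      = phi[, 1:J],
  Cxy      = c(Cx %*% beta) / M,                           # Cov(X(t), Y)
  var_y    = c(t(beta) %*% Cx %*% beta) / M^2 + var_eps    # Var(Y)
)
str(user_params)
```

A list built this way has the eight elements named above and contains no missing values; it would then be supplied through the user_params argument.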
By default, use_fcr is TRUE, meaning that fcr is used to estimate the imputation parameters. Using FPCA (i.e., use_fcr = FALSE) is roughly 10 times faster, at least for small to moderate data sets. For a single call this difference hardly matters, since both methods complete in under a minute, but it becomes significant when running simulations. More testing is needed to determine which method estimates the imputation parameters more accurately. See 'References' below for details on the methods used in fcr and FPCA.
Leroux, A., Xiao, L., Crainiceanu, C., & Checkley, W. (2018). Dynamic prediction in functional concurrent regression with an application to child growth. Statistics in Medicine, 37(8), 1376-1388.
Yao, F., Müller, H.-G., & Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470), 577-590.
## Not run:
###################################################################
#------- Example Using MISFIT for a Linear SoF Model -------------#
###################################################################
library(MASS)  # provides mvrnorm()
set.seed(123)

## Data generation
M <- 100        # grid size
N <- 400        # sample size
m <- 2          # observations per subject
J <- 5          # number of FPCs to use
nimps <- 10     # number of imputations
var_eps <- 1    # variance of model error
var_delt <- 0.5 # variance of measurement error
grid <- seq(from = 0, to = 1, length.out = M)
mux <- rep(0, M)
Cx_f <- function(t, s, sig2 = 1, rho = 0.5) { # Matern covariance function with nu = 5/2
  d <- abs(outer(t, s, "-"))
  sig2 * (1 + sqrt(5) * d / rho + 5 * d^2 / (3 * rho^2)) * exp(-sqrt(5) * d / rho)
}
Cx <- Cx_f(grid, grid)
eig <- eigen(Cx, symmetric = TRUE)   # decompose once, reuse for values and vectors
lam <- eig$values / M
phi <- eig$vectors * sqrt(M)
beta <- 10 * (sin(2 * pi * grid) + 1)
alpha <- 0
X_s <- mvrnorm(N, mux, Cx)                                # true curves
X_comp <- X_s + rnorm(N * M, sd = sqrt(var_delt))         # curves with measurement error
Xi <- sweep(X_s, 2, mux) %*% phi / M                      # FPC scores (center each grid point)
eps <- rnorm(N, 0, sd = sqrt(var_eps))
y <- c(alpha + X_s %*% beta / M + eps)

## Keep only m randomly chosen observations per subject
X_mat <- matrix(nrow = N, ncol = m)
T_mat <- matrix(nrow = N, ncol = m)
ind_obs <- matrix(nrow = N, ncol = m)
for (i in 1:N) {
  ind_obs[i, ] <- sort(sample(1:M, m, replace = FALSE))
  X_mat[i, ] <- X_comp[i, ind_obs[i, ]]
  T_mat[i, ] <- grid[ind_obs[i, ]]
}

## Observe subject 1 at the first and last grid points
spt <- 1
ind_obs[spt, 1] <- 1
ind_obs[spt, m] <- M
X_mat[spt, ] <- X_comp[spt, ind_obs[spt, ]]
T_mat[spt, ] <- grid[ind_obs[spt, ]]

## Create data frame for observed data
obsdf <- data.frame("X" = c(t(X_mat)), "argvals" = c(t(T_mat)),
                    "y" = rep(y, each = m), "subj" = rep(1:N, each = m))
misfit_out <- misfit(obsdf, grid = grid, nimps = nimps, J = J)
## End(Not run)