Regression Analysis for Time-Invariant Coefficients Using Half-Kernel Estimation

Description

Estimation of regression models for sparse asynchronous longitudinal observations using a half-kernel estimation approach with time-invariant coefficients.

Usage

asynchHK(data.x, data.y, kType = "epan", lType = "identity", bw = NULL, 
         nCores = 1, ...)

Arguments

data.x

A data.frame of covariates. The structure of the data.frame must be {patient ID, time of measurement, measurement(s)}. Patient IDs must be of class integer or be able to be coerced to class integer without loss of information. Missing values must be indicated as NA. All times will automatically be rescaled to [0,1].

data.y

A data.frame of response measurements. The structure of the data.frame must be {patient ID, time of measurement, measurement}. Patient IDs must be of class integer or be able to be coerced to class integer without loss of information. Missing values must be indicated as NA. All times will automatically be rescaled to [0,1].

kType

An object of class character indicating the type of smoothing kernel to use in the estimating equation. Must be one of {"epan", "uniform", "gauss"}, where "epan" is the Epanechnikov kernel and "gauss" is the Gaussian kernel.

lType

An object of class character indicating the type of link function to use for the regression model. Must be one of {"identity","log","logistic"}.

bw

If provided, bw is a numeric value or numeric vector containing the bandwidths for which parameter estimates are to be obtained. If NULL, an optimal bandwidth is determined using an adaptive selection procedure. The bandwidth search space runs from 2*(Q3 - Q1)*n^(-0.7) to 2*(Q3 - Q1)*n^(-0.3), where Q3 is the 0.75 quantile and Q1 is the 0.25 quantile of the pooled sample of measurement times for the covariate and response, and n is the number of patients. See the original reference for details of the selection procedure.

nCores

A numeric object. Used only when the bandwidth is selected automatically (bw = NULL): the number of cores to employ for the calculation. If nCores > 1, the bandwidth search space is distributed across the cores using parLapply from the parallel package.

...

Ignored.
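
The default search range described for bw can be reproduced by hand. The sketch below only illustrates the stated formula and is not the package's internal code: the simulated times, the variable names, the size of the candidate grid, and the assumption that the quantiles are taken on times already rescaled to [0,1] are all hypothetical.

```r
set.seed(1)

# Hypothetical measurement times, assumed to be already rescaled to [0,1]
n       <- 50           # number of patients
times.x <- runif(200)   # covariate measurement times
times.y <- runif(180)   # response measurement times

# Pool the times and take the interquartile range Q3 - Q1
pooled <- c(times.x, times.y)
q      <- quantile(pooled, probs = c(0.25, 0.75), names = FALSE)
iqr    <- q[2] - q[1]

# Lower and upper ends of the bandwidth search space
bwMin <- 2 * iqr * n^(-0.7)
bwMax <- 2 * iqr * n^(-0.3)

# Evenly spaced candidate bandwidths; the grid size is an arbitrary choice
bwGrid <- seq(bwMin, bwMax, length.out = 10)
```

A vector such as bwGrid could also be passed directly as bw to obtain estimates at each candidate bandwidth.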

Details

For lType = "log" and lType = "logistic", parameter estimates are obtained by minimizing the estimating equation using optim() with method = "Nelder-Mead"; all other optim() arguments take their default values.
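
The numerical minimization described above can be sketched for the log link as follows. This is a rough illustration under stated assumptions, not the package's code: the simulated data, the helper names epan and objective, the use of the squared norm of the estimating function as the objective, and the one-sided weighting (covariate time no later than response time) as a reading of "half-kernel" are all assumptions.

```r
set.seed(2)

# Hypothetical paired data: each row pairs a response observed at tResp with
# a covariate observed at sCov for the same subject
m     <- 400
tResp <- runif(m)
sCov  <- pmax(tResp - rexp(m, rate = 20), 0)   # covariate times at or before tResp
x     <- rnorm(m)
y     <- rpois(m, lambda = exp(0.5 + 0.3 * x)) # log-link mean

# Epanechnikov kernel scaled by bandwidth h
epan <- function(u, h) 0.75 * pmax(1 - (u / h)^2, 0) / h

# Squared norm of the kernel-weighted estimating function
objective <- function(beta, h) {
  X  <- cbind(1, x)
  mu <- exp(drop(X %*% beta))                    # inverse of the log link
  w  <- epan(tResp - sCov, h) * (sCov <= tResp)  # assumed one-sided weights
  U  <- colSums(w * X * (y - mu))
  sum(U^2)
}

# Nelder-Mead search with all other optim() settings left at their defaults
fit <- optim(par = c(0, 0),
             fn = function(beta) objective(beta, h = 0.1),
             method = "Nelder-Mead")
fit$par
```

For the logistic link the same sketch would apply with plogis() in place of exp() and a binary response.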

For lType = "identity", parameter estimates are obtained using solve().
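
With the identity link the estimating equation is linear in the coefficients, so a direct solve is possible. The sketch below shows one way this could look as kernel-weighted normal equations; the simulated data, the helper epan, and the one-sided weighting used to mimic a half kernel are assumptions made for illustration only.

```r
set.seed(3)

# Hypothetical paired data: response time tResp, covariate time sCov
m     <- 300
tResp <- runif(m)
sCov  <- pmax(tResp - rexp(m, rate = 20), 0)  # covariate times at or before tResp
x     <- rnorm(m)
y     <- 1 + 2 * x + rnorm(m, sd = 0.5)       # identity-link mean

# Epanechnikov kernel scaled by bandwidth h
epan <- function(u, h) 0.75 * pmax(1 - (u / h)^2, 0) / h

h <- 0.1
X <- cbind(intercept = 1, x = x)
w <- epan(tResp - sCov, h) * (sCov <= tResp)  # assumed one-sided weights

# Weighted normal equations: (sum w X X') beta = sum w X y
A <- crossprod(X, w * X)
b <- crossprod(X, w * y)
betaHat <- drop(solve(A, b))
betaHat
```

In this simplified setting the result coincides with coef(lm(y ~ x, weights = w)), which offers a quick sanity check.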

Value

A list is returned. If bandwidths are provided, each element of the list is a matrix, where the ith row corresponds to the ith bandwidth of argument "bw" and the columns correspond to the model parameters. If the bandwidth is determined automatically, each element is a named vector calculated at the optimal bandwidth.

betaHat

The estimated model coefficients.

stdErr

The standard error for each coefficient.

zValue

The estimated z-value for each coefficient.

pValue

The p-value for each coefficient.

If the bandwidth is determined automatically, two additional list elements are returned:

optBW

The estimated optimal bandwidth for each coefficient.

minMSE

The mean squared error at the optimal bandwidth for each coefficient.
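
The relation between betaHat, stdErr, zValue, and pValue can be illustrated with made-up numbers. The sketch assumes the usual two-sided normal (Wald) test, which this page does not state explicitly; the values and the row and column names are hypothetical and only mimic the matrix layout returned when two bandwidths are supplied.

```r
# Made-up estimates: one row per bandwidth, one column per coefficient
betaHat <- matrix(c(1.02, 0.48,
                    0.97, 0.51),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("bw1", "bw2"), c("b0", "b1")))

# Made-up standard errors in the same layout
stdErr <- matrix(c(0.10, 0.20,
                   0.12, 0.25),
                 nrow = 2, byrow = TRUE,
                 dimnames = dimnames(betaHat))

# Wald z statistic and two-sided normal p-value (assumed form of the test)
zValue <- betaHat / stdErr
pValue <- 2 * pnorm(-abs(zValue))
```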

Author(s)

Hongyuan Cao, Jialiang Li, Jason P. Fine, and Shannon T. Holloway

References

Cao, H., Li, J., and Fine, J. P. (2015). On last observation carried forward and asynchronous longitudinal regression analysis. Electronic Journal of Statistics, submitted.

Examples

  data(asynchDataTI)

  res <- asynchHK(data.x = TI.x, 
                  data.y = TI.y,
                  bw = c(0.05, 0.03),
                  kType = "epan", 
                  lType = "identity")
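
For readers working without the bundled data set, input in the layout required by data.x and data.y can be mocked up as follows. The column names, sample sizes, and simulated values are hypothetical; only the column order (patient ID, time of measurement, measurement) and the use of NA for missing values follow the argument descriptions.

```r
set.seed(4)
nPatients <- 5

# Covariates: {patient ID, time of measurement, measurement(s)}
data.x <- data.frame(ID   = rep(seq_len(nPatients), each = 4),
                     time = runif(nPatients * 4, min = 0, max = 10),
                     x1   = rnorm(nPatients * 4))

# Response: {patient ID, time of measurement, measurement}, observed at
# times that need not match the covariate times
data.y <- data.frame(ID   = rep(seq_len(nPatients), each = 3),
                     time = runif(nPatients * 3, min = 0, max = 10),
                     y    = rnorm(nPatients * 3))

# Missing values are coded as NA
data.x$x1[2] <- NA

str(data.x)
str(data.y)
```

Times are left on their original scale here because the function is documented to rescale them to [0,1] itself.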