wcr: Principal component regression and partial least squares in...



Description

Performs generalized linear scalar-on-function or scalar-on-image regression in the wavelet domain, by sparse principal component regression (PCR) and sparse partial least squares (PLS).

Usage

wcr(y, xfuncs, min.scale, nfeatures, ncomp, method = c("pcr", "pls"), 
    mean.signal.term = FALSE, covt = NULL, filter.number = 10, 
    wavelet.family = "DaubLeAsymm", family = "gaussian", cv1 = FALSE, nfold = 5, 
    nsplit = 1, store.cv = FALSE, store.glm = FALSE, seed = NULL)

Arguments

y

scalar outcome vector.

xfuncs

functional predictors. For 1D predictors, an n × d matrix of signals, where n is the length of y and d is the number of sites at which each signal is defined. For 2D predictors, an n × d × d array comprising n images of dimension d × d. For 3D predictors, an n × d × d × d array comprising n images of dimension d × d × d. Note that d must be a power of 2.

min.scale

either a scalar, or a vector of values to be compared. Used to control the coarseness level of the wavelet decomposition. Possible values are 0, 1, …, log2(d) − 1.

nfeatures

number(s) of features, i.e. wavelet coefficients, to retain for prediction of y: either a scalar, or a vector of values to be compared.

ncomp

number(s) of principal components (if method="pcr") or PLS components (if method="pls"): either a scalar, or a vector of values to be compared.

method

either "pcr" (principal component regression; the default) or "pls" (partial least squares).

mean.signal.term

logical: should the mean of each subject's signal be included as a covariate? By default, FALSE.

covt

covariates, if any: an n-row matrix, or a vector of length n.

filter.number

argument passed to function wd, imwd, or wd3D in the wavethresh package (for 1D, 2D, or 3D predictors, respectively). Used to select the smoothness of the wavelet in the decomposition.

wavelet.family

family of wavelets: passed to function wd, imwd, or wd3D.

family

generalized linear model family. Current version supports "gaussian" (the default) and "binomial".

cv1

logical: should cross-validation be performed (to estimate prediction error) even if a single value is provided for each of min.scale, nfeatures and ncomp? By default, FALSE. Note that whenever multiple candidate values are provided for one or more of these tuning parameters, CV is performed to select the best model.

nfold

the number of validation sets ("folds") into which the data are divided.

nsplit

number of random splits of the data into nfold validation sets; the CV criterion is computed by averaging over these splits.

store.cv

logical: should the output include a CV result table?

store.glm

logical: should the output include the fitted glm?

seed

the seed for random data division. If seed = NULL, a random seed is used.
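How nfold, nsplit and seed interact can be illustrated with a minimal base-R sketch. It is not the package's code: the helper names cv_folds, cv_error and ols_sse are hypothetical, folds are drawn by a simple random permutation, and an ordinary least-squares model scored by squared prediction error stands in for whatever model and criterion wcr actually uses.

```r
# Hypothetical helper: one column per split (nsplit columns); entry i gives
# the fold (1..nfold) that observation i is held out in for that split.
cv_folds <- function(n, nfold = 5, nsplit = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reproducible data division
  vapply(seq_len(nsplit),
         function(s) sample(rep_len(seq_len(nfold), n)),
         integer(n))
}

# Hypothetical helper: CV criterion averaged over the nsplit splits.
# fit_and_score(xtrain, ytrain, xtest, ytest) returns a sum of squared errors.
cv_error <- function(y, x, folds, fit_and_score) {
  per_split <- apply(folds, 2, function(fold) {
    sse <- 0
    for (k in unique(fold)) {
      test <- fold == k
      sse <- sse + fit_and_score(x[!test, , drop = FALSE], y[!test],
                                 x[test, , drop = FALSE], y[test])
    }
    sse / length(y)  # mean squared prediction error for this split
  })
  mean(per_split)    # average over splits
}

# Stand-in model: least squares with an intercept (assumption, not wcr's fit)
ols_sse <- function(xtr, ytr, xte, yte) {
  b <- qr.solve(cbind(1, xtr), ytr)
  sum((yte - cbind(1, xte) %*% b)^2)
}

# Usage on simulated data
set.seed(1)
n <- 40
x <- matrix(rnorm(n * 3), n, 3)
y <- drop(x %*% c(1, -1, 0.5)) + rnorm(n)
folds <- cv_folds(n, nfold = 5, nsplit = 3, seed = 42)
cv_error(y, x, folds, ols_sse)
```

With nsplit = 1 there is a single partition; larger values reduce the variability that comes from any one random division, at proportionally higher cost.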

Details

Briefly, the algorithm works by (1) applying the discrete wavelet transform (DWT) to the functional/image predictors; (2) retaining only the nfeatures wavelet coefficients having the highest variance (for PCR; cf. Johnstone and Lu, 2009) or highest covariance with y (for PLS); (3) regressing y on the leading ncomp PCs or PLS components, along with any scalar covariates; and (4) applying the inverse DWT to the result to obtain the coefficient function estimate fhat.
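The four steps can be sketched for 1D predictors in base R. This is an illustration under simplifying assumptions, not the wcr implementation: an orthonormal Haar transform, built as an explicit matrix, replaces the wavethresh DWT with the default DaubLeAsymm filter; the decomposition always runs to the coarsest level (so min.scale is ignored); only the PCR branch with a gaussian outcome is shown; and the names haar_matrix and wcr_pcr_sketch are hypothetical.

```r
# Orthonormal Haar DWT matrix for d = 2^J (rows are basis functions,
# coarse to fine). Stand-in for wavethresh's DaubLeAsymm wavelets.
haar_matrix <- function(d) {
  if (d == 1) return(matrix(1, 1, 1))
  h <- haar_matrix(d / 2)
  top <- kronecker(h, matrix(c(1, 1), 1, 2)) / sqrt(2)               # averages
  bottom <- kronecker(diag(d / 2), matrix(c(1, -1), 1, 2)) / sqrt(2) # details
  rbind(top, bottom)
}

# Hypothetical sketch of sparse wavelet-domain PCR (gaussian outcome only)
wcr_pcr_sketch <- function(y, X, nfeatures, ncomp) {
  d <- ncol(X)
  stopifnot(d == 2^round(log2(d)))        # d must be a power of 2
  W <- haar_matrix(d)
  C <- X %*% t(W)                         # (1) DWT of each signal (row)
  keep <- order(apply(C, 2, var), decreasing = TRUE)[seq_len(nfeatures)]
  Ck <- scale(C[, keep, drop = FALSE], center = TRUE, scale = FALSE)
                                          # (2) top-variance coefficients
  V <- svd(Ck, nu = 0, nv = ncomp)$v      # loadings of the leading PCs
  scores <- Ck %*% V
  fit <- lm(y ~ scores)                   # (3) regress y on ncomp PC scores
  gamma <- coef(fit)[-1]
  b <- numeric(d)
  b[keep] <- drop(V %*% gamma)            # coefficients in wavelet domain
  fhat <- drop(t(W) %*% b)                # (4) inverse DWT gives fhat
  list(fhat = fhat, fitted = fitted(fit), keep = keep)
}

# Usage on simulated signals
set.seed(2)
n <- 60; d <- 32
X <- matrix(rnorm(n * d), n, d)
f_true <- sin(2 * pi * (1:d) / d)
y <- drop(X %*% f_true) + rnorm(n, sd = 0.5)
res <- wcr_pcr_sketch(y, X, nfeatures = 16, ncomp = 5)
length(res$fhat)
```

Because the Haar matrix W is orthonormal, a coefficient vector b in the wavelet domain maps back to the signal domain as t(W) %*% b, so step (4) reduces to a single matrix product here; X %*% fhat then reproduces the centered fitted values.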

This function supports only the standard DWT (see argument type in wd) with periodic boundary handling (see argument bc in wd).

For 2D predictors, setting min.scale=1 will lead to an error, due to a technical detail regarding imwd. Please contact the author if a workaround is needed.

See the Details section of fpcr in the refund package for a note regarding decorrelation.

Value

An object of class "wcr". This is a list that, if store.glm = TRUE, includes all components of the fitted glm object. The following components are included even if store.glm = FALSE:

fitted.values

the fitted values.

param.coef

coefficients for covariates with decorrelation. The model is fitted after decorrelating the functional predictors from any scalar covariates; but for CV, one needs the "undecorrelated" coefficients from the training-set models.

undecor.coef

coefficients for covariates without decorrelation. See param.coef.

fhat

coefficient function estimate.

Rsq

coefficient of determination.

tuning.params

if CV is performed, a 2 × 4 table giving the indices and values of min.scale, nfeatures and ncomp chosen by CV.

cv.table

a table giving the CV criterion for each combination of min.scale, nfeatures and ncomp, if store.cv = TRUE; otherwise, the CV criterion only for the optimized combination of these parameters. Set to NULL if CV is not performed.

se.cv

if store.cv = TRUE, the standard error of the CV estimate for each combination of min.scale, nfeatures and ncomp.

family

generalized linear model family.
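The relation between param.coef and undecor.coef can be illustrated under one stated assumption: that decorrelation means replacing a predictor-derived design X by its least-squares residuals on the covariates (plus an intercept). The sketch does not claim that wcr decorrelates exactly this matrix; X below is a small hypothetical design. Writing X_res = X − Z G, the model Z a + X_res b equals Z (a − G b) + X b, so the undecorrelated covariate coefficients are a − G b, which a direct fit confirms.

```r
set.seed(3)
n <- 50
Z <- matrix(rnorm(n * 2), n, 2)           # scalar covariates (like covt)
X <- matrix(rnorm(n * 3), n, 3) + Z[, 1]  # hypothetical design correlated with Z
y <- drop(1 + Z %*% c(0.5, -1) + X %*% c(2, 0, 1) + rnorm(n))

Z1 <- cbind(1, Z)                         # covariates plus intercept
G <- qr.solve(Z1, X)                      # LS coefficients of each X column on Z1
Xres <- X - Z1 %*% G                      # decorrelated design (assumption)

coef_decor <- qr.solve(cbind(Z1, Xres), y)
a <- coef_decor[1:3]                      # covariate coefs, with decorrelation
b <- coef_decor[4:6]                      # coefs for the decorrelated design
undecor <- drop(a - G %*% b)              # covariate coefs, without decorrelation

direct <- qr.solve(cbind(Z1, X), y)       # fit without decorrelation
all.equal(undecor, direct[1:3], check.attributes = FALSE)
```

The coefficients for X itself are unchanged by decorrelation; only the covariate coefficients shift, which is why training-set models used in CV need the undecorrelated version to predict on new data.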

Author(s)

Lan Huo lan.huo@nyumc.org

References

Johnstone, I. M., and Lu, Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104, 682–693.

See Also

wnet

Examples

# See example for wnet

refund.wave documentation built on May 2, 2019, 5:54 p.m.