# lf.vd: Construct a VDFR regression term

In refund: Regression with Functional Data

## Description

This function defines a variable-domain functional regression term for inclusion in a gam formula (or bam, gamm, or gamm4::gamm), as constructed by pfr. These are functional predictors for which each function is observed over a domain of different width. The default is the term $\frac{1}{T_i}\int_0^{T_i} X_i(t)\beta(t,T_i)\,dt$, where $X_i(t)$ is a functional predictor of length $T_i$ and $\beta(t,T_i)$ is an unknown bivariate coefficient function. Various domain transformations are available, such as lagging or domain-standardizing the coordinates, or parameterizing the interactions; these often result in improved model fit. Basis choice is fully customizable using the options of s and te.

## Usage

    lf.vd(
      X,
      argvals = seq(0, 1, l = ncol(X)),
      vd = NULL,
      integration = c("simpson", "trapezoidal", "riemann"),
      L = NULL,
      basistype = c("s", "te", "t2"),
      transform = NULL,
      mp = TRUE,
      ...
    )

## Arguments

| Argument | Description |
| --- | --- |
| X | matrix containing variable-domain functions. Should be N x J, where N is the number of subjects and J is the maximum number of time points per subject. Most rows will have NA values in the right-most columns, corresponding to unobserved time points. |
| argvals | indices of evaluation of X, i.e. $(t_{i1}, \ldots, t_{iJ})$ for subject i. May be entered as either a length-J vector or an N by J matrix. Indices may be unequally spaced. Entering as a matrix allows for different observation times for each subject. |
| vd | vector of values containing the variable-domain width ($T_i$ above). Defaults to the argvals value corresponding to the last non-NA element of $X_i(t)$. |
| integration | method used for numerical integration. Defaults to "simpson" (Simpson's rule) for calculating entries in L. Alternatively, and for non-equidistant grids, "trapezoidal" or "riemann". |
| L | an optional N by ncol(argvals) matrix giving the weights for the numerical integration over t. If present, overrides integration. |
| basistype | character string indicating the type of bivariate basis used. Options include "s" (the default), "te", and "t2", which correspond to mgcv::s, mgcv::te, and mgcv::t2. |
| transform | character string indicating an optional basis transformation; see Details for options. |
| mp | for transform=="linear" or transform=="quadratic", TRUE to use multiple penalties for the smooth (one for each marginal basis). If FALSE, penalties are concatenated into a single block-diagonal penalty matrix (with one smoothing parameter). |
| ... | optional arguments for basis and penalization to be passed to the function indicated by basistype. These could include, for example, "bs", "k", "m", etc. See te or s for details. |
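The layout expected for X and the documented default for vd can be illustrated with a small base-R sketch. The data here are simulated, the names n_obs and last_obs are hypothetical, and the computation of vd only mirrors the description above; it is not taken from the refund source.

```r
# Toy variable-domain data: 4 subjects, at most J = 6 time points each.
set.seed(1)
J <- 6
n_obs <- c(3, 6, 4, 5)   # hypothetical number of observed points per subject

# N x J matrix, NA-padded on the right for unobserved time points.
X <- matrix(NA_real_, nrow = length(n_obs), ncol = J)
for (i in seq_along(n_obs)) {
  X[i, seq_len(n_obs[i])] <- rnorm(n_obs[i])
}

# Same grid as the default value of argvals in the Usage section.
argvals <- seq(0, 1, length.out = ncol(X))

# Assumed equivalent of the default vd: the argvals value at the
# last non-NA entry of each row.
last_obs <- apply(X, 1, function(x) max(which(!is.na(x))))
vd <- argvals[last_obs]
```

With this grid, a subject observed at its first three time points gets a domain width of 0.4, and a fully observed subject gets 1.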

## Details

The variable-domain functional regression model uses the term $\frac{1}{T_i}\int_0^{T_i} X_i(t)\beta(t,T_i)\,dt$ to incorporate a functional predictor with subject-specific domain width. This term imposes a smooth (nonparametric) interaction between $t$ and $T_i$. The domain of the coefficient function is the triangular (or trapezoidal) surface defined by $\{(t, T_i): 0 \le t \le T_i\}$. The default basis uses bivariate thin-plate regression splines.
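To make the term concrete, the following base-R sketch approximates it for a single subject using trapezoidal weights. The helpers trap_weights and vd_term and the coefficient function beta are hypothetical and chosen only for illustration; in a fitted model the coefficient surface is unknown and estimated, and the quadrature weights are built internally.

```r
# Trapezoidal quadrature weights for a sorted grid (length >= 2).
trap_weights <- function(tgrid) {
  d <- diff(tgrid)
  c(d[1], d[-1] + d[-length(d)], d[length(d)]) / 2
}

# Approximate (1 / T_i) * integral_0^{T_i} X_i(t) * beta(t, T_i) dt
# for one subject whose unobserved time points are NA.
vd_term <- function(x, tgrid, beta) {
  obs <- !is.na(x)
  t_obs <- tgrid[obs]
  Ti <- max(t_obs)                      # subject-specific domain width
  sum(trap_weights(t_obs) * x[obs] * beta(t_obs, Ti)) / Ti
}

# Hypothetical coefficient surface, assumed known for this illustration.
beta <- function(t, Ti) t

tgrid <- seq(0, 1, length.out = 11)
x <- c(rep(1, 7), rep(NA, 4))           # X_i(t) = 1, observed on [0, 0.6]
term <- vd_term(x, tgrid, beta)
```

Under these assumptions the exact value is $T_i/2 = 0.3$, which the trapezoidal rule reproduces up to floating-point error because the integrand is linear in $t$.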

Different basis transformations can result in different properties; see Gellar, et al. (2014) for a more complete description. We make five basis transformations easily accessible using the transform argument. This argument is a character string that can take one of the following values:

1. "lagged": transforms argvals to argvals - vd

2. "standardized": transforms argvals to argvals/vd, and then rescales vd linearly so it ranges from 0 to 1

3. "linear": first transforms the domain as in "standardized", then parameterizes the interaction with "vd" to be linear

4. "quadratic": first transforms the domain as in "standardized", then parameterizes the interaction with "vd" to be quadratic

5. "noInteraction": first transforms the domain as in "standardized", then reduces the bivariate basis to univariate with no effect of vd. This would be equivalent to using lf on the domain-standardized predictor functions.
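The coordinate changes behind the "lagged" and "standardized" options can be sketched in base R as follows. This only illustrates the arithmetic described in items 1 and 2 on a made-up set of (arg, vd) pairs; it does not reproduce the basis constructors that lf.vd actually uses, and the "linear", "quadratic", and "noInteraction" parameterizations are not shown.

```r
# Hypothetical evaluation points: three subjects with domain widths 2, 4, 6.
coords <- data.frame(
  arg = c(0, 1, 2,  0, 2, 4,  0, 3, 6),
  vd  = rep(c(2, 4, 6), each = 3)
)

# "lagged": argvals - vd, so every subject's right endpoint maps to 0.
lagged <- transform(coords, arg = arg - vd)

# "standardized": argvals / vd, then vd rescaled linearly to [0, 1].
# transform() evaluates both expressions against the original columns.
standardized <- transform(
  coords,
  arg = arg / vd,
  vd  = (vd - min(vd)) / (max(vd) - min(vd))
)
```

After lagging, the diagonal edge t = T_i becomes the vertical line at 0; after standardizing, the triangular domain becomes the unit square.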

The practical effect of using the "lagged" basis is to increase smoothness along the right (diagonal) edge of the resultant estimate. The practical effect of using a "standardized" basis is to allow for greater smoothness at high values of T_i compared to lower values.

These basis transformations rely on the basis constructors available in the mgcvTrans package. For more specific control over the transformations, you can use bs="dt" and/or bs="pi"; see smooth.construct.dt.smooth.spec or smooth.construct.pi.smooth.spec for an explanation of the options (entered through the xt argument of lf.vd/s).

Note that tensor product bases are only recommended when a standardized transformation is used. Without this transformation, just under half of the "knots" used to define the basis will fall outside the range of the data and have no data available to estimate them. The penalty allows the corresponding coefficients to be estimated, but results may be unstable.
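The "just under half" figure can be checked with a rough sketch that lays a regular k x k grid over the bounding square of the untransformed domain and counts the points above the diagonal. The regular grid is an assumption made for illustration; actual knot placement depends on the marginal bases chosen in mgcv.

```r
# Regular grid standing in for tensor-product knot locations (assumption).
k <- 10
knots <- expand.grid(
  arg = seq(0, 1, length.out = k),
  vd  = seq(0, 1, length.out = k)
)

# A point lies outside the triangular domain when arg > vd.
outside <- knots$arg > knots$vd
frac_outside <- mean(outside)
```

For a k x k grid the fraction outside is (k - 1) / (2k), which is 0.45 for k = 10 and approaches one half as k grows. After a standardized transformation, arg / vd lies in [0, 1] for every observation, so the whole square is covered by data.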

## Value

A list with the following entries:

| Entry | Description |
| --- | --- |
| call | a call to s or te, using the appropriately constructed weight matrices |
| data | data used by the call, which includes the matrices indicated by argname, Tindname, and LXname |
| L | the matrix of weights used for the integration |
| argname | the name used for the argvals variable in the formula used by mgcv::gam |
| Tindname | the name used for the Tind variable in the formula used by mgcv::gam |
| LXname | the name of the by variable used by s or te in the formula for mgcv::gam |

## Author(s)

Jonathan E. Gellar <JGellar@mathematica-mpr.com>

## References

Gellar, Jonathan E., Elizabeth Colantuoni, Dale M. Needham, and Ciprian M. Crainiceanu. Variable-Domain Functional Regression for Modeling ICU Data. Journal of the American Statistical Association, 109(508):1425-1439, 2014.

## See Also

pfr, lf, mgcv's linear.functional.terms.

## Examples

    ## Not run:
    data(sofa)

    # One fit per transformation option
    fit.vd1 <- pfr(death ~ lf.vd(SOFA) + age + los,
                   family="binomial", data=sofa)
    fit.vd2 <- pfr(death ~ lf.vd(SOFA, transform="lagged") + age + los,
                   family="binomial", data=sofa)
    fit.vd3 <- pfr(death ~ lf.vd(SOFA, transform="standardized") + age + los,
                   family="binomial", data=sofa)
    fit.vd4 <- pfr(death ~ lf.vd(SOFA, transform="standardized",
                                 basistype="te") + age + los,
                   family="binomial", data=sofa)
    fit.vd5 <- pfr(death ~ lf.vd(SOFA, transform="linear", bs="ps") + age + los,
                   family="binomial", data=sofa)
    fit.vd6 <- pfr(death ~ lf.vd(SOFA, transform="quadratic", bs="ps") + age + los,
                   family="binomial", data=sofa)
    fit.vd7 <- pfr(death ~ lf.vd(SOFA, transform="noInteraction", bs="ps") + age + los,
                   family="binomial", data=sofa)

    # Coefficient estimates, restricted to the domain arg <= vd
    ests <- lapply(1:7, function(i) {
      c.i <- coef(get(paste0("fit.vd", i)), n=173, n2=173)
      c.i[(c.i$SOFA.arg <= c.i$SOFA.vd),]
    })

    # Try plotting for each i
    i <- 1
    lims <- c(-2,8)
    if (requireNamespace("ggplot2", quietly = TRUE) &&
        requireNamespace("RColorBrewer", quietly = TRUE)) {
      est <- ests[[i]]
      # Clamp estimates to the plotting limits
      est$value[est$value<lims[1]] <- lims[1]
      est$value[est$value>lims[2]] <- lims[2]
      ggplot2::ggplot(est, ggplot2::aes(SOFA.arg, SOFA.vd)) +
        ggplot2::geom_tile(ggplot2::aes(colour=value, fill=value)) +
        ggplot2::scale_fill_gradientn(name="", limits=lims,
          colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
        ggplot2::scale_colour_gradientn(name="", limits=lims,
          colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
        ggplot2::scale_y_continuous(expand = c(0,0)) +
        ggplot2::scale_x_continuous(expand = c(0,0)) +
        ggplot2::theme_bw()
    }
    ## End(Not run)

refund documentation built on July 1, 2021, 9:06 a.m.