lf.vd: Construct a VDFR regression term
In refunders/refund: Regression with Functional Data

lf.vd

R Documentation

Construct a VDFR regression term

Description

This function defines the a variable-domain functional regression term for inclusion in an [mgcv]{gam}-formula (or [mgcv]{bam} or [mgcv]{gamm} or gamm4::gamm as constructed by {pfr}. These are functional predictors for which each function is observed over a domain of different width. The default is the term 1/T_i\int_0^{T_i}X_i(t)\beta(t,T_i)dt, where X_i(t) is a functional predictor of length T_i and \beta(t,T_i) is an unknown bivariate coefficient function. Various domain transformations are available, such as lagging or domain-standardizing the coordinates, or parameterizing the interactions; these often result in improved model fit. Basis choice is fully customizable using the options of [mgcv]{s} and [mgcv]{te}.

Usage

lf.vd(
  X,
  argvals = seq(0, 1, l = ncol(X)),
  vd = NULL,
  integration = c("simpson", "trapezoidal", "riemann"),
  L = NULL,
  basistype = c("s", "te", "t2"),
  transform = NULL,
  mp = TRUE,
  ...
)

Arguments

`X`	matrix containing variable-domain functions. Should be `N x J`, where `N` is the number of subjects and `J` is the maximum number of time points per subject. Most rows will have `NA` values in the right-most columns, corresponding to unobserved time points.
`argvals`	indices of evaluation of `X`, i.e. `(t_{i1},.,t_{iJ})` for subject `i`. May be entered as either a length-`J` vector, or as an `N` by `J` matrix. Indices may be unequally spaced. Entering as a matrix allows for different observations times for each subject.
`vd`	vector of values of containing the variable-domain width (`T_i` above). Defaults to the `argvals` value corresponding to the last non-`NA` element of `X_i(t)`.
`integration`	method used for numerical integration. Defaults to `"simpson"`'s rule for calculating entries in `L`. Alternatively and for non-equidistant grids, `"trapezoidal"` or `"riemann"`.
`L`	an optional `N` by `ncol(argvals)` matrix giving the weights for the numerical integration over `t`. If present, overrides `integration`.
`basistype`	character string indicating type of bivariate basis used. Options include `"s"` (the default), `"te"`, and `"t2"`, which correspond to `mgcv::s`, `mgcv::te`, and `mgcv::t2`.
`transform`	character string indicating an optional basis transformation; see Details for options.
`mp`	for `transform=="linear"` or `transform=="quadratic"`, `TRUE` to use multiple penalties for the smooth (one for each marginal basis). If `FALSE`, penalties are concatonated into a single block-diagonal penalty matrix (with one smoothing parameter).
`...`	optional arguments for basis and penalization to be passed to the function indicated by `basistype`. These could include, for example, `"bs"`, `"k"`, `"m"`, etc. See `{te}` or `{s}` for details.

Details

The variable-domain functional regression model uses the term \frac1{T_i}\int_0^{T_i}X_i(t)\beta(t,T_i)dt to incorporate a functional predictor with subject-specific domain width. This term imposes a smooth (nonparametric) interaction between t and T_i. The domain of the coefficient function is the triangular (or trapezoidal) surface defined by {t,T_i: 0\le t\le T_i}. The default basis uses bivariate thin-plate regression splines.

Different basis transformations can result in different properties; see Gellar, et al. (2014) for a more complete description. We make five basis transformations easily accessible using the transform argument. This argument is a character string that can take one of the following values:

"lagged": transforms argvals to argvals - vd
"standardized": transforms argvals to argvals/vd, and then rescales vd linearly so it ranges from 0 to 1
"linear": first transforms the domain as in "standardized", then parameterizes the interaction with "vd" to be linear
"quadratic": first transforms the domain as in "standardized", then parameterizes the interaction with "vd" to be quadratic
"noInteraction": first transforms the domain as in "standardized", then reduces the bivariate basis to univariate with no effect of vd. This would be equivalent to using {lf} on the domain-standardized predictor functions.

The practical effect of using the "lagged" basis is to increase smoothness along the right (diagonal) edge of the resultant estimate. The practical effect of using a "standardized" basis is to allow for greater smoothness at high values of T_i compared to lower values.

These basis transformations rely on the basis constructors available in the mgcvTrans package. For more specific control over the transformations, you can use bs="dt" and/or bs="pi"; see {smooth.construct.dt.smooth.spec} or {smooth.construct.pi.smooth.spec} for an explanation of the options (entered through the xt argument of lf.vd/s).

Note that tensor product bases are only recommended when a standardized transformation is used. Without this transformation, just under half of the "knots" used to define the basis will fall outside the range of the data and have no data available to estimate them. The penalty allows the corresponding coefficients to be estimated, but results may be unstable.

Value

a list with the following entries

`call`	a `call` to `s` or `te`, using the appropriately constructed weight matrices
`data`	data used by the `call`, which includes the matrices indicated by `argname`, `Tindname`, and `LXname`
`L`	the matrix of weights used for the integration
`argname`	the name used for the `argvals` variable in the `formula` used by `mgcv::gam`
`Tindname`	the name used for the `Tind` variable in the `formula` used by `mgcv::gam`
`LXname`	the name of the `by` variable used by `s` or `te` in the `formula` for `mgcv::gam`

Author(s)

Jonathan E. Gellar <JGellar@mathematica-mpr.com>

References

Gellar, Jonathan E., Elizabeth Colantuoni, Dale M. Needham, and Ciprian M. Crainiceanu. Variable-Domain Functional Regression for Modeling ICU Data. Journal of the American Statistical Association, 109(508):1425-1439, 2014.

Examples

## Not run: 
  data(sofa)
  fit.vd1 <- pfr(death ~ lf.vd(SOFA) + age + los,
                 family="binomial", data=sofa)
  fit.vd2 <- pfr(death ~ lf.vd(SOFA, transform="lagged") + age + los,
                 family="binomial", data=sofa)
  fit.vd3 <- pfr(death ~ lf.vd(SOFA, transform="standardized") + age + los,
                 family="binomial", data=sofa)
  fit.vd4 <- pfr(death ~ lf.vd(SOFA, transform="standardized",
                               basistype="te") + age + los,
                 family="binomial", data=sofa)
  fit.vd5 <- pfr(death ~ lf.vd(SOFA, transform="linear", bs="ps") + age + los,
                 family="binomial", data=sofa)
  fit.vd6 <- pfr(death ~ lf.vd(SOFA, transform="quadratic", bs="ps") + age + los,
                 family="binomial", data=sofa)
  fit.vd7 <- pfr(death ~ lf.vd(SOFA, transform="noInteraction", bs="ps") + age + los,
                 family="binomial", data=sofa)

  ests <- lapply(1:7, function(i) {
    c.i <- coef(get(paste0("fit.vd", i)), n=173, n2=173)
    c.i[(c.i$SOFA.arg <= c.i$SOFA.vd),]
  })

  # Try plotting for each i
  i <- 1
  lims <- c(-2,8)
  if (requireNamespace("ggplot2", quietly = TRUE) &
      requireNamespace("RColorBrewer", quietly = TRUE)) {
        est <- ests[[i]]
        est$value[est$value<lims[1]] <- lims[1]
        est$value[est$value>lims[2]] <- lims[2]
        ggplot2::ggplot(est, ggplot2::aes(SOFA.arg, SOFA.vd)) +
          ggplot2::geom_tile(ggplot2::aes(colour=value, fill=value)) +
          ggplot2::scale_fill_gradientn(  name="", limits=lims,
                    colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
          ggplot2::scale_colour_gradientn(name="", limits=lims,
                    colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
          ggplot2::scale_y_continuous(expand = c(0,0)) +
          ggplot2::scale_x_continuous(expand = c(0,0)) +
          ggplot2::theme_bw()
  }

## End(Not run)

refunders/refund documentation built on Sept. 25, 2024, 5:46 p.m.