Nothing
#' Construct a VDFR regression term
#'
#' This function defines the a variable-domain functional regression term
#' for inclusion in an \code{[mgcv]{gam}}-formula (or \code{[mgcv]{bam}} or
#' \code{[mgcv]{gamm}} or \code{gamm4::gamm} as constructed by
#' \code{{pfr}}. These are functional predictors for which each function is
#' observed over a domain of different width.
#' The default is the term \eqn{1/T_i\int_0^{T_i}X_i(t)\beta(t,T_i)dt},
#' where \eqn{X_i(t)} is a functional predictor of length \eqn{T_i} and \eqn{\beta(t,T_i)}
#' is an unknown bivariate coefficient function. Various domain transformations
#' are available, such as lagging or domain-standardizing the coordinates, or
#' parameterizing the interactions; these often result in improved model fit.
#' Basis choice is fully customizable using the options of
#' \code{[mgcv]{s}} and \code{[mgcv]{te}}.
#'
#' @param X matrix containing variable-domain functions. Should be \eqn{N x J},
#' where \eqn{N} is the number of subjects and \eqn{J} is the maximum number of time
#' points per subject. Most rows will have \code{NA} values in the right-most
#' columns, corresponding to unobserved time points.
#' @param argvals indices of evaluation of \code{X}, i.e. \eqn{(t_{i1},.,t_{iJ})} for
#' subject \eqn{i}. May be entered as either a length-\code{J} vector, or as
#' an \code{N} by \code{J} matrix. Indices may be unequally spaced. Entering
#' as a matrix allows for different observations times for each subject.
#' @param vd vector of values of containing the variable-domain width (\eqn{T_i}
#' above). Defaults to the \code{argvals} value corresponding to the last
#' non-\code{NA} element of \eqn{X_i(t)}.
#' @param integration method used for numerical integration. Defaults to \code{"simpson"}'s rule
#' for calculating entries in \code{L}. Alternatively and for non-equidistant grids,
#' \code{"trapezoidal"} or \code{"riemann"}.
#' @param L an optional \code{N} by \code{ncol(argvals)} matrix giving the weights for the numerical
#' integration over \code{t}. If present, overrides \code{integration}.
#' @param basistype character string indicating type of bivariate basis used.
#' Options include \code{"s"} (the default), \code{"te"}, and \code{"t2"},
#' which correspond to \code{mgcv::s}, \code{mgcv::te}, and \code{mgcv::t2}.
#' @param transform character string indicating an optional basis transformation;
#' see Details for options.
#' @param mp for \code{transform=="linear"} or \code{transform=="quadratic"},
#' \code{TRUE} to use multiple penalties for the smooth (one for each marginal
#' basis). If \code{FALSE}, penalties are concatonated into a single
#' block-diagonal penalty matrix (with one smoothing parameter).
#' @param ... optional arguments for basis and penalization to be passed to the
#' function indicated by \code{basistype}. These could include, for example,
#' \code{"bs"}, \code{"k"}, \code{"m"}, etc. See \code{{te}} or
#' \code{{s}} for details.
#'
#' @details The variable-domain functional regression model uses the term
#' \eqn{\frac1{T_i}\int_0^{T_i}X_i(t)\beta(t,T_i)dt} to incorporate a
#' functional predictor with subject-specific domain width. This term imposes
#' a smooth (nonparametric) interaction between \eqn{t} and \eqn{T_i}. The domain
#' of the coefficient function is the triangular (or trapezoidal) surface
#' defined by \eqn{{t,T_i: 0\le t\le T_i}}. The default basis uses
#' bivariate thin-plate regression splines.
#'
#' Different basis transformations can result in different properties; see
#' Gellar, et al. (2014) for a more complete description. We make five basis
#' transformations easily accessible using the \code{transform} argument.
#' This argument is a character string that can take one of the following
#' values:
#' \enumerate{
#' \item \code{"lagged"}: transforms \code{argvals} to \code{argvals - vd}
#' \item \code{"standardized"}: transforms \code{argvals} to \code{argvals/vd},
#' and then rescales \code{vd} linearly so it ranges from 0 to 1
#' \item \code{"linear"}: first transforms the domain as in
#' \code{"standardized"}, then parameterizes the interaction with
#' \code{"vd"} to be linear
#' \item \code{"quadratic"}: first transforms the domain as in
#' \code{"standardized"}, then parameterizes the interaction with
#' \code{"vd"} to be quadratic
#' \item \code{"noInteraction"}: first transforms the domain as in
#' \code{"standardized"}, then reduces the bivariate basis to univariate
#' with no effect of \code{vd}. This would be equivalent to using
#' \code{{lf}} on the domain-standardized predictor functions.
#' }
#'
#' The practical effect of using the \code{"lagged"} basis is to increase
#' smoothness along the right (diagonal) edge of the resultant estimate.
#' The practical effect of using a \code{"standardized"} basis is to allow
#' for greater smoothness at high values of \eqn{T_i} compared to lower
#' values.
#'
#' These basis transformations rely on the basis constructors
#' available in the \code{mgcvTrans} package. For more specific control over
#' the transformations, you can use \code{bs="dt"} and/or \code{bs="pi"};
#' see \code{{smooth.construct.dt.smooth.spec}} or
#' \code{{smooth.construct.pi.smooth.spec}} for an explanation of the
#' options (entered through the \code{xt} argument of \code{lf.vd}/\code{s}).
#'
#' Note that tensor product bases are only recommended when a standardized
#' transformation is used. Without this transformation, just under half of
#' the "knots" used to define the basis will fall outside the range of the
#' data and have no data available to estimate them. The penalty allows
#' the corresponding coefficients to be estimated, but results may be
#' unstable.
#'
#' @return a list with the following entries
#' \item{call}{a \code{call} to \code{s} or \code{te}, using the appropriately constructed
#' weight matrices}
#' \item{data}{data used by the \code{call}, which includes the matrices indicated
#' by \code{argname}, \code{Tindname}, and \code{LXname}}
#' \item{L}{the matrix of weights used for the integration}
#' \item{argname}{the name used for the \code{argvals} variable in the \code{formula}
#' used by \code{mgcv::gam}}
#' \item{Tindname}{the name used for the \code{Tind} variable in the \code{formula}
#' used by \code{mgcv::gam}}
#' \item{LXname}{the name of the \code{by} variable used by \code{s} or \code{te}
#' in the \code{formula} for \code{mgcv::gam}}
#' @export
#' @author Jonathan E. Gellar <JGellar@@mathematica-mpr.com>
#' @references Gellar, Jonathan E., Elizabeth Colantuoni, Dale M. Needham, and
#' Ciprian M. Crainiceanu. Variable-Domain Functional Regression for Modeling
#' ICU Data. Journal of the American Statistical Association,
#' 109(508):1425-1439, 2014.
#' @examples
#' \dontrun{
#' data(sofa)
#' fit.vd1 <- pfr(death ~ lf.vd(SOFA) + age + los,
#' family="binomial", data=sofa)
#' fit.vd2 <- pfr(death ~ lf.vd(SOFA, transform="lagged") + age + los,
#' family="binomial", data=sofa)
#' fit.vd3 <- pfr(death ~ lf.vd(SOFA, transform="standardized") + age + los,
#' family="binomial", data=sofa)
#' fit.vd4 <- pfr(death ~ lf.vd(SOFA, transform="standardized",
#' basistype="te") + age + los,
#' family="binomial", data=sofa)
#' fit.vd5 <- pfr(death ~ lf.vd(SOFA, transform="linear", bs="ps") + age + los,
#' family="binomial", data=sofa)
#' fit.vd6 <- pfr(death ~ lf.vd(SOFA, transform="quadratic", bs="ps") + age + los,
#' family="binomial", data=sofa)
#' fit.vd7 <- pfr(death ~ lf.vd(SOFA, transform="noInteraction", bs="ps") + age + los,
#' family="binomial", data=sofa)
#'
#' ests <- lapply(1:7, function(i) {
#' c.i <- coef(get(paste0("fit.vd", i)), n=173, n2=173)
#' c.i[(c.i$SOFA.arg <= c.i$SOFA.vd),]
#' })
#'
#' # Try plotting for each i
#' i <- 1
#' lims <- c(-2,8)
#' if (requireNamespace("ggplot2", quietly = TRUE) &
#' requireNamespace("RColorBrewer", quietly = TRUE)) {
#' est <- ests[[i]]
#' est$value[est$value<lims[1]] <- lims[1]
#' est$value[est$value>lims[2]] <- lims[2]
#' ggplot2::ggplot(est, ggplot2::aes(SOFA.arg, SOFA.vd)) +
#' ggplot2::geom_tile(ggplot2::aes(colour=value, fill=value)) +
#' ggplot2::scale_fill_gradientn( name="", limits=lims,
#' colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
#' ggplot2::scale_colour_gradientn(name="", limits=lims,
#' colours=rev(RColorBrewer::brewer.pal(11,"Spectral"))) +
#' ggplot2::scale_y_continuous(expand = c(0,0)) +
#' ggplot2::scale_x_continuous(expand = c(0,0)) +
#' ggplot2::theme_bw()
#' }
#' }
#'
#' @seealso \code{{pfr}}, \code{{lf}}, mgcv's
#' \code{{linear.functional.terms}}.
lf.vd <- function(X, argvals = seq(0, 1, l = ncol(X)), vd=NULL,
integration = c("simpson", "trapezoidal", "riemann"), L=NULL,
basistype=c("s","te","t2"),
transform=NULL, mp=TRUE, ...
) {
integration <- match.arg(integration)
basistype <- match.arg(basistype)
# Set up functions
n = nrow(X)
J = ncol(X)
J.i <- apply(X, 1, function(x) max(which(!is.na(x))))
# Create coordinate matrices
if (is.null(dim(argvals))) {
argvals <- t(argvals)
stopifnot(ncol(argvals) == J)
if (nrow(argvals) == 1) {
argvals <- matrix(as.vector(argvals), nrow = n, ncol = J, byrow = T)
}
stopifnot(nrow(argvals) == n)
}
if (is.null(vd))
# Defaults to the last non-NA value of argvals for that function
vd <- sapply(1:nrow(X), function(i) argvals[i,J.i[i]])
if (is.null(dim(vd))) {
vd <- t(vd)
stopifnot(ncol(vd) == n)
if (nrow(vd) == 1) {
vd <- matrix(as.vector(vd), nrow = n, ncol = J)
}
stopifnot(nrow(vd) == n)
}
# Process Functional Predictor
if (!is.null(L)) {
stopifnot(nrow(L) == n, ncol(L) == J)
} else {
L <- getL(argvals, integration=integration, n.int=J.i)
}
LX <- L*X
# Zero-out unused coordinates. For argvals and vd, this means we set them to
# any other (used) coordinate combination. This ensures they won't affect the
# range of values used to set up the basis.
# For LX, we set the weight to 0.
argvals[is.na(LX)] <- argvals[!is.na(LX)][1]
vd[is.na(LX)] <- vd[!is.na(LX)][1]
LX[is.na(LX)] <- 0
# Term names for basis construction
argname <- paste(deparse(substitute(X)), ".arg", sep = "")
vdname <- paste(deparse(substitute(X)), ".vd", sep = "")
LXname <- paste("L.", deparse(substitute(X)), sep = "")
# Set up transformations
dots <- list(...)
bs0 <- dots$bs
xt0 <- dots$xt
if (!is.null(transform)) {
# Set up dt basis call
dots$bs <- "dt"
if (transform=="lagged") {
dots$xt <- list(tf=list("s-t"))
if (!is.null(bs0)) dots$xt$bs=bs0
if (!is.null(xt0)) dots$xt$xt=xt0
} else if (transform=="standardized") {
dots$xt <- list(tf=list("s/t", "linear01"))
if (!is.null(bs0)) dots$xt$bs=bs0
if (!is.null(xt0)) dots$xt$xt=xt0
} else if (transform=="noInteraction") {
dots$xt <- list(tf="s/t", bs="pi", xt=list(g="none"))
if (!is.null(bs0)) dots$xt$xt$bs=bs0
if (!is.null(xt0)) dots$xt$xt$xt=xt0
} else if (transform=="linear") {
dots$xt <- list(tf="s/t", bs="pi", xt=list(g="linear", mp=mp))
if (!is.null(bs0)) dots$xt$xt$bs=bs0
if (!is.null(xt0)) dots$xt$xt$xt=xt0
} else if (transform=="quadratic") {
dots$xt <- list(tf="s/t", bs="pi", xt=list(g="quadratic", mp=mp))
if (!is.null(bs0)) dots$xt$xt$bs=bs0
if (!is.null(xt0)) dots$xt$xt$xt=xt0
}
if (basistype!="s") {
# dt basis call must go through s to allow bivariate transformations
# (te would split the coordinates up)
dots$xt$basistype <- basistype
basistype <- "s"
}
}
# Set up basis
data <- list(argvals, vd, LX)
names(data) <- c(argname, vdname, LXname)
#splinefun <- as.symbol(basistype)
call <- as.call(c(list(as.symbol(basistype)),
as.symbol(substitute(argname)),
as.symbol(substitute(vdname)),
by=as.symbol(substitute(LXname)),
dots
))
res <- list(call = call, data = data, L = L,
argname = argname, vdname=vdname, LXname = LXname)
return(res)
}
#' @importFrom stats filter
#' @keywords internal
# Get the weight matrix for a linear functional term
getL <- function(tind, integration, n.int=NULL) {
nt <- ncol(tind)
if (is.null(n.int)) {n.int=rep(nt,nrow(tind))}
L <- t(sapply(1:nrow(tind), function(i) {
# L <- t(sapply(1:2, function(i) {
nt.i <- n.int[i]
if (nt.i==1) {
c(1,rep(0,nt-1))
} else {
tind.i <- tind[i,1:nt.i]
L.i <- switch(integration, simpson = {
((tind.i[nt.i] - tind.i[1])/nt.i)/3 * c(1, rep(c(4,
2), length = nt.i - 2), 1)
}, trapezoidal = {
diffs <- diff(tind.i)
if (length(diffs)>1) {
0.5 * c(diffs[1], filter(diffs, filter=c(1,1))[-(nt.i-1)],
diffs[(nt.i-1)])
} else {
rep(0.5*diffs,2)
}
}, riemann = {
diffs <- diff(tind.i)
c(mean(diffs), diffs)
})
c(L.i, rep(0,nt-nt.i))/sum(L.i)
}
}))
L
}
#' SOFA (Sequential Organ Failure Assessment) Data
#'
#' A dataset containing the SOFA scores (Vincent et al, 1996). for 520 patients,
#' hospitalized in the intensive care unit (ICU) with Acute Lung Injury. Daily
#' measurements are available for as long as each one remains in the ICU. This is an
#' example of variable-domain functional data, as described by Gellar et al. (2014).
#'
#' The data was collected as part of the Improving Care of ALI Patients (ICAP)
#' study (Needham et al., 2006). If you use this dataset as an example in
#' written work, please cite the study protocol.
#'
#' @format A data frame with 520 rows (subjects) and 7 variables:
#' \describe{
#' \item{death}{binary indicator that the subject died in the ICU}
#' \item{SOFA}{520 x 173 matrix in variable-domain format (a ragged array).
#' Each column represents an ICU day. Each row contains the SOFA scores for
#' a subject, one per day, for as long as the subject remained in the ICU.
#' The remaining cells of each row are padded with \code{NA}s. SOFA scores
#' range from 0 to 24, increasing with severity of organ failure. Missing
#' values during one's ICU stay have been imputed using LOCF.}
#' \item{SOFA_raw}{Identical to the \code{SOFA} element, except that it contains
#' some missing values during one's hospitalization. These missing values
#' arise when a subject leaves the ICU temporarily, only to be re-admitted.
#' SOFA scores are not monitored outside the ICU.}
#' \item{los}{ICU length of stay, i.e., the number of days the patient remained
#' in the ICU prior to death or final discharge.}
#' \item{age}{Patient age}
#' \item{male}{Binary indicator for male gender}
#' \item{Charlson}{Charlson co-morbidity index, a measure of baseline health
#' status (before hospitalization and ALI).}
#' }
#'
#' @references
#' Vincent, JL, Moreno, R, Takala, J, Willatts, S, De Mendonca, A,
#' Bruining, H, Reinhart, CK, Suter, PM, Thijs, LG (1996). The SOFA ( Sepsis
#' related Organ Failure Assessment) score to describe organ
#' dysfunction/failure. Intensive Care Medicine, 22(7): 707-710.
#'
#' Needham, D. M., Dennison, C. R., Dowdy, D. W., Mendez-Tellez, P. A.,
#' Ciesla, N., Desai, S. V., Sevransky, J., Shanholtz, C., Scharfstein, D.,
#' Herridge, M. S., and Pronovost, P. J. (2006). Study protocol: The
#' Improving Care of Acute Lung Injury Patients (ICAP) study. Critical Care
#' (London, England), 10(1), R9.
#'
#' Gellar, Jonathan E., Elizabeth Colantuoni, Dale M. Needham, and
#' Ciprian M. Crainiceanu. Variable-Domain Functional Regression for Modeling
#' ICU Data. Journal of the American Statistical Association,
#' 109(508):1425-1439, 2014.
#'
"sofa"
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.