Nothing
#' Generalized linear models for cumulative incidence
#'
#' Using pseudo observations for the cumulative incidence, this function then
#' runs a generalized linear model and estimates the parameters representing
#' contrasts in the cumulative incidence at a particular set of times (specified
#' by the \code{time} argument) across covariate values. The link function can
#' be "identity" for estimating differences in the cumulative incidence, "log"
#' for estimating ratios, and any of the other link functions supported by
#' \link[stats]{quasi}.
#'
#' @return A pseudoglm object, with its own methods for print, summary, and
#' vcov. It inherits from glm, so predict and other glm methods are supported.
#'
#' @param formula A formula specifying the model. The left hand side must be a
#' \link[survival]{Surv} object specifying a right censored survival or
#' competing risks outcome. The status indicator, normally 0=alive, 1=dead.
#' Other choices are TRUE/FALSE (TRUE = death) or 1/2 (2=death). For competing
#' risks, the event variable will be a factor, whose first level is treated as
#' censoring. The right hand side is the usual linear combination of
#' covariates. If there are multiple time points, the special term "tve(.)"
#' can be used to specify that the effect of the variable inside the
#' parentheses will be time varying. In the output this will be represented
#' as the interaction between the time points and the variable.
#' @param time Numeric vector specifying the times at which the cumulative
#' incidence or survival probability effect estimates are desired.
#' @param cause Numeric or character constant specifying the cause indicator of
#' interest.
#' @param link Link function for the cumulative incidence regression model.
#' @param model.censoring Type of model for the censoring distribution. Options
#' are "stratified", which computes the pseudo-observations stratified on a
#' set of categorical covariates, "aareg" for Aalen's additive hazards model,
#' and "coxph" for Cox's proportional hazards model. With those options, we
#' assume that the time to event and event indicator are conditionally
#' independent of the censoring time, and that the censoring model is
#' correctly specified. If "independent", we assume completely independent
#' censoring, i.e., that the time to event and covariates are independent of
#' the censoring time. the censoring time is independent of the covariates in
#' the model. Can also be a custom function, see Details and the "Extending
#' eventglm" vignette.
#' @param formula.censoring A one sided formula (e.g., \code{~ x1 + x2})
#' specifying the model for the censoring distribution. If NULL, uses the same
#' mean model as for the outcome. Missing values in any covariates for the
#' censoring model will cause an error.
#' @param ipcw.method Which method to use for calculation of inverse probability
#' of censoring weighted pseudo observations. "binder" the default, uses the
#' number of observations as the denominator, while the "hajek" method uses
#' the sum of the weights as the denominator.
#' @param data Data frame in which all variables of formula can be interpreted.
#' @param survival Set to TRUE to use survival (one minus the cumulative
#' incidence) as the outcome. Not available for competing risks models.
#' @param weights an optional vector of 'prior weights' to be used in the
#' fitting process. Should be NULL or a numeric vector found in data.
#' @param subset an optional vector specifying a subset of observations to be
#' used in the fitting process.
#' @param na.action a function which indicates what should happen when the data
#' contain \code{NA}s. The default is set by the \code{na.action} setting of
#' \link[base]{options}, and is \link[stats]{na.fail} if that is unset. The
#' 'factory-fresh' default is \link[stats]{na.omit}. Another possible value is
#' NULL, no action. Value \link[stats]{na.exclude} can be useful.
#' @param offset this can be used to specify an a priori known component to be
#' included in the linear predictor during fitting. This should be NULL or a
#' numeric vector of length equal to the number of cases. One or more
#' \link[stats]{offset} terms can be included in the formula instead or as
#' well, and if more than one is specified their sum is used. See
#' \link[stats]{model.offset}. If length(time) > 1, then any offset terms must
#' appear in the formula.
#' @param control a list of parameters for controlling the fitting process. This
#' is passed to \link[stats]{glm.control}.
#' @param model a logical value indicating whether model frame should be
#' included as a component of the returned value.
#' @param x logical value indicating whether the model matrix used in the
#' fitting process should be returned as components of the returned value.
#' @param y logical value indicating whether the response vector
#' (pseudo-observations) used in the fitting process should be returned as
#' components of the returned value.
#' @param singular.ok logical; if FALSE a singular fit is an error.
#' @param contrasts an optional list. See the contrasts.arg of
#' \link[stats]{model.matrix.default}.
#' @param id An optional integer vector of subject identifiers, found in data.
#' If this is present, then generalized estimating equations will be used
#' to fit the model. This can be used, for example, if there are multiple
#' observations per individual represented as multiple rows in data.
#' @param ... Other arguments passed to \link[stats]{glm.fit}
#'
#'
#' @details The argument "model.censoring" determines how the pseudo
#' observations are calculated. This can be the name of a function or the
#' function itself, which must have arguments "formula", "time", "cause",
#' "data", "type", "formula.censoring", and "ipcw.method". If it is the name
#' of a function, this code will look for a function with the prefix "pseudo_"
#' first, to avoid clashes with related methods such as coxph. The function
#' then must return a vector of pseudo observations, one for each subject in
#' data which are used in subsequent calculations. For examples of the
#' implementation, see the "pseudo-modules.R" file, or the vignette "Extending
#' eventglm".
#'
#'
#' @export
#'
#' @examples
#' cumincipcw <- cumincglm(Surv(etime, event) ~ age + sex,
#' time = 200, cause = "pcm", link = "identity",
#' model.censoring = "independent", data = mgus2)
#' # stratified on only the categorical covariate
#' cumincipcw2 <- cumincglm(Surv(etime, event) ~ age + sex,
#' time = 200, cause = "pcm", link = "identity",
#' model.censoring = "stratified",
#' formula.censoring = ~ sex, data = mgus2)
#' # multiple time points
#' cuminct2 <- cumincglm(Surv(etime, event) ~ age + sex,
#' time = c(50, 100, 200), cause = "pcm", link = "identity",
#' model.censoring = "independent", data = mgus2)
#' cuminct3 <- cumincglm(Surv(etime, event) ~ age + tve(sex),
#' time = c(50, 100, 200), cause = "pcm", link = "identity",
#' model.censoring = "independent", data = mgus2)
cumincglm <- function(formula, time, cause = 1, link = "identity",
model.censoring = "independent", formula.censoring = NULL,
ipcw.method = "binder",
data,
survival = FALSE,
weights, subset,
na.action, offset,
control = list(...), model = FALSE,
x = TRUE, y = TRUE, singular.ok = TRUE, contrasts = NULL,
id, ...) {
stopifnot(is.numeric(time))
cal <- match.call()
mr <- model.response(model.frame(update.formula(formula, . ~ 1), data = data))
newdata <- do.call(rbind, lapply(1:length(time), function(i) data))
matcau <- match_cause(mr, cause)
causec <- matcau$causec
causen <- matcau$causen
otype <- if(survival) "survival" else "cuminc"
pseudo_function <- check_mod_cens(model.censoring)
POi <- lapply(time, function(tt) {
pseudo_function(formula = formula, time = tt, cause = cause,
data = data, type = otype, formula.censoring = formula.censoring,
ipcw.method = ipcw.method)
})
ipcw.weights <- do.call(cbind, lapply(POi, function(pp) {
if(is.null(attr(pp, "ipcw.weights"))) {
rep(1, length(pp))
} else {
attr(pp, "ipcw.weights")
}
}))
POi <- unlist(POi)
nn <- length(POi) / length(time)
oldnames <- names(newdata)
newnames <- make.unique(c(oldnames, "pseudo.vals", "pseudo.time", "pseudo.id"))
po.nme <- newnames[length(newnames) - 2]
pot.nme <- newnames[length(newnames) - 1]
po.id <- newnames[length(newnames)]
newdata[[po.nme]] <- c(POi)
newdata[[pot.nme]] <- rep(time, each = nn)
newdata[[newnames[length(newnames)]]] <- rep(1:nrow(data), length(time))
## get stuff ready for glm.fit or geese.fit
if(length(time) > 1) {
formula2 <- update.formula(formula, as.formula(paste0(po.nme,
"~ factor(", pot.nme, ") + .")))
## special terms
Terms <- terms(formula2, specials = c("tve"))
termvect <- rownames(attr(Terms, "factors"))
tochange <- termvect[attr(Terms, "specials")$tve]
changed <- gsub("(tve\\()(.*)(\\))", paste0("\\2 : factor(", pot.nme, ")"), tochange)
termvect[attr(Terms, "specials")$tve] <- changed
formula2[[3]] <- reformulate(termvect[-1], response = termvect[1])[[3]]
formula2i <- reformulate(termvect[-1], response = termvect[1],
env = environment(formula2))
} else {
formula2 <- update.formula(formula, as.formula(paste(po.nme, "~ .")))
formula2i <- formula2
Terms <- terms(formula2, specials = c("tve"))
if(!is.null(attr(Terms, "specials")$tve)) {
stop("Special term 'tve' not available if length(time) == 1")
}
}
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset",
"weights", "na.action", "offset", "id"), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf[["formula"]] <- formula2i
mf[["data"]] <- newdata
mf[[po.id]] <- newdata[[po.id]]
mf <- eval(mf, parent.frame())
mt <- attr(mf, "terms")
control <- do.call("glm.control", control)
Y <- model.response(mf, "any")
if (length(dim(Y)) == 1L) {
nm <- rownames(Y)
dim(Y) <- NULL
if (!is.null(nm))
names(Y) <- nm
}
X <- if (!stats::is.empty.model(mt)) {
model.matrix(mt, mf, contrasts)
} else {
matrix(, NROW(Y), 0L)
}
weights <- as.vector(model.weights(mf))
if (!is.null(weights) && !is.numeric(weights))
stop("'weights' must be a numeric vector")
if (!is.null(weights) && any(weights < 0))
stop("negative weights not allowed")
if (is.null(weights)) {
weights <- rep.int(1, nrow(mf))
}
offset <- as.vector(model.offset(mf))
if (!is.null(offset)) {
if (length(offset) != NROW(Y))
stop(gettextf("number of offsets is %d should equal %d (number of observations)",
length(offset), NROW(Y)), domain = NA)
} else {
offset <- rep(0, nrow(mf))
}
mustart <- rep(mean(newdata[[po.nme]]), nrow(mf))
fit <- stats::glm.fit(X, Y, weights = weights,
family = quasi(link = link, variance = "constant"),
mustart = mustart, offset = offset,
intercept = attr(mt, "intercept") > 0L,
singular.ok = singular.ok,
control = list(...))
fit.method <- "glm.fit"
if(length(time) > 1 | !missing(id)) {
if(!missing(id)) {
thisid <- mf[["(id)"]]
} else {
thisid <- newdata[[po.id]]
}
if(inherits(attr(mf, "na.action"), "omit")) {
thisid <- thisid[-attr(mf, "na.action")]
}
idord <- order(thisid)
newdatasrt <- newdata[idord,]
thisidsrt <- thisid[idord]
fitgee <- geepack::geese.fit(x = X[idord,], y = Y[idord], id = thisidsrt,
offset = offset[idord],
w = weights[idord], waves = NULL,
control = geepack::geese.control(...),
b = fit$coef, alpha = NULL,
gm = NULL, family = stats::gaussian(),
mean.link = link, variance = "gaussian",
corstr = "independence")
fit$coefficients <- fitgee$beta
fit$cluster.id <- thisid
fit$sandcov <- fitgee$vbeta
colnames(fit$sandcov) <- rownames(fit$sandcov) <- names(fit$coefficients)
fit$converged <- !as.logical(fitgee$error)
fit.method <- "geese"
}
if (model) {
fit$model <- mf
}
fit$na.action <- attr(mf, "na.action")
if (x) {
fit$x <- X
}
if (!y) {
fit$y <- NULL
}
fit.lin <- structure(c(fit, list(call = cal, formula = formula, terms = mt,
data = data, offset = offset, control = list(), method = fit.method,
contrasts = attr(X, "contrasts"), xlevels = .getXlevels(mt,
mf))),
class = c(fit$class, c("glm", "lm")))
if(attr(mr, "type") %in% c("right", "mright")) {
datamat <- cbind(mr[, "time"],
mr[, "status"] != 0, ## not censored indicator
mr[, "status"] == causen,
mr[, "status"] == 0 )## censored indicator
} else {
datamat <- NULL
}
fit.lin$datamat <- datamat
fit.lin$time <- time
fit.lin$cause <- cause
fit.lin$link <- link
fit.lin$type <- if(survival) "survival" else "cuminc"
fit.lin$ipcw.weights <- ipcw.weights
fit.lin$competing <- length(unique(mr[, "status"])) > 2
fit.lin$rawPO <- POi
class(fit.lin) <- c("pseudoglm", class(fit.lin))
fit.lin
}
#' Generalized linear models for the restricted mean survival
#'
#' Using pseudo observations for the restricted mean, or the restricted mean
#' lifetime lost in the competing risks case, this function then runs a
#' generalized linear model to estimate associations of the restricted
#' mean/lifetime lost up to a particular time (specified by the \code{time}
#' argument) with covariates. The link function can be "identity" for estimating
#' differences in the restricted mean, "log" for estimating ratios, and any of
#' the other link functions supported by \link[stats]{quasi}.
#'
#' @return A pseudoglm object, with its own methods for print, summary, and
#' vcov. It inherits from glm, so predict and other glm methods are supported.
#'
#' @param formula A formula specifying the model. The left hand side must be a
#' \link[survival]{Surv} object specifying a right censored survival or
#' competing risks outcome. The status indicator, normally 0=alive, 1=dead.
#' Other choices are TRUE/FALSE (TRUE = death) or 1/2 (2=death). For competing
#' risks, the event variable will be a factor, whose first level is treated as
#' censoring. The right hand side is the usual linear combination of
#' covariates.
#' @param time Numeric constant specifying the time up to which the restricted
#' mean effect estimates are desired.
#' @param cause Numeric or character constant specifying the cause indicator of
#' interest.
#' @param link Link function for the restricted mean regression model.
#' @param model.censoring Type of model for the censoring distribution. Options
#' are "stratified", which computes the pseudo-observations stratified on a
#' set of categorical covariates, "aareg" for Aalen's additive hazards model,
#' and "coxph" for Cox's proportional hazards model. With those options, we
#' assume that the time to event and event indicator are conditionally
#' independent of the censoring time, and that the censoring model is
#' correctly specified. If "independent", we assume completely independent
#' censoring, i.e., that the time to event and covariates are independent of
#' the censoring time. the censoring time is independent of the covariates in
#' the model. Can also be a custom function, see Details and the
#' "Extending eventglm" vignette.
#' @param formula.censoring A one sided formula (e.g., \code{~ x1 + x2})
#' specifying the model for the censoring distribution. If NULL, uses the same
#' mean model as for the outcome. Missing values in any covariates for the
#' censoring model will cause an error.
#' @param ipcw.method Which method to use for calculation of inverse
#' probability of censoring weighted pseudo observations. "binder" the
#' default, uses the number of observations as the denominator, while the
#' "hajek" method uses the sum of the weights as the denominator.
#' @param data Data frame in which all variables of formula can be interpreted.
#' @param weights an optional vector of 'prior weights' to be used in the
#' fitting process. Should be NULL or a numeric vector.
#' @param subset an optional vector specifying a subset of observations to be
#' used in the fitting process.
#' @param na.action a function which indicates what should happen when the data
#' contain \code{NA}s. The default is set by the \code{na.action} setting of
#' \link[base]{options}, and is \link[stats]{na.fail} if that is unset. The
#' 'factory-fresh' default is \link[stats]{na.omit}. Another possible value is
#' NULL, no action. Value \link[stats]{na.exclude} can be useful.
#' @param offset this can be used to specify an a priori known component to be
#' included in the linear predictor during fitting. This should be NULL or a
#' numeric vector of length equal to the number of cases. One or more
#' \link[stats]{offset} terms can be included in the formula instead or as
#' well, and if more than one is specified their sum is used. See
#' \link[stats]{model.offset}.
#' @param control a list of parameters for controlling the fitting process. This
#' is passed to \link[stats]{glm.control}.
#' @param model a logical value indicating whether model frame should be
#' included as a component of the returned value.
#' @param x logical value indicating whether the model matrix used in the
#' fitting process should be returned as components of the returned value.
#' @param y logical value indicating whether the response vector
#' (pseudo-observations) used in the fitting process should be returned as
#' components of the returned value.
#' @param singular.ok logical; if FALSE a singular fit is an error.
#' @param contrasts an optional list. See the contrasts.arg of
#' \link[stats]{model.matrix.default}.
#' @param id An optional integer vector of subject identifiers, found in data.
#' If this is present, then generalized estimating equations will be used
#' to fit the model. This can be used, for example, if there are multiple
#' observations per individual represented as multiple rows in data.
#' @param ... Other arguments passed to \link[stats]{glm.fit}
#'
#'
#' @export
#'
#' @details The argument "model.censoring" determines how the pseudo observations
#' are calculated. This can be the name of a function or the function itself, which
#' must have arguments "formula", "time", "cause", "data", "type",
#' "formula.censoring", and "ipcw.method". If it is the name of a function, this code
#' will look for a function with the prefix "pseudo_" first, to avoid clashes with
#' related methods such as coxph. The function then must return a vector
#' of pseudo observations, one for each subject in data which are used in subsequent
#' calculations. For examples of the implementation, see the "pseudo-modules.R"
#' file, or the vignette "Extending eventglm".
#'
#' @examples
#' rmeanipcw <- rmeanglm(Surv(etime, event) ~ age + sex,
#' time = 200, cause = "pcm", link = "identity",
#' model.censoring = "independent", data = mgus2)
#' # stratified on only the categorical covariate
#' rmeanipcw2 <- rmeanglm(Surv(etime, event) ~ age + sex,
#' time = 200, cause = "pcm", link = "identity",
#' model.censoring = "stratified",
#' formula.censoring = ~ sex, data = mgus2)
rmeanglm <- function(formula, time, cause = 1, link = "identity",
model.censoring = "independent", formula.censoring = NULL,
ipcw.method = "binder",
data,
weights, subset,
na.action, offset,
control = list(...), model = FALSE,
x = TRUE, y = TRUE, singular.ok = TRUE, contrasts = NULL,
id, ...) {
stopifnot(length(time) == 1)
stopifnot(is.numeric(time))
cal <- match.call()
mr <- model.response(model.frame(update.formula(formula, . ~ 1), data = data))
newdata <- do.call(rbind, lapply(1:length(time), function(i) data))
matcau <- match_cause(mr, cause)
causec <- matcau$causec
causen <- matcau$causen
otype <- "rmean"
pseudo_function <- check_mod_cens(model.censoring)
POi <- pseudo_function(formula = formula, time = time, cause = cause,
data = data, type = otype, formula.censoring = formula.censoring,
ipcw.method = ipcw.method)
if(is.null(attr(POi, "ipcw.weights"))) {
ipcw.weights <- rep(1, length(POi))
} else {
ipcw.weights <- attr(POi, "ipcw.weights")
}
nn <- length(POi)
oldnames <- names(newdata)
newnames <- make.unique(c(oldnames, "pseudo.vals"))
po.nme <- newnames[length(newnames)]
newdata[[po.nme]] <- c(POi)
## get stuff ready for glm.fit
formula2 <- update.formula(formula, as.formula(paste(po.nme, "~ .")))
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset",
"weights", "na.action", "offset", "id"), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf[["formula"]] <- formula2
mf[["data"]] <- newdata
mf <- eval(mf, parent.frame())
mt <- attr(mf, "terms")
control <- do.call("glm.control", control)
Y <- model.response(mf, "any")
if (length(dim(Y)) == 1L) {
nm <- rownames(Y)
dim(Y) <- NULL
if (!is.null(nm))
names(Y) <- nm
}
X <- if (!stats::is.empty.model(mt)) {
model.matrix(mt, mf, contrasts)
} else {
matrix(, NROW(Y), 0L)
}
weights <- as.vector(model.weights(mf))
if (!is.null(weights) && !is.numeric(weights))
stop("'weights' must be a numeric vector")
if (!is.null(weights) && any(weights < 0))
stop("negative weights not allowed")
if (is.null(weights)) {
weights <- rep.int(1, nrow(mf))
}
offset <- as.vector(model.offset(mf))
if (!is.null(offset)) {
if (length(offset) != NROW(Y))
stop(gettextf("number of offsets is %d should equal %d (number of observations)",
length(offset), NROW(Y)), domain = NA)
} else {
offset <- rep(0, nrow(mf))
}
mustart <- rep(mean(newdata[[po.nme]]), nrow(mf))
fit.method <- "glm.fit"
fit <- stats::glm.fit(X, Y, weights = weights,
family = quasi(link = link, variance = "constant"),
mustart = mustart, offset = offset,
intercept = attr(mt, "intercept") > 0L, singular.ok = singular.ok,
control = list(...))
if(!missing(id)) {
if(!missing(id)) {
thisid <- mf[["(id)"]]
} else {
thisid <- 1:nrow(newdata)
}
if(inherits(attr(mf, "na.action"), "omit")) {
thisid <- thisid[-attr(mf, "na.action")]
}
idord <- order(thisid)
newdatasrt <- newdata[idord,]
thisidsrt <- thisid[idord]
fitgee <- geepack::geese.fit(x = X[idord,], y = Y[idord], id = thisidsrt,
offset = offset[idord],
w = weights[idord], waves = NULL,
control = geepack::geese.control(...),
b = fit$coef, alpha = NULL,
gm = NULL, family = stats::gaussian(),
mean.link = link, variance = "gaussian",
corstr = "independence")
fit$coefficients <- fitgee$beta
fit$cluster.id <- thisid
fit$sandcov <- fitgee$vbeta
colnames(fit$sandcov) <- rownames(fit$sandcov) <- names(fit$coefficients)
fit$converged <- !as.logical(fitgee$error)
fit.method <- "geese"
}
if (model) {
fit$model <- mf
}
fit$na.action <- attr(mf, "na.action")
if (x) {
fit$x <- X
}
if (!y) {
fit$y <- NULL
}
fit.lin <- structure(c(fit, list(call = cal, formula = formula, terms = mt,
data = data, offset = offset, control = list(...),
method = fit.method,
contrasts = attr(X, "contrasts"), xlevels = .getXlevels(mt,
mf))),
class = c(fit$class, c("glm", "lm")))
## update variance estimate
datamat <- cbind(mr[, "time"],
mr[, "status"] != 0, ## not censored indicator
mr[, "status"] == causen,
mr[, "status"] == 0 )## censored indicator
fit.lin$datamat <- datamat
fit.lin$time <- time
fit.lin$cause <- cause
fit.lin$link <- link
fit.lin$type <- "rmean"
fit.lin$ipcw.weights <- ipcw.weights
fit.lin$competing <- length(unique(mr[, "status"])) > 2
fit.lin$rawPO <- POi
class(fit.lin) <- c("pseudoglm", class(fit.lin))
fit.lin
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.