R/Data.R

#' Example data with 2000 observations of 2 continuous variables
#' 
#' A simulated data set containing 2 continuous variables. 
#' @name ExampleData
#' @docType data
#' @usage data(ExampleData)
#'
#' @format A list containing the following elements:
#' \describe{
#'   \item{z}{simulated continuous covariates V1 and V2, with a time-independent coefficient \eqn{\beta_1(t)=1}
#'   and a time-varying coefficient \eqn{\beta_2(t)=\sin(3\pi t/4)}.}
#'   \item{event}{simulated failure event response; binary variable with 0 or 1.}
#'   \item{time}{simulated observed event times; continuous variable with non-negative values.}
#' }
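#' @examples
#' # A minimal sketch, assuming the list layout described above (z, event, time).
#' data(ExampleData)
#' z <- ExampleData$z
#' time <- ExampleData$time
#' event <- ExampleData$event
#' # The time-varying coefficient stated above, evaluated on an illustrative
#' # grid (t_grid and beta2 are hypothetical names, not part of the data set).
#' t_grid <- seq(0, max(time), length.out = 5)
#' beta2 <- sin(3 * pi * t_grid / 4)
#' # Fit using the same call pattern as the support example in this file.
#' fit <- coxtv(event = event, z = z, time = time)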
"ExampleData"


#' Example data with 2000 observations of 2 binary variables
#' 
#' A simulated data set containing 2 binary variables. 
#' @name ExampleDataBinary
#' @docType data
#' @usage data(ExampleDataBinary)
#'
#' @format A list containing the following elements:
#' \describe{
#'   \item{z}{simulated binary covariates V1 and V2, with a time-independent coefficient \eqn{\beta_1(t)=1}
#'   and a time-varying coefficient \eqn{\beta_2(t)=\exp(-1.5t)}.}
#'   \item{event}{simulated failure event response; binary variable with 0 or 1.}
#'   \item{time}{simulated observed event times; continuous variable with non-negative values. }
#' }
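#' @examples
#' # A minimal sketch, assuming the list layout described above (z, event, time).
#' data(ExampleDataBinary)
#' str(ExampleDataBinary$z)
#' # Counts of censored (0) and failure (1) observations
#' table(ExampleDataBinary$event)
#' # For a binary covariate, exp(beta_2(t)) is the hazard ratio at time t under
#' # the coefficient stated above; t_grid and hr2 are illustrative names only.
#' t_grid <- c(0, 0.5, 1, 2)
#' hr2 <- exp(exp(-1.5 * t_grid))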
"ExampleDataBinary"


#' Example data for stratified model illustration
#' 
#' A simulated data set containing 2 binary variables from 10 distinct strata. 
#' @name StrataExample
#' @docType data
#' @usage data(StrataExample)
#'
#' @format A list containing the following elements:
#' \describe{
#'   \item{z}{simulated binary covariates V1 and V2, with a time-independent coefficient \eqn{\beta_1(t)=1}
#'   and a time-varying coefficient \eqn{\beta_2(t)=\sin(3\pi t/4)}.}
#'   \item{event}{simulated failure event response; binary variable with 0 or 1.}
#'   \item{time}{simulated observed event times; continuous variable with non-negative values. }
#'   \item{strata}{simulated strata variable; patients in different strata have different baseline hazards.}
#' }
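#' @examples
#' # A minimal sketch, assuming the list layout described above and that
#' # coxtv() accepts a strata argument; check ?coxtv before relying on it.
#' data(StrataExample)
#' z <- StrataExample$z
#' time <- StrataExample$time
#' event <- StrataExample$event
#' strata <- StrataExample$strata
#' # Number of observations in each stratum
#' table(strata)
#' fit.strata <- coxtv(event = event, z = z, time = time, strata = strata)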
"StrataExample"


#' Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT)
#' @name support
#' @docType data
#' @usage data(support)
#' 
#' @description The SUPPORT dataset tracks five response variables: hospital
#'   death, severe functional disability, hospital costs, time until death,
#'   and death itself. Patients are followed for up to 5.56 years. See Bhatnagar et al. (2020) for details.
#'
#' @details Some of the original data was missing. Before imputation, there were
#'   a total of 9,104 individuals and 47 variables. Following Bhatnagar et al. (2020), a few variables 
#'   were removed. Three response variables were removed:
#'   hospital charges, patient ratio of costs to charges and patient
#'   micro-costs. Hospital death was also removed as it was directly informative
#'   of the event of interest, namely death. Additionally, functional disability and
#'   income were removed as they are ordinal covariates. Finally, 8
#'   covariates related to the results of previous findings were removed: SUPPORT
#'   day 3 physiology score (\code{sps}), APACHE III day 3 physiology score
#'   (\code{aps}), SUPPORT model 2-month survival estimate, SUPPORT model
#'   6-month survival estimate, physician's 2-month survival estimate for the patient,
#'   physician's 6-month survival estimate for the patient, whether the patient had a Do Not
#'   Resuscitate (DNR) order, and day of DNR order (<0 if before study). Of
#'   these, \code{sps} and \code{aps} were added back after imputation, as they
#'   were missing only 1 observation. Imputation proceeded in two steps. First, imputation
#'   was done manually using the normal values for physiological measures recommended by
#'   Knaus et al. (1995). Next, a single dataset was imputed using \pkg{mice} with default
#'   settings. The covariate for surrogate activities of daily living was not imputed in
#'   this step because of collinearity with the other two covariates for activities of
#'   daily living, so it was removed. See the R package \pkg{casebase} by
#'   Bhatnagar et al. (2020) for details.
#'
#' @format A data frame with 9,104 observations and 34 variables after imputation
#'   and the removal of response variables like hospital charges, patient ratio
#'   of costs to charges and micro-costs following Bhatnagar et al. (2020). 
#'   Ordinal variables, namely functional disability and income, were also removed. 
#'   Finally, surrogate activities of daily living were removed due to sparsity. 
#'   There were 6 other model scores in the dataset and they were removed; only \code{aps} and \code{sps} were kept.
#'   \describe{ 
#'   \item{age}{ patient age, stored as a double. } 
#'   \item{death}{
#'   death at any time up to NDI (National Death Index) date: 12/31/1994. } 
#'   \item{sex}{ 0=female, 1=male. } 
#'   \item{slos}{ days from study entry to discharge. } 
#'   \item{d.time}{ days of follow-up. } 
#'   \item{dzgroup}{ each level of dzgroup: ARF/MOSF w/Sepsis,
#'   COPD, CHF, Cirrhosis, Coma, Colon Cancer, Lung Cancer, MOSF with
#'   malignancy. } 
#'   \item{dzclass}{ ARF/MOSF, COPD/CHF/Cirrhosis, Coma and cancer disease classes. } 
#'   \item{num.co}{ the number of comorbidities. }
#'   \item{edu}{ years of education of patients. } 
#'   \item{scoma}{ the SUPPORT coma score based on Glasgow D3. } 
#'   \item{avtisst}{ average TISS, days 3-25. }
#'   \item{race}{ indicates race: White, Black, Asian, Hispanic or other. }
#'   \item{hday}{ day in Hospital at Study Admit.} 
#'   \item{diabetes}{diabetes (Com27-28, Dx 73).} 
#'   \item{dementia}{dementia (Comorbidity 6).} 
#'   \item{ca}{cancer state. } 
#'   \item{meanbp}{ mean arterial blood pressure day 3. } 
#'   \item{wblc}{ white blood cell count on day 3. } 
#'   \item{hrt}{ heart rate day 3. }
#'   \item{resp}{ respiration rate day 3. } 
#'   \item{temp}{ temperature, in Celsius, on day 3. } 
#'   \item{pafi}{ PaO2/(0.01*FiO2) day 3. } 
#'   \item{alb}{serum albumin day 3. } 
#'   \item{bili}{ bilirubin day 3. } 
#'   \item{crea}{ serum creatinine day 3. } 
#'   \item{sod}{ serum sodium day 3. } 
#'   \item{ph}{ serum pH (in arteries) day 3. } 
#'   \item{glucose}{ serum glucose day 3. } 
#'   \item{bun}{ blood urea nitrogen (BUN) day 3. } 
#'   \item{urine}{ urine output day 3. } 
#'   \item{adlp}{ activities of daily living (ADL), reported by the patient, day 3. }  
#'   \item{adlsc}{ imputed ADL calibrated to surrogate, if a surrogate was used for a follow-up.} 
#'   \item{sps}{SUPPORT physiology score.}
#'   \item{aps}{APACHE III physiology score.} }
#'   
#' @source Available at the following website:
#'   \url{https://biostat.app.vumc.org/wiki/Main/SupportDesc}.
#' 
#' @references 
#' 
#' Bhatnagar, S., Turgeon, M., Islam, J., Hanley, J. A., and Saarela, O. (2020) casebase: Fitting Flexible Smooth-in-Time
#' Hazards and Risk Functions via Logistic and Multinomial Regression. 
#' \emph{R package version 0.9.0},
#' <https://CRAN.R-project.org/package=casebase>.
#' 
#' Knaus, W. A., Harrell, F. E., Lynn, J., Goldman, L., Phillips, R. S., Connors, A. F., et al. (1995) 
#' The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. 
#' \emph{Annals of Internal Medicine}, \strong{122(3)}: 191-203.
#' \cr
#' 
#' 
#' @examples
#' data(support)
#' support <- support[support$ca %in% c("metastatic"),]
#' time <- support$d.time
#' death <- support$death
#' diabetes <-  model.matrix(~factor(support$diabetes))[,-1]
#' #sex: female as the reference group
#' sex <- model.matrix(~support$sex)[,-1]
#' #age: continuous variable grouped into categories, with "60-69" as the reference group
#' age <- cut(support$age, breaks = c(-Inf, 50, 60, 70, Inf), right = FALSE,
#'            labels = c("<50", "50-59", "60-69", "70+"))
#' age <- relevel(age, ref = "60-69")
#' z_age <- model.matrix(~age)[,-1]
#' z <- data.frame(z_age, sex, diabetes)
#' colnames(z) <- c("age_50", "age_50_59", "age_70", "male", "diabetes")
#' data <- data.frame(time, death, z)
#' fit.coxtv <- coxtv(event = death, z = z, time = time)
"support"


surtvep documentation built on Oct. 17, 2023, 5:07 p.m.