R/data.R

#' Data on the men in the European Randomized Study of Prostate Cancer Screening
#'
#' @description This data set lists the individual observations for 159,893 men in the core age
#'   group between the ages of 55 and 69 years at entry.
#'
#' @format A data frame with 159,893 observations on the following 3 variables:
#'   \describe{
#'     \item{ScrArm}{Whether in the screening arm (1) or the non-screening arm (0) [numeric]}
#'     \item{Follow.Up.Time}{The time, measured in years from randomization, at which follow-up was
#'       terminated}
#'     \item{DeadOfPrCa}{Whether follow-up was terminated by death from prostate cancer (1), or by
#'       death from other causes or administratively (0)}
#'   }
#'
#' @details The men were recruited from seven European countries (centres). Each centre began
#'   recruitment at a different time, ranging from 1991 to 1998. The last entry was in December
#'   2003. The uniform censoring date was December 31, 2006. The randomization ratio was 1:1 in six
#'   of the seven centres. In the seventh, Finland, the size of the screening group was fixed at
#'   32,000 subjects. Because the whole birth cohort underwent randomization, the ratio of the
#'   screening group to the control group there was approximately 1:1.5, which makes the
#'   non-screening arm larger than the screening arm.
#'
#' @source The individual censored values were recovered by James Hanley from the PostScript code
#'   that the NEJM article (Schroder et al.) used to render Figure 2 (see Liu et al. for details).
#'   The uncensored values were more difficult to recover exactly, as the 'jumps' in the
#'   Nelson-Aalen plot are not as monotonic as first principles would imply. Thus, for each arm, the
#'   numbers of deaths in each 1-year time bin were estimated from the differences in the cumulative
#'   incidence curves at years 1, 2, ..., applied to the numbers at risk within each time interval.
#'   The death times were then distributed at random within each bin.
#'
#'   The interested reader can 'see' the large number of individual censored values by zooming in
#'   on the original PDF figure and watching it being re-rendered, or by printing the graph and
#'   watching the printer 'pause' while it superimposes several thousand dots (censored values)
#'   onto the curve. Watching these is what prompted JH to look at what lay 'behind' the curve. The
#'   curve itself can be drawn using fewer than 1,000 line segments, and unless one peers into the
#'   PostScript, the almost 160,000 dots generated by Stata are invisible.
#' @references Liu Z, Rich B, Hanley JA. Recovering the raw data behind a non-parametric survival
#'   curve. Systematic Reviews 2014 Dec 30;3:151. doi: 10.1186/2046-4053-3-151.
#' @references Schroder FH, et al., for the ERSPC Investigators. Screening and Prostate-Cancer
#'   Mortality in a Randomized European Study. N Engl J Med 2009;360:1320-8.
#' @examples
#' \dontrun{
#'
#' ###  cumulative incidence plots
#' library(survival)
#' library(casebase)
#' data("ERSPC")
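#' KM <- survfit(Surv(Follow.Up.Time, DeadOfPrCa) ~ ScrArm, data = ERSPC)
#' str(KM)
#' # KM$strata gives the number of time points per stratum: ScrArm=0 first, then ScrArm=1
#' idx0 <- seq_len(KM$strata[1])
#' idx1 <- KM$strata[1] + seq_len(KM$strata[2])
#' par(mfrow = c(1, 1), mar = c(5, 5, 0.1, 0.1))
#' plot(KM$time[idx0], 1 - KM$surv[idx0], type = "s", col = "red",
#'      ylim = c(0, max(1 - KM$surv)), ylab = "Risk", xlab = "Years since Randomization")
#' lines(KM$time[idx1], 1 - KM$surv[idx1], type = "s", col = "green")
#' legend("topleft", legend = c("No screening (ScrArm = 0)", "Screening (ScrArm = 1)"),
#'        col = c("red", "green"), lty = 1)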
#'
#' ###  PopulationTime plots
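#' ds <- ERSPC
#' par(mfrow = c(1, 1), mar = c(0.01, 0.01, 0.1, 0.1))
#'
#' # empty canvas: years since randomization on x, men being followed on y
#' # (screening arm above the axis, no-screening arm below it)
#' plot(c(-0.5, 15.75), c(-93000, 80000), type = "n", axes = FALSE, ann = FALSE)
#' set.seed(7654321)
#'
#' OFF <- 2000                   # vertical gap between the two arms
#' t <- seq(0.01, 14.9, 0.01)    # time grid, in years
#'
#' for (i in 0:1) {
#'     # number of men in arm i still being followed at each grid time
#'     S <- function(x) sum(ds$Follow.Up.Time[ds$ScrArm == i] >= x)
#'     n <- sapply(t, S)
#'     yy <- if (i == 1) c(0, n, 0) + OFF else c(0, -n, 0) - OFF
#'     polygon(c(0, t, 14.9), yy, col = "grey80", border = NA)
#'
#'     # prostate-cancer deaths in arm i, each drawn at a random height within the band
#'     t.d <- ds$Follow.Up.Time[ds$ScrArm == i & ds$DeadOfPrCa == 1]
#'     time.index <- pmin(pmax(ceiling(t.d / 0.01), 1), length(t))  # keep indices in range
#'     nn <- n[time.index]
#'     h <- if (i == 1) runif(length(nn), 0.01 * nn, 0.99 * nn) + OFF else
#'         runif(length(nn), -0.99 * nn, -0.01 * nn) - OFF
#'     points(t.d, h, pch = 19, cex = 0.25, col = "red")
#' }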
#'
#' for (t in 1:15) text(t,0,toString(t), cex=0.75)
#' text(15.25,0,"Year", cex=0.75,adj=c(0,0.5))
#'
#' for (n in seq(0, 90000, 10000)) {
#'     # y-axis labels and tick marks for both arms (counts shifted by OFF)
#'     if (n > 0 && n < 80000)
#'         text(-0.1, n + OFF, format(n, big.mark = ","), cex = 0.75, adj = c(1, 0.5))
#'     if (n > 0) text(-0.1, -n - OFF, format(n, big.mark = ","), cex = 0.75, adj = c(1, 0.5))
#'     segments(-0.05, n + OFF, 0, n + OFF, lwd = 0.5)
#'     segments(-0.05, -n - OFF, 0, -n - OFF, lwd = 0.5)
#' }
#' text(4, 70000+OFF,"Screening Arm of ERSPC", cex=1,adj=c(0,0.5))
#' text(4,-85000-OFF,"No-Screening Arm", cex=1,adj=c(0,0.5))
#'
#' text(-0.75,78000+OFF,"Number of
#' Men being Followed", cex=1,adj=c(0,0.5))
#' h = 50000+OFF
#' points(9.5,h, pch=19,cex=0.25,col="red")
#' text(9.6,h,"Death from Prostate Cancer", adj=c(0,0.5))
#'
#' # The randomization of the Finnish cohorts was carried out on January 1 of
#' # each of the 4 years 1996 to 1999. This, coupled with the uniform December 31,
#' # 2006 censoring date, led to large numbers of men with exactly 11, 10, 9 or 8
#' # years of follow-up.
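#'
#' # A quick check of the arithmetic above (a base-R sketch, not part of the original
#' # analysis): January 1 entries in 1996 to 1999 and a December 31, 2006 censoring
#' # date give follow-up times within two days of 11, 10, 9 and 8 years.
#' entry <- as.Date(paste0(1996:1999, "-01-01"))
#' cutoff <- as.Date("2006-12-31")
#' fu.years <- as.numeric(cutoff - entry) / 365.25  # days elapsed, converted to years
#' round(fu.years, 2)  # 11 10 9 8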
#'
#' # Tracked backwards in time (i.e. from right to left), the PopulationTime
#' # plot shows the recruitment pattern from its beginning in 1991, and in
#' # particular the January 1 entries in successive years.
#'
#' # Tracked forwards in time (i.e. from left to right), the plot for the first
#' # three years shows attrition due entirely to deaths (mainly from other causes).
#' # Since the Swedish and Belgian centres were the last to close their
#' # recruitment, in December 2003, the minimum potential follow-up is three
#' # years. Tracked further forwards in time (i.e. after year 3), the attrition is
#' # a combination of deaths and staggered entries.
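#'
#' # A minimal sketch of the bin-based recovery of death times described in the
#' # Source section. It is NOT the code used by Liu et al.: the function name,
#' # its inputs and the toy numbers are hypothetical. cum holds values of a
#' # cumulative incidence curve read off at years 0, 1, 2, ..., and n.risk the
#' # number at risk in each 1-year bin (assumed here to be the count at the start
#' # of the bin; the original recovery may have defined it differently).
#' reconstruct_deaths <- function(cum, n.risk) {
#'     stopifnot(length(cum) == length(n.risk) + 1)
#'     d <- round(diff(cum) * n.risk)  # estimated number of deaths in each bin
#'     # spread the d[k] deaths of bin k at random (uniformly) over (k - 1, k)
#'     unlist(mapply(function(k, dk) runif(dk, k - 1, k), seq_along(d), d,
#'                   SIMPLIFY = FALSE))
#' }
#' set.seed(1)
#' cum.toy <- c(0, 0.0002, 0.0005, 0.0009)  # made-up curve values
#' n.toy <- c(80000, 79000, 78000)          # made-up numbers at risk
#' t.death <- reconstruct_deaths(cum.toy, n.toy)
#' table(ceiling(t.death))  # 16, 24 and 31 deaths in years 1, 2 and 3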
#' }
#'
#'
"ERSPC"

#' Data on transplant patients
#'
#' Data on patients who underwent haematopoietic stem cell transplantation for acute leukaemia.
#'
#' @format A data frame with 177 observations and 7 variables:
#'   \describe{
#'     \item{Sex}{Sex of the individual}
#'     \item{D}{Disease: acute lymphoblastic or acute myeloblastic leukaemia, abbreviated as ALL
#'       and AML, respectively}
#'     \item{Phase}{Phase at transplant (Relapse, CR1, CR2, CR3)}
#'     \item{Age}{Age at the beginning of follow-up}
#'     \item{Status}{Status indicator: 0 = censored, 1 = relapse, 2 = competing event}
#'     \item{Source}{Source of stem cells: bone marrow and peripheral blood, coded as BM+PB, or
#'       peripheral blood only, coded as PB}
#'     \item{ftime}{Failure time in months}
#'   }
#' @source Available at the following website: \url{http://www.stat.unipg.it/luca/R/}
#' @references Scrucca L, Santucci A, Aversa F. Competing risk analysis using R: an easy guide for
#'   clinicians. Bone Marrow Transplant. 2007 Aug;40(4):381-7. doi: 10.1038/sj.bmt.1705727.
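#' @examples
#' # A minimal sketch, not taken from the cited guide: nonparametric cumulative
#' # incidence of relapse and of the competing event, using the multi-state
#' # (Aalen-Johansen) estimator in the survival package. survfit() treats the
#' # first level of a factor status as censoring, hence the explicit level order.
#' library(survival)
#' library(casebase)
#' data("bmtcrr")
#' bmtcrr$Event <- factor(bmtcrr$Status, levels = 0:2,
#'                        labels = c("censored", "relapse", "competing"))
#' ci <- survfit(Surv(ftime, Event) ~ 1, data = bmtcrr)
#' summary(ci, times = c(12, 24))  # ftime is in months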
"bmtcrr"

casebase documentation built on May 29, 2017, 1:40 p.m.