echo <- TRUE # include code in report require(hreport) knitrSet(lang='markdown', fig.path='figure/', echo=echo) mu <- markupSpecs$html # in Hmisc - HTML markups frac <- mu$frac mu$styles() # define HTML styles, functions
Load(ssafety) ssafety <- upData(ssafety, rdate=as.Date(rdate), smoking=factor(smoking, 0:1, c('No','Yes')), labels=c(smoking='Smoking', bmi='BMI', pack.yrs='Pack Years', age='Age', height='Height', weight='Weight'), units=c(age='years', height='cm', weight='Kg'), print=FALSE) mtime <- function(f) format(file.info(f)$mtime) datadate <- mtime('ssafety.rda') primarydatadate <- mtime('ssafety.rda') ## List of lab variables that are missing too much to be used omit <- Cs(amylase,aty.lymph,glucose.fasting,neutrophil.bands) ## Make a list that separates variables into major categories vars <- list(baseline=Cs(age, sex, race, height, weight, bmi, smoking, pack.yrs), ae =Cs(headache, ab.pain, nausea, dyspepsia, diarrhea, upper.resp.infect, coad), ekg =setdiff(names(ssafety)[c(49:53,55:56)], 'atrial.rate'), chem=setdiff(names(ssafety)[16:48], c(omit, Cs(lymphocytes.abs, atrial.rate, monocytes.abs, neutrophils.seg, eosinophils.abs, basophils.abs)))) week <- ssafety$week weeks <- sort(unique(week)) base <- subset(ssafety, week==0) denom <- c(c(enrolled=500, randomized=nrow(base)), table(base$trx)) sethreportOption(tx.var='trx', denom=denom) ## Initialize app.tex
The reporting tools used here are based on a number of lessons learned from the intersection of the fields of statistical graphics, graphic design, and cognitive psychology, especially from the work of Bill Cleveland, Ralph McGill, John Tukey, Edward Tufte, and Jacques Bertin.
Needles represent the fraction of observations used in the current analysis. The first needle (red) shows the fraction of enrolled patients used. If randomization was taken into account, a second needle (green) represents the fraction of randomized subjects included in the analysis. When the analyses consider treatment assignment, two more needles may be added to the display, showing, respectively, the fraction of subjects randomized to treatment A used in the analysis and the fraction of subjects on treatment B who were analyzed. The colors of these last two needles are the colors used for the two treatments throughout the report. The following table shows some examples.
# Store using short variable names so Rmarkdown table column # width will not be wider than actually needed d1 <- dNeedle(1) d2 <- dNeedle((3:4)/4) d3 <- dNeedle((1:2)/4) d4 <- dNeedle(c(1,2,3,1)/4)
|Signpost | Interpretation |
|------- | -------------------------------------------------|
| r d1
| All enrolled subjects analyzed, randomization not considered|
| r d2
| Analysis uses r frac(3,4)
of enrolled subjects, and all randomized subjects|
| r d3
| Analysis uses r frac(1,4)
of enrolled subjects, and r frac(1,2)
of randomized subjects|
| r d4
| Same as previous example, and in addition the analysis utilized treatment assignment, analyzing r frac(3,4)
of those randomized to A and r frac(1,4)
of those randomized to B|
Dot charts are used to present stratified proportions. Details, including all numerators and denominators of proportions, can be revealed by hovering the mouse over a point.
Graphs containing pairs of Kaplan-Meier survival curves show a shaded region centered at the midpoint of the two survival estimates and having a height equal to the half-width of the approximate 0.95 pointwise confidence interval for the difference of the two survival probabilities. Time points at which the two survival estimates do not touch the shaded region denote approximately significantly different survival estimates, without any multiplicity correction. Hover the mouse to see numbers of subjects at risk at a specific follow-up time, and more information.
This is a sample of the part of a closed meeting Data Monitoring
Committee report that contains software generated results. Components
related to efficacy, study design, data monitoring
plan,^[Lan-DeMets monitoring bounds can be plotted using the open source R gsDesign
package.] summary of previous closed report, interpretation, protocol changes, screening, eligibility, and waiting time until treatment commencement
are not included in this example^[See Ellenberg, Fleming, and DeMets, Data Monitoring Committees in Clinical Trials (Wiley, 2002), pp. 73-74 for recommended components in open and closed data monitoring committee reports.]. This report used a random sample of safety data from a randomized clinical trial. Randomization date, dropouts, and compliance variables were simulated, the latter two not
being made consistent with the presence or absence of actual data in the random sample. The date and time that the analysis file used here was last updated wasr datadate
. Source analysis files were last updated on primarydatadate
.
accrualReport(randomize(rdate) ~ site(site), data=base, dateRange=c('1990-01-01','1994-12-31'), targetDate='1994-12-31', targetN=300, closeDate=max(base$rdate))
# Simulate regions set.seed(1) base$region <- sample(c('north', 'south'), nrow(base), replace=TRUE) dReport(sex + race + smoking ~ region + trx, groups='trx', data=addMarginal(base, region)) ## Show spike histogram and quantiles for raw data dReport(age + height + weight + bmi + pack.yrs ~ trx, data=base, popts=list(ncols=2))
dReport(headache + ab.pain + nausea + dyspepsia + diarrhea + upper.resp.infect + coad ~ week + trx + id(id), groups='trx', data=ssafety, what='byx', popts=list(ncols=2, height=700, width=1100))
## Reformat to one record per event per subject per time aev <- vars$ae ev <- ssafety[ssafety$week > 0, c(aev, 'trx', 'id', 'week')] ## Reshape to tall and thin format evt <- reshape(ev, direction='long', idvar=c('id', 'week'), varying=aev, v.names='sev', timevar='event', times=aev) ## For each event, id and trx see if event occurred at any week ne <- with(evt, summarize(sev, llist(id, trx, event), function(y) any(y > 0, na.rm=TRUE))) ## Remove non-occurrences of events ne <- subset(ne, sev, select=c(id, trx, event)) ## Replace event names with event labels elab <- sapply(ssafety[aev], label) ne$event <- elab[ne$event] label(ne$trx) <- 'Treatment' eReport(event ~ trx, data=ne)
dReport(axis + corr.qt + pr + qrs + uncorr.qt + hr ~ week + trx + id(id), groups='trx', data=ssafety, what='byx', popts=list(ncols=2, height=1300, width=1100))
## Plot 6 variables per figure cvar <- split(vars$chem, rep(letters[1:4], each=6)) form <- list() for(sub in names(cvar)) { f <- paste(cvar[[sub]], collapse=' + ') form[[sub]] <- as.formula(paste(f, 'week + trx + id(id)', sep=' ~ ')) } do <- function(form) dReport(form, groups='trx', data=ssafety, what='byx', popts=list(ncols=2, height=1300, width=1100, dhistboxp.opts=list(nmin=10, ff1=1.35))) # Minimum of 10 observatins per x per group for histogram and quantiles # to be drawn (default is nmin=5) do(form$a) do(form$b) do(form$c) do(form$d) # dReport(wbc ~ week + trx + id(id), groups='trx', data=ssafety, # what='byx', popts=list(dhistboxp.opts=list(ff1=1.2))) ## Repeat last figure using quantile intervals instead of spike histograms dReport(form$d, groups='trx', data=ssafety, what='byx', byx.type='quantiles', popts=list(ncols=2, height=1300, width=1100))
set.seed(1) n <- 400 dat <- data.frame(t1=runif(n, 2, 5), t2=runif(n, 2, 5), e1=rbinom(n, 1, .5), e2=rbinom(n, 1, .5), cr1=factor(sample(c('cancer','heart','censor'), n, TRUE), c('censor', 'cancer', 'heart')), cr2=factor(sample(c('gastric','diabetic','trauma', 'censor'), n, TRUE), c('censor', 'diabetic', 'gastric', 'trauma')), treat=sample(c('a','b'), n, TRUE)) dat <- upData(dat, labels=c(t1='Time to operation', t2='Time to rehospitalization', e1='Operation', e2='Hospitalization', treat='Treatment'), units=c(t1='Year', t2='Year'), print=FALSE) denom <- c(enrolled=n + 40, randomized=400, a=sum(dat$treat=='a'), b=sum(dat$treat=='b')) if(FALSE) { sethreportOption(denom=denom, tx.var='treat') survReport(Surv(t1, e1) + Surv(t2, e2) ~ treat, data=dat, what='S') # Show estimates combining treatments survReport(Surv(t1, e1) + Surv(t2, e2) ~ 1, data=dat, what='S', times=3, ylim=c(.1, 1)) # Same but use multiple figures and use 1 - S(t) scale survReport(Surv(t1, e1) + Surv(t2, e2) ~ treat, data=dat, multi=TRUE, what='1-S', times=3:4, aehaz=FALSE) survReport(Surv(t1, e1) + Surv(t2, e2) ~ 1, data=dat, multi=TRUE, what='1-S', y.n.risk=-.02) }
These analyses were done using the following versions of R[@Rsystem], the
operating system, and add-on packages hreport
,
Hmisc
[@Hmisc], rms
[@rrms], and others:
print(sessionInfo(), locale=FALSE)
The reproducible research framework knitr
[@knitrbook] was used.
This report was produced using high-quality open source, freely
available R packages. High-level R graphics and html making functions in FE Harrell's Hmisc
package were used in the context of the R knitr
package and RStudio
with Rmarkdown
. A new R package hreport
contains functions accrualReport
, dReport
, exReport
, eReport
, and survReport
using the philosophy of program-controlled generation of html and markdown text, figures, and tables. When figures were plotted in R, figure legends were automatically generated.
The entire process is best managed by creating a single .Rmd
file that is executed using the knitr
package in R.
Variable labels are used in much of the graphical and tabular output,
so it is advisable to attach label
attributes to almost all
variables. Variable names are used when label
s are not
defined. Units of measurement also appear in the output, so most
continuous variables should have a units
attribute. The
units
may contain mathematical expressions such as cm^2
which will be properly typeset in tables and plots, using
superscripts, subscripts, etc. Variables that are not binary (0/1,
Y/N
, etc.) but are categorical should have levels
(value
labels) defined (e.g., using the factor
function) that will be
attractive in the report. The Hmisc library upData
function is
useful for annotating variables with labels, units of measurement, and
value labels. See
Alzola and Harrell, 2006, this, and
this for details
about setting up analysis files.
R code that created the analysis file for this report is in the inst/tests
directory of the hreport
package source. For this particular application, units
and some of the labels
were actually obtained from separate data tables
as shown in the code.
tx.var
on sethreportOption
.hreport
functions and subject have repeated measurements ($>1$ record), an id
variable must be given.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.