In vjcitn/Repro2017: Illustrating some reproducible research disciplines

suppressPackageStartupMessages({
library(png)
library(grid)
})

Resource

im = readPNG("images/workshopFace.png")
grid.raster(im)

Task

im = readPNG("images/NAStask.png")
grid.raster(im)

The three themes underlying reproducibility research

1) providing code, data, and environment to independent parties to reduce the risk of analyses that are not reproducible

2) fortifying criteria of statistical soundness of analyses (study interpretations) to control risk of non-replicability of studies

3) doing 1) and 2) in ways that are cost-effective

Y. Benjamini, NAS workshop p. 47

[R]eproducibility is a property of a study, and replicability is a property of a result that can be proved only by inspecting other results of similar experiments. Therefore, the reproducibility of a result from a single study can be assured, and improving the statistical analysis can enhance its replicability.

My reading:

assuring reproducibility requires technique by the investigator
enhancing replicability requires new approaches to statistical measurement of evidence

Road map of talk

Basic terminology
Reflections on GWAS replication and genetic medicine
Replication concepts in the NAS workshop
Reproducible research recommendations of ASA
Some case studies
Some github exercises

A personal view

Extensibility and transportability should not be divorced from reproducibility
- Reproducer/reader should be able to assess effects of modifications to queries and inferences
Scalability cannot be divorced from reproducibility
- Can I check a computation that took the author days to create?
- Does the work support a stepwise approach to verification?
  - Results are reproducible in detail for a meaningful subproblem
  - Results are reproducible in detail for a sequence of meaningful subproblems of increasing difficulty
Computable documents (rmarkdown, jupyter, ...) are essential elements of cost-effective reproducible research

Basic definitions

1) Analysis = software + data + environment + invocations

2) Reproducible analysis = analysis that can be carried out by independent parties

3) Extensible analysis = analysis supporting independent variations

4) Study = Design + implementation + analysis

5) Replicable study = A study that, when executed by independent parties, produces statistically compatible interpretations
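One partial way to make the "environment" term of definition 1 concrete in R is to record the session state next to the results. The sketch below is an illustration only: the helper name `describe_environment` is hypothetical, and a session record omits parts of a real environment (system libraries, for instance).

```r
# Hypothetical helper: capture a minimal description of the R environment
# in which an analysis was run (R version, platform, loaded packages).
describe_environment = function() {
  si = utils::sessionInfo()
  list(
    r_version = R.version.string,     # e.g. the string printed at R startup
    platform  = si$platform,          # platform triple reported by R
    base_pkgs = si$basePkgs,          # base packages attached in this session
    attached  = names(si$otherPkgs)   # add-on packages; NULL if none attached
  )
}

env = describe_environment()
str(env)
```

Saving such a record with each set of results gives an independent party something to compare against when a rerun disagrees.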

Open questions

1) What is an environment for an analysis?

2) What are independent parties? (See Kahneman's concept of "replication etiquette")

3) What are independent variations on an analysis and why are they important?

4) When are two interpretations of related or identical studies statistically compatible? Reformulate in terms of the standards of replicability for a given field. Example: GWAS catalog concept of replicated finding (see, e.g., P. Kraft et al., Stat. Sci. 2009).

GWAS: p-values are not created equal

im = readPNG("images/KraftGWASreplication.png")
grid.raster(im,height=1.2)

GWAS replication recommendations in Kraft et al. 2009

Test the same marker
Use the same analytic methods; report the final model in as much detail as possible so that other investigators can judge the fit of that model in other datasets
Try to use the same phenotype
Issues:
- measuring heterogeneity of effects
- choosing models and studies for metaanalysis
- understanding failure to replicate

STREGA: genetic association reporting guidelines

im = readPNG("images/StregaTitle.png")
grid.raster(im)

Guideline 12

im = readPNG("images/strega12.png")
grid.raster(im)

Guidelines 13-16

im = readPNG("images/strega13_16.png")
grid.raster(im)

Upshots

Replicability is a live concept for population genetics research
Contributions to literature are managed through voluntary guidelines
The EBI/NHGRI GWAS catalog includes a formal replication concept for every reported SNP-phenotype association

GWAS catalog eligibility

im = readPNG("images/gwascat.png")
grid.raster(im)

Genetic medical counseling errors

im = readPNG("images/exacNEJMtitle.png")
grid.raster(im)

Headline

im = readPNG("images/exacNYTpic.png")
grid.raster(im)

Consequences of error

im = readPNG("images/exacNYTtext.png")
grid.raster(im)

Moving forward

im = readPNG("images/exacUseRecs.png")
grid.raster(im)

A view of Exome Aggregation Consortium

im = readPNG("images/exacHistosPCA.png")
grid.raster(im)

Upshots

Genetic epidemiology's use of GWAS includes explicit requirement for replication
Implementation of the requirement uses the concepts
- combined p-value for base and replication studies
- in absence of combined p-value, separate thresholds of $\log_{10} p < -5$ in two studies
More care is needed in assessing evidence of genetic risk in medical applications
Accurate interpretation of large-scale aggregation/stratification of genetic findings must become routine
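The two-study threshold in the list above can be written as a one-line check. This is a minimal sketch of that bullet only, not of the full catalog eligibility rules; the function name `meets_two_study_threshold` and the example p-values are hypothetical.

```r
# Hypothetical check of the "separate thresholds" rule summarized above:
# both the base and the replication study must have log10(p) < -5.
meets_two_study_threshold = function(p_base, p_rep, log10_cut = -5) {
  stopifnot(all(p_base > 0 & p_base <= 1), all(p_rep > 0 & p_rep <= 1))
  log10(p_base) < log10_cut & log10(p_rep) < log10_cut
}

# Made-up p-values for three SNP-phenotype associations
p_base = c(1e-9, 1e-6, 1e-7)
p_rep  = c(1e-7, 1e-4, 0.02)
ok = meets_two_study_threshold(p_base, p_rep)
print(ok)
```

Only the first association passes, because the other two miss the threshold in the replication study.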

Some material from the NAS workshop: Benjamini

Poor replicability of animal phenotyping studies is due to inaccurate reckoning of variability: genotype $\times$ laboratory interactions are present but contribution to variance is not known
Quantitative replicability is present when two or more studies agree on a given finding
- For two studies with null hypotheses $H_{01}$, $H_{02}$, replicability requires that the union null $H_{01} \cup H_{02}$ is rejected in favor of the conjunction alternative
- $r$-value = $\max(P_1, P_2)$
- generalizations to larger syntheses change the character of interpretation: not just summarizing evidence on existence of an effect, but measuring evidence of a replicable effect
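The two-study $r$-value defined above is simple to compute. A minimal sketch, assuming made-up p-values and a 0.05 cutoff chosen only for illustration; the function name `r_value` is hypothetical.

```r
# r-value for two studies, following the max(P1, P2) definition above.
r_value = function(p1, p2) {
  stopifnot(all(p1 >= 0 & p1 <= 1), all(p2 >= 0 & p2 <= 1))
  pmax(p1, p2)   # elementwise maximum of the two p-values
}

# Made-up p-values for three findings, each tested in two studies
p_study1 = c(1e-8, 0.03, 0.20)
p_study2 = c(0.004, 0.04, 1e-6)

r = r_value(p_study1, p_study2)
replicated = r <= 0.05   # illustrative cutoff, an assumption of this sketch
print(data.frame(p_study1, p_study2, r, replicated))
```

The third finding shows the point of the definition: a very small p-value in one study does not count as replicated when the other study gives 0.20.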

Some material from the NAS workshop: Boos

Replication condition for a scalar summary index $T$ computed in two experiments: $(T_1, T_2) =_d (T_2, T_1)$
$P(T_2 > T_1) = 1/2$, so if $T_1$ is "borderline significant" there is substantial probability that a genuinely replicating experiment will not be "significant"
P-values possess statistical variability, yet are seldom reported with intervals or standard errors
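Replication probability for one-sample normal mean with $n$ independent observations, where $F_{t,n-1}$ is the central $t$ distribution function with $n-1$ degrees of freedom and $F_{t,n-1,ncp}$ is its noncentral counterpart $$ RP = 1 - F_{t,n-1,ncp}(F^{-1}_{t,n-1}(1-\alpha)) $$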
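The replication probability formula above maps directly onto R's `qt` and `pt`. A minimal sketch: the function name `replication_prob` is hypothetical, and the noncentrality parameter is treated as a user-supplied value (how it is estimated from data is outside this sketch).

```r
# RP = 1 - F_{t,n-1,ncp}( F^{-1}_{t,n-1}(1 - alpha) )
replication_prob = function(ncp, n, alpha = 0.05) {
  crit = qt(1 - alpha, df = n - 1)        # one-sided critical value
  1 - pt(crit, df = n - 1, ncp = ncp)     # chance a new t statistic exceeds it
}

n = 20
crit = qt(0.95, df = n - 1)
# ncp = 0 (no effect), ncp at the critical value (borderline), ncp = 3
rp = replication_prob(c(0, crit, 3), n = n)
print(round(rp, 3))
```

With no effect the value equals $\alpha$, and when the noncentrality sits at the critical value it is close to one half, which matches the "borderline significant" remark above.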

Boos: Replication probabilities are low for conventional thresholds

im = readPNG("images/boosStefRP.png")
grid.raster(im,width=.95)

Some material from the NAS workshop: Valen Johnson (PNAS paper)

im = readPNG("images/johnson1.png")
grid.raster(im,width=.99)

Costs of more stringent thresholds: $N_{strong} > 1.7 \times N_{conventional}$

im = readPNG("images/johnson2.png")
grid.raster(im,width=.99)

Upshots

Statistical theory for enhancing replicability of analyses of independent experiments is substantial
Costs of increasing replicability are non-trivial
Preserving the reputation of advanced science may well justify the expenditure
Cost-efficient experimental designs are not much discussed in NAS workshop

Transition: Some case studies in reproducibility

What I have done or promoted to simplify reproducible and extensible analyses in various domains
- Cancer genomics: entering the Duke ovarian cancer fray
- Bioconductor: continuous integration
- Immune Tolerance Network "trialshare": bridging publication to data and interactive analysis
- barca: excavating old code on request, to github and travis CI

dressCheck: a package that backs up a book chapter

im = readPNG("images/ochsCover.png")
grid.raster(im,width=.99)

Discuss: Rebutting a rebuttal

(before)

im = readPNG("images/dressmanAttack56.png")
grid.raster(im,width=.99)

(after)

im = readPNG("images/dressmanRetraction.png")
grid.raster(im,width=.99)

Starting out

im = readPNG("images/careyStoddenCaseStuds.png")
grid.raster(im,width=.99)

Basic questions

What are the numerical data on which the arguments are based?
What analysis of this data is in question?
How can we offer and respond to criticism, allowing for variations on approach, to help clarify the situation?

Answer: A chapter supported by a Bioconductor package

Code will be portable
If it goes stale, we are warned and can provide fixes

Build reports for ExperimentData packages

im = readPNG("images/buildHeader.png")
grid.raster(im,width=.99)

checking dressCheck

im = readPNG("images/dressCheckCI.png")
grid.raster(im,width=.99)

How to proceed

install the package
build the vignette
compare with chapter contents
raise an issue if you disagree

Beyond selected static graphics

psurv = function (x, digits = max(options()$digits - 4, 3), ...) 
{
#
# obtain p-value for log-rank test from a survdiff object
#
    saveopt <- options(digits = digits)
    on.exit(options(saveopt))
    if (!inherits(x, "survdiff")) 
        stop("Object is not the result of survdiff")
    if (length(x$n) == 1) {
        # one-sample test: 1 degree of freedom; print a short summary
        z <- sign(x$exp - x$obs) * sqrt(x$chisq)
        uu <- 1 - pchisq(x$chisq, 1)
        temp <- c(x$obs, x$exp, z, signif(uu, digits))
        names(temp) <- c("Observed", "Expected", "Z", "p")
        print(temp)
    }
    else {
        # k-sample test: sum expected counts over strata if present
        if (is.matrix(x$obs)) {
            etmp <- apply(x$exp, 1, sum)
        }
        else {
            etmp <- x$exp
        }
        # degrees of freedom: groups with positive expected count, minus 1
        df <- (sum(1 * (etmp > 0))) - 1
        uu <- 1 - pchisq(x$chisq, df)
    }
    # return the p-value in both branches
    uu
}

extendDressCheck = function() {
suppressPackageStartupMessages({
  library(survival)
  library(shiny)
  library(dressCheck)
  library(chron)
})
  data(E2F3sig.probes)
  data(Src.probes)
  tokeep = unique(c(E2F3sig.probes, Src.probes))
  if (!exists("corrp116")) data(corrp116)
  tokeep = intersect(tokeep, rownames(exprs(corrp116)))
  library(hgu133plus2.db)
suppressMessages({
  s133 = select(hgu133plus2.db, 
     keys=tokeep, keytype="PROBEID", columns=c("SYMBOL")) 
})
  pid = s133[,1]
  names(pid) = s133[,2]
  drop = which(is.na(names(pid)))
  pid = pid[-drop]
  pid = pid[order(names(pid))]
  npid = names(pid)

fixup = function(vec) {
  rgt = c(vec, vec[length(vec)])
  lef = c("<", paste(vec, "-", sep=""))
  lef = sub(paste(vec[length(vec)],"-",sep=""), ">", lef)
  paste(lef,rgt, sep="")
}

dateStrat = function(chr, nstrat=3) {
 ps = (1:nstrat)/nstrat
 cuts = c(ps[-length(ps)])
 qs = quantile(chr, cuts)
 intervals = fixup(as.character(qs))
 tags = rep(" ", length(chr))
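 # assign each date to the first stratum whose upper cut point exceeds it
 for (i in seq_along(qs)) {
  tags[ which(chr < qs[i] & tags == " ") ] = intervals[i]
 }
 # dates at or beyond the last cut point fall in the top stratum
 tags[ which(chr >= qs[length(qs)]) ] = intervals[length(qs)+1]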
 ordered(tags, levels=intervals)
}

  ui = fluidPage(
        sidebarPanel(
         helpText("Interactive analysis of Dressman et al. JCO 2007 data as discussed in Carey and Stodden 2010"),
         helpText(" "),
         helpText("Select a gene for plotting expression against sample date"),
         selectizeInput("insym", "input", choices=npid, selected="RPS11"),
         helpText("Pick a number of date strata for logrank/K-M plots (for platinum non-responders)"),
         radioButtons("numstrat", "# date strata", choices=2:5, selected=2)
        ),
        mainPanel(
         tabsetPanel(
          tabPanel("Expr Confounding", plotOutput("confplot")),
          tabPanel("Surv Confounding", plotOutput("survplot"))
         )
        )
       )
  server = function(input, output, session) {
    output$chk = renderText({input$insym})
    output$confplot = renderPlot( {
               an = as.numeric
               probe = pid[input$insym]
               plot(an(exprs(corrp116[probe,]))~chron(corrp116$rundate), main="(a)",
        xlab="array run date", ylab=paste("RMA+SFR expression of", input$insym))
               } )
    output$survplot = renderPlot( {
        num = as.numeric(input$numstrat)
        dstra = dateStrat(chron(corrp116$rundate), num)
        # keep the log-rank test result local instead of assigning globally
        d0 = with(pData(corrp116), survdiff(Surv(Survival, dead)~dstra,
        subset=CR==0))

        with(pData(corrp116), plot(survfit(Surv(Survival, dead)~dstra,
        subset=CR==0),col=1:num, lwd=3,
        xlab="Months", ylab="survival (%)", main="(b)"))
        text(37,.03, paste("logrank p=", round(psurv(d0),3)))
        legend(70, .8, lty=1, lwd=3, col=1:num, legend=levels(dstra))
        } )
  }
  shinyApp(ui, server)
}
extendDressCheck()
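The log-rank p-value that `psurv` extracts can be checked outside the Shiny app. A minimal sketch, assuming the `survival` package and its bundled `lung` data, which are not part of the Dressman analysis; the helper name `logrank_p` is hypothetical and covers only the unstratified case.

```r
library(survival)

# p-value of the log-rank test from a survdiff object (unstratified case)
logrank_p = function(x) {
  stopifnot(inherits(x, "survdiff"))
  # degrees of freedom: groups with positive expected count, minus 1
  df = sum(x$exp > 0) - 1
  pchisq(x$chisq, df, lower.tail = FALSE)
}

# Compare survival by sex in the lung cancer data shipped with survival
fit = survdiff(Surv(time, status) ~ sex, data = lung)
p = logrank_p(fit)
print(fit)   # printed summary includes the chi-square statistic
print(p)
```

Comparing the value returned by the helper with the printed `survdiff` summary is a quick check that the extraction agrees with the package's own report.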

Upshots

Debates about contentious analyses involve software, data, and interpretation
The identity of the data can be hard to pin down, so, when feasible, place it in a versioned package
Include (and explain) code underlying interpretations, and provide durable assurance of runnability through continuous integration
Socialize the data and code through github so that interested parties can raise issues, fork, issue pull requests
- attend to licensing issues
- Bioconductor transitioning to git/github soon!

Bridging from publication to data

Statistical analysis of the RAVE study (NEJM 2013) includes a link to figures and code

Using github and travis-ci

visit and clone github.com/vjcitn/barca with an eye towards contributing unit tests

Conclusions

General agreement that code and data access are fundamental to reliable scientific progress
Accreditation for data contributions recently addressed in NEJM Sounding Board
Software usability, reliability and maintenance can be challenging to establish
We must work to build awareness of trustworthy concepts of statistical significance in the broad scientific community

vjcitn/Repro2017 documentation built on May 3, 2019, 6:13 p.m.
