get_LRboot: Summary likelihood ratio tests

View source: R/bootLRT.R

get_LRbootR Documentation

Summary likelihood ratio tests

Description

get_LRboot provides a fast approximation to bootstrap distribution of likelihood ratio statistic. The bootstrap distribution of the likelihood ratio (LR) statistic may be used to correct the tests based on its asymptotic chi-square distribution. However, the standard bootstrap involves resimulating the data-generating process, given the ML estimates on the original data. This function implements a fast approximation avoiding such simulation, instead drawing from the inferred distribution of (projected, if relevant) summary statistics, again given the maximum (summary-)likelihood estimates.

SLRT computes likelihood ratio tests based on the summary-likelihood surface and optionally on get_LRboot results. Several correction of the basic likelihood ratio test may be reported, some more speculative than others.

Usage

SLRT(object, h0, nsim=0L, BGP=NULL, ...)
get_LRboot(object, h0_pars = NULL, nsim = 100L, reset = TRUE, BGP=object$MSL$MSLE, ...)

Arguments

object

an SLik_j object.

h0

Numeric named vector of tested parameter values.

h0_pars

either NULL (the default), to approximate the distribution of the LR statistic for the full vector of estimated parameters; or a vector of names of a subset of this vector, to approximate the distribution of the profile LR statistic for this subset.

nsim

Integer: number of bootstrap replicates. Values lower than the default are not recommended. Note that this will be ignored if the distribution has previously been simulated and reset=FALSE.

reset

Boolean: Whether to use any previously computed distribution (see Details) or not.

BGP

Named numeric vector of “Bootstrap-Generating Parameters”. Ideally the distribution of the LR test statistic would be pivotal and thus the parameter values under which this distribution is simulated would not matter. In practice, simulating by default its distribution under the “best” available information (the MSLE for get_LRboot, or the specifically tested hypothesis defined by the h0 argument of SLRT) may be more accurate than under alternative parametric values. For h0 being an incomplete parameter vector and BGP is NULL (the default), SLRT will simulate under a completed parameter vector using estimates of other parameters maximizing the likelihood profile for h0.

...

For SLRT: further arguments passed to get_LRboot. For get_LRboot: further arguments controlling parallelization, including nb_cores. However, parallelization may be best ignored in most cases (see Details).

Details

The result of calling get_LRboot (either directly or through SLRT) with given h0_pars is stored in the object (until the next refine), and this saved result is returned by a next call to get_LRboot with the same h0_pars if reset=FALSE. The default is however to recompute the distribution (reset=TRUE).

Parallelization is possible but maybe not useful because computations for each bootstrap replicate are fast relative to parallelization overhead. It will be called when the ... arguments include an nb_cores>1. The ... may include further arguments passed to dopar, but among the dopar arguments, iseed will be ignored, and fit_env should not be used.

A raw bootstrap p-value can be computed from the simulated distribution as (1+sum(t >= t0))/(N+1) where t0 is the original likelihood ratio, t the vector of bootstrap replicates and N its length. See Davison & Hinkley (1997, p. 141) for discussion of the adjustments in this formula. However, a sometimes more economical use of the bootstrap is to provide a Bartlett correction for the likelihood ratio test in small samples. According to this correction, the mean value m of the likelihood ratio statistic under the null hypothesis is computed (here estimated by simulation) and the original LR statistic is multiplied by n/m where n is the number of degrees of freedom of the test. Unfortunately, the underlying assumption that the corrected LR statistic follows the chi-square distribution does not always work well.

Value

get_LRboot returns a numeric vector representing the simulated distribution of the LR statistic, i.e. twice the log-likelihood difference, as directly used in pchisq() to get the p-value.

SLRT returns a list with the following element(s), each being a one-row data frame:

basicLRT

A data frame including values of the likelihood ratio chi2 statistic, its degrees of freedom, and the p-value;

and, if a bootstrap was performed:

BartBootLRT

A data frame including values of the Bartlett-corrected likelihood ratio chi2 statistic, its degrees of freedom, and its p-value;

rawBootLRT

A data frame including values of the likelihood ratio chi2 statistic, its degrees of freedom, and the raw bootstrap p-value;

safeBartBootLRT

equal to rawBootLRT if the mean bootstrap value of the LR statistic is lower than the number of degrees of freedom, and to BartBootLRT otherwise.

References

Bartlett, M. S. (1937) Properties of sufficiency and statistical tests. Proceedings of the Royal Society (London) A 160: 268-282.

Davison A.C., Hinkley D.V. (1997) Bootstrap methods and their applications. Cambridge Univ. Press, Cambridge, UK.

Examples

 
## See help("example_reftable") for SLRT() examples;
## continuing from there, after refine() steps for good results:
# set.seed(123);mean(get_LRboot(slik_j, nsim=500, reset=TRUE)) # close to df=2 
# mean(get_LRboot(slik_j, h0_pars = "s2", nsim=500, reset=TRUE)) # close to df=1 

## Not run: 
### Simulation study of performance of the corrected LRTs: 

## Same toy example as in help("example_reftable"):
blurred <- function(mu,s2,sample.size) {
    s <- rnorm(n=sample.size,mean=mu,sd=sqrt(s2))
    s <- exp(s/4)
    return(c(mean=mean(s),var=var(s)))
  }

## First build a largish reference table and projections to be used in all replicates
# Only the 600 first rows will be used as initial reference table for each "data"
#
set.seed(123)
#
parsp_j <- data.frame(mu=runif(6000L,min=2.8,max=5.2),
                      s2=runif(6000L,min=0.4,max=2.4),sample.size=40)
dsimuls <- add_reftable(,Simulate="blurred",par.grid=parsp_j,verbose=FALSE)
#
mufit <- project("mu",stats=c("mean","var"),data=dsimuls,verbose=TRUE)
s2fit <- project("s2",stats=c("mean","var"),data=dsimuls,verbose=TRUE)
dprojectors <- list(MEAN=mufit,VAR=s2fit)
dprojSimuls <- project(dsimuls,projectors=dprojectors,verbose=FALSE)

## Function for single-data analysis:
#
foo <- function(y, refine_maxit=0L, verbose=FALSE) {
  dSobs <- blurred(mu=4,s2=1,sample.size=40) 
  ## ----Inference workflow-----------------------------------------------
  dprojSobs <- project(dSobs,projectors=dprojectors)
  dslik <- infer_SLik_joint(dprojSimuls[1:600,],stat.obs=dprojSobs,verbose=FALSE)
  dslik <- MSL(dslik, verbose=verbose, eval_RMSEs=FALSE)
  if (refine_maxit) dslik <- refine(dslik, maxit=refine_maxit)
  ## ---- LRT-----------------------------------------------
  lrt <- SLRT(dslik, h0=c(s2=1), nsim=200)
  c(basic=lrt$basicLRT$p_value,raw=lrt$rawBootLRT$p_value,
    bart=lrt$BartBootLRT$p_value,safe=lrt$safeBartBootLRT$p_value)
}

## Simulations using convenient parallelization interface:
#
# library(doSNOW) # optional
#
bootreps <- spaMM::dopar(matrix(1,ncol=200,nrow=1),              # 200 replicates of foo()
  fn=foo, fit_env=list(blurred=blurred, dprojectors=dprojectors, dprojSimuls=dprojSimuls), 
  control=list(.errorhandling = "pass", .packages = "Infusion"),
  refine_maxit=5L,
  nb_cores=parallel::detectCores()-1L, iseed=123)
#
plot(ecdf(bootreps["basic",]))
abline(0,1)
plot(ecdf(bootreps["bart",]), add=TRUE, col="blue")
plot(ecdf(bootreps["safe",]), add=TRUE, col="red")
plot(ecdf(bootreps["raw",]), add=TRUE, col="green") 
#
# Note that refine() iterations are important for good performance.
# Without them, even a larger reftable of 60000 lines 
# may exhibit poor results for some of the CI types.

## End(Not run)

Infusion documentation built on May 3, 2023, 5:10 p.m.