Infer log Likelihoods using simulated distributions of summary statistics

Description

For each simulated distribution of summary statistics, infer_logLs infers a probability density function and evaluates it at the observed values of the summary statistics. By default, each density is inferred by infer_logL_by_Rmixmod, which fits a distribution of summary statistics using procedures from the Rmixmod package.

Usage

infer_logLs(object, stat.obs,
            logLname = Infusion.getOption("logLname"),
            verbose = list(most = interactive(),
                           final = FALSE),
            method = "infer_logL_by_Rmixmod",
            ...)
infer_tailp(object, refDensity, stat.obs,
            tailNames = Infusion.getOption("tailNames"),
            verbose = interactive(), method = NULL, ...)
infer_logL_by_GLMM(EDF, stat.obs, logLname, verbose)
infer_logL_by_Rmixmod(EDF, stat.obs, logLname, verbose)
infer_logL_by_mclust(EDF, stat.obs, logLname, verbose)
infer_logL_by_Hlscv.diag(EDF, stat.obs, logLname, verbose)

Arguments

object

A list of simulated distributions (the return object of add_simulation)

EDF

An empirical distribution (an element of the object list), with a required par attribute.

stat.obs

Named numeric vector of observed values of summary statistics.

logLname

The name to be given to the log Likelihood in the return object, or the root of the latter name in case of conflict with other names in this object.

tailNames

Names of “positives” and “negatives” in the binomial response for the inference of tail probabilities.

refDensity

An object representing a reference density (such as an HLfit fit object or other objects with a similar predict method) which, together with the density inferred from each empirical distribution, defines a likelihood ratio used to define a rejection region.

verbose

A list as shown by the default, or simply a vector of booleans, indicating respectively whether to display (1) some information about progress; (2) a final summary of the results after all elements of object have been processed. If a count of 'outlier'(s) is reported, this typically means that stat.obs is not within the envelope of a simulated distribution (or whatever other meaning the user attaches to a FALSE isValid code: see Details).

method

A function for density estimation, or its name as a character string (as in the default). See Description for the default behaviour and Details for the constraints on input and output of the function.

...

Further arguments passed to or from other methods (currently not used).

Details

By default, density estimation is based on Rmixmod methods: the function mixmodCluster is called with arguments nbCluster=Infusion.getOption("nbCluster") and mixmodGaussianModel=Infusion.getOption("mixmodGaussianModel"). If Infusion.getOption("nbCluster") specifies a sequence of values, then several clusterings are computed and AIC is used to select among them. Other available methods are not routinely used, and not all Infusion features may work with them.

infer_logL_by_GLMM, infer_logL_by_Rmixmod, infer_logL_by_mclust, and infer_logL_by_Hlscv.diag are examples of the method that may be provided for density estimation. Other methods may be provided with the same arguments. Their return value must include the element logL, an estimate of the log-density of stat.obs, and the element isValid with values FALSE/TRUE (or 0/1). The standard format for the return value is unlist(c(attr(EDF,"par"),logL,isValid=isValid)).
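The contract described above can be illustrated with a minimal sketch of a user-defined method. Everything in it is an assumption made for illustration and is not part of Infusion: the function name infer_logL_by_gaussian, the choice of a single multivariate Gaussian as density estimator, the toy EDF, and the use of logLname to name the log-density element. It uses base R only.

```r
# Hypothetical user-defined method (not part of Infusion): fits a single
# multivariate Gaussian to the EDF and evaluates its log-density at stat.obs.
infer_logL_by_gaussian <- function(EDF, stat.obs, logLname, verbose) {
  # Keep the columns matching the observed statistics, in the same order.
  stats <- as.matrix(EDF[, names(stat.obs), drop = FALSE])
  mu <- colMeans(stats)
  Sigma <- stats::cov(stats)
  d <- length(stat.obs)
  # Multivariate normal log-density, computed directly on the log scale.
  maha <- stats::mahalanobis(stat.obs, center = mu, cov = Sigma)
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  logL <- -0.5 * (d * log(2 * pi) + logdet + maha)
  isValid <- is.finite(logL)
  names(logL) <- logLname  # assumption: logLname names the log-density element
  # Return value in the standard format given above.
  unlist(c(attr(EDF, "par"), logL, isValid = isValid))
}

# Toy empirical distribution with the required "par" attribute (made-up values).
set.seed(1)
EDF <- matrix(rnorm(400), ncol = 2, dimnames = list(NULL, c("mean", "var")))
attr(EDF, "par") <- c(mu = 1, s2 = 2)
res <- infer_logL_by_gaussian(EDF, c(mean = 0.1, var = -0.2),
                              logLname = "logL", verbose = FALSE)
print(res)
```

The returned vector holds the parameter values first, then the log-density, then isValid coerced to 0/1 by unlist.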

isValid is primarily intended to indicate whether the log likelihood of stat.obs inferred by a given density estimation method was suitable input for inference of the likelihood surface. isValid has two effects: to distinguish points for which isValid is FALSE in the plot produced by plot.SLik; and more critically, to control the sampling of new parameter points within refine so that points for which isValid is FALSE are less likely to be sampled.

Invalid values may for example indicate a likelihood estimated as zero (since log(0) is not suitable input) or, for density estimation methods that may infer erroneously large values when extrapolating, that stat.obs lies outside the convex hull of the EDF. In user-defined methods, an invalid inferred logL should be replaced by some alternative low estimate, as all methods included in the package do.
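As a sketch of the zero-likelihood case, a user-defined method might pass its density estimate through a small helper before returning. The helper name safe_logL and the fallback value low are hypothetical; the low estimates actually used by the methods in the package are not documented here.

```r
# Hypothetical helper (not part of Infusion): turn a density estimate into
# a log-likelihood and an isValid flag.
safe_logL <- function(dens, low = -1e4) {
  # A zero, negative, NA or infinite density cannot be used as is.
  isValid <- is.finite(dens) && dens > 0
  # 'low' is an arbitrary placeholder for "some alternative low estimate".
  logL <- if (isValid) log(dens) else low
  c(logL = logL, isValid = isValid)
}

safe_logL(0.02)  # valid: logL is log(0.02), isValid is 1
safe_logL(0)     # invalid: logL is the fallback value, isValid is 0
```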

The source code of infer_logL_by_Hlscv.diag illustrates how to test whether stat.obs is within the convex hull of the EDF, using functions resetCHull and isPointInCHull (exported from the blackbox package).
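For readers who only want the idea of the test, the following sketch checks hull membership for two summary statistics using grDevices::chull from base R. It is a stand-in for illustration and does not use resetCHull or isPointInCHull; the helper name in_hull_2d is hypothetical, the approach is limited to two dimensions, and points lying exactly on the hull boundary are not handled carefully.

```r
# Hypothetical helper (not the blackbox functions): TRUE if 'point' lies
# inside the convex hull of the rows of the two-column matrix 'pts'.
in_hull_2d <- function(pts, point) {
  augmented <- rbind(pts, point)
  # chull() returns the row indices of the hull vertices.
  hull <- grDevices::chull(augmented)
  # The candidate is the last row: if it is a hull vertex, adding it has
  # enlarged the hull, so it was not inside the original one.
  !(nrow(augmented) %in% hull)
}

# Toy check on the unit square.
square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1))
in_hull_2d(square, c(0.5, 0.5))  # inside
in_hull_2d(square, c(2, 2))      # outside
```

In a density estimation method, the result of such a test could be used to set isValid.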

infer_logL_by_Rmixmod calls mixmodCluster, infer_logL_by_mclust calls densityMclust, infer_logL_by_Hlscv.diag calls kde, and infer_logL_by_GLMM fits a binned distribution of summary statistics using a Poisson GLMM with autocorrelated random effects, where the binning is based on a tessellation of a volume containing the whole simulated distribution. Limited experiments so far suggest that the mixture model methods are fast and appropriate (Rmixmod, being a bit faster, is the default method); that the kernel smoothing method is more erratic and, for distributions in dimension d=4 or above, requires additional input from the user and hence is not really applicable; and that the GLMM method is a very good density estimator for d=2 but will challenge one's patience for d=3 and further challenge the computer's memory for d=4.

Value

For infer_logLs, a data frame containing parameter values and their log likelihoods, with attributes providing additional information such as the parameter names and statistics names (not detailed here). These attributes are essential for further inferences.

See Details for the required value of the methods called by infer_logLs.

See Also

See step (3) of the workflow in the Example on the main Infusion documentation page.