refine
refine is a generic function with methods for objects of the classes produced by MSL. In the up-to-date workflow, it can automatically (1) define new parameter points, (2) add simulations to the reference table for these points, (3) optionally recompute projections, (4) update the inference of the likelihood surface, and (5) provide new point estimates, confidence intervals, and other results of an MSL call. It can repeat these steps iteratively, as controlled by its workflow_design argument.
Although refine has many control arguments, few of them are likely to be needed in a given application. In particular, it is designed to use reasonable default controls for the number of iterations, the number of points added in each iteration, and whether or not to update projections, when given only the current fit object as input.
reproject and recluster are wrappers for refine(..., ntot=0L), updating the object after either recomputing the projections or only re-performing the multivariate Gaussian mixture clustering.
## S3 method for class 'SLik'
refine(object, method=NULL, ...)
## Default S3 method:
refine(
object,
## reference table simulations
Simulate = attr(surfaceData,"Simulate"),
control.Simulate = attr(surfaceData,"control.Simulate"),
newsimuls = NULL,
## CIs
CIs = workflow_design$reftable_sizes[useCI],
useCI = prod(dim(object$logLs))<12000L, level = 0.95,
## workflow design
workflow_design = get_workflow_design(
npar=length(fittedPars), n_proj_stats = length(statNames),
n_latent=length(latentVars)),
maxit, ntot= maxit*.get_size_first_iter(object), n=NULL,
## termination conditions
precision = Infusion.getOption("precision"),
eval_RMSEs = workflow_design$reftable_sizes,
## verbosity
verbose = list(notable=TRUE, most=interactive(),final=NULL, movie=FALSE,
proj=FALSE, rparam=NULL, progress_bars=interactive()),
## projection controls
update_projectors = NULL,
methodArgs = list(),
## Likelihood surface modeling (up-to-date workflow)
using = object$using,
nbCluster = quote(refine_nbCluster(nr=nrow(data))),
## parallelisation
cluster_args = list(), nb_cores=NULL, env=get_from(object,"env"),
packages = get_from(object,"packages"), cl_seed=.update_seed(object),
## obscure stuff
target_LR = NULL,
## not explicitly needed in up-to-date workflow
trypoints = NULL,
surfaceData,
method,
useEI = list(max=TRUE,profileCI=TRUE,rawCI=FALSE),
rparamFn = Infusion.getOption("rparamFn"),
##
...
)
reproject(object, eval_RMSEs = NULL, CIs = NULL, ...)
recluster(object, eval_RMSEs = NULL, CIs = NULL, update_projectors=FALSE, ...)
object: an
## reference table simulations
Simulate: Character string giving the name of the function used to simulate samples. As it is typically stored in the object, this argument does not need to be given explicitly; otherwise, this should be the same function provided to
control.Simulate: A list of arguments of the
newsimuls: For the For other methods, a
## CIs
CIs: Boolean, boolean vector, or numeric (preferably integer) vector controlling the inference of bounds of (one-dimensional, profile) confidence intervals. The numeric vector form allows one to specify the reference table size(s) for which CIs should be computed when these sizes are first reached. TRUE or FALSE will force or inhibit computation in all iterations. Finally (and probably less useful), a boolean vector such as The default for
useCI: Whether to perform RMSE computations for inferred confidence interval points.
level: Intended coverage of confidence intervals.
## workflow design
workflow_design: A list structured as the return value of
maxit: Maximum number of iterative refinements (see also
ntot: NULL or numeric, controlling the total number of simulated samples (one for each new parameter point) to be added to the reference table over the
n: NULL or numeric, for a number of parameter points (excluding replicates and confidence interval points in the primitive workflow) whose likelihood should be computed in each iteration (see
## termination conditions
precision: Requested local precision of surface estimation, in terms of prediction standard errors (RMSEs) of both the maximum summary log-likelihood and the likelihood ratio at any available CI bound. Iterations will stop when either
eval_RMSEs: Same usage as for
## verbosity
verbose: A list as shown by the default, or simply a vector of booleans.
## projection controls
update_projectors: Same usage as for
methodArgs: A list of arguments for the projection method. By default the
## Likelihood surface modeling
using: Passed to
nbCluster: Passed to
## parallelisation
cluster_args: A list of arguments for
nb_cores: Integer, a shortcut for specifying
packages: NULL or a list with possible elements
env: An environment, passed as the
cl_seed: NULL or integer, passed to
## others
target_LR: Likelihood ratio threshold used to control the sampling of new points and the selection of points for projections. Do not change it unless you know what you are doing.
method: For the primitive workflow: (a vector of) suggested method(s) for estimation of smoothing parameters (see
trypoints: A data frame of parameters on which the simulation function
useEI: For the primitive workflow only: cf. this argument in
surfaceData: For the primitive workflow only: a data.frame with attributes, usually taken from the
rparamFn: Function used to sample new parameter values.
...: Further arguments passed to or from other methods.
Controls of exploration of parameter space: New parameter points are sampled so as to fill the space of parameters contained in the confidence regions defined by the level argument, and to surround this space with a region sampled in proportion to the likelihood.
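As a rough illustration of what "the confidence regions defined by the level argument" means, the sketch below uses the standard likelihood-ratio (chi-squared) definition of a confidence region. We assume, without confirmation from this page, that this is close to the criterion used internally; the helper in_conf_region() is hypothetical and not part of Infusion, and the sampling of the surrounding region in proportion to likelihood is not modelled.

```r
# Hypothetical helper (not part of Infusion): is a parameter point inside the
# likelihood-ratio confidence region of the given coverage level?
# Assumes the usual asymptotic chi-squared calibration with npar degrees of freedom.
in_conf_region <- function(logL, logL_max, npar, level = 0.95) {
  lr_stat <- 2 * (logL_max - logL)       # likelihood-ratio statistic
  lr_stat <= qchisq(level, df = npar)    # inside the region if below the quantile
}

# Two-parameter example, with logL values relative to a maximum of 0
in_conf_region(logL = c(-1, -2, -3.5), logL_max = 0, npar = 2)
# expected: TRUE TRUE FALSE (qchisq(0.95, 2) is about 5.99)
```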
Each refine call performs several iterations, which stop when ntot points have been added to the simulation table. The target number of points potentially added in each iteration is controlled by the ntot and maxit arguments as described below. However, fewer points may actually be added in a given iteration, and more than maxit iterations may be needed to add the ntot points, if too few “good” candidate points are generated in an iteration according to the internal rules for sampling the parameter region with high likelihood. In that case, the next iteration tries to catch up with the missing points by adding more points than its target number, and if not enough points have been added after maxit iterations, further iterations are run.
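The catch-up behaviour described above can be mimicked by a toy loop. This is a sketch under stated assumptions, not Infusion's internal code: the function refine_schedule_sim() and its argument n_good (a function returning how many "good" candidate points are available in a given iteration) are hypothetical, and the per-iteration targets are simply taken as given.

```r
# Toy model (hypothetical, not Infusion internals) of how iterations catch up
# when too few good candidate points are found.
# targets: planned number of points for each of the first maxit iterations
# n_good:  function(it) giving how many good candidates iteration `it` can supply
refine_schedule_sim <- function(targets, n_good, max_iter = 100L) {
  ntot <- sum(targets)
  added <- numeric(0)
  it <- 0L
  while (sum(added) < ntot) {
    it <- it + 1L
    if (it > max_iter) stop("too few good candidate points to reach ntot")
    # cumulative plan up to this iteration (the whole ntot once maxit is exceeded)
    planned <- if (it <= length(targets)) sum(targets[seq_len(it)]) else ntot
    wanted <- planned - sum(added)   # this iteration's target plus any deficit
    added <- c(added, min(wanted, n_good(it)))
  }
  added  # number of points actually added in each iteration
}

# Only 60 good candidates per iteration: more than 5 iterations are needed
refine_schedule_sim(c(25, 25, 50, 100, 200), n_good = function(it) 60)
# returns 25 25 50 60 60 60 60 60 (8 iterations)
```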
CIs and RMSEs may be computed in any iteration, but the default values of eval_RMSEs and CIs are chosen so as to avoid performing these computations too often, particularly when they are expected to be slow. With these defaults, the RMSE of the maximum logL is computed at the end of each block of iterations making up a refine call (each block being defined so as to reach one of the reference table sizes specified by workflow_design, or by its default value). If the reference table is not too large (see the default value of useCI for the precise condition), RMSEs of the logL are also computed at the inferred bounds of profile-based confidence intervals for each parameter.
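The default CIs = workflow_design$reftable_sizes[useCI] relies on R's logical indexing: when useCI is TRUE, every reference table size of the workflow design is kept, and when it is FALSE nothing is selected, so no size triggers CI computation. The sketch below evaluates these default expressions on a stand-in data frame; the table dimensions and the reftable_sizes values are made up for illustration.

```r
# Stand-in for object$logLs: a hypothetical reference table of 2000 rows x 5 columns
logLs <- as.data.frame(matrix(0, nrow = 2000, ncol = 5))
reftable_sizes <- c(400, 800, 1600)   # hypothetical workflow_design$reftable_sizes

# Default expressions from the usage section
useCI <- prod(dim(logLs)) < 12000L    # 2000 * 5 = 10000 < 12000, so TRUE
CIs <- reftable_sizes[useCI]          # a single TRUE is recycled: all sizes are kept

# A larger table (3000 x 5 = 15000 cells) switches CI computation off
big <- as.data.frame(matrix(0, nrow = 3000, ncol = 5))
CIs_big <- reftable_sizes[prod(dim(big)) < 12000L]   # numeric(0)
```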
Although the update_projectors argument allows similar control of the iterations in which projections are updated, it is advised to keep it NULL (the default), so that whether projectors are updated in a given iteration is controlled by internal default rules. Setting it to TRUE would induce updating whenever any of the target reference table sizes implied by workflow_design$subblock_sizes is reached. The default NULL has the same effect, subject to additional conditions: updating may not be performed when the training set is considered too similar to the one used to compute the pre-existing projections, or when the training set includes more samples than the limit defined by the global package option upd_proj_subrows_thr.
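The update rule just described can be written as a small decision function. This is a hypothetical sketch, not Infusion's code: should_update_projectors() is an illustrative name, the "too similar" judgement is taken as a given logical input because its internal criterion is not documented here, the additional conditions are treated as hard rules although the text only says updating "may not" be performed, and forms of update_projectors other than TRUE, FALSE and NULL are not covered.

```r
# Hypothetical sketch (not Infusion internals) of the projector-update rule above.
# update_projectors: TRUE, FALSE or NULL
# target_reached:    has a size from workflow_design$subblock_sizes just been reached?
# too_similar:       is the training set judged too similar to the previous one?
# n_train, thr:      training-set size and the value of the upd_proj_subrows_thr option
should_update_projectors <- function(update_projectors, target_reached,
                                     too_similar, n_train, thr) {
  if (isTRUE(update_projectors)) return(target_reached)  # update at every target size
  if (isFALSE(update_projectors)) return(FALSE)          # never update
  # NULL (default): same trigger, subject to the additional conditions
  target_reached && !too_similar && n_train <= thr
}

# Default rule: a target size is reached but the training set exceeds the threshold
should_update_projectors(NULL, target_reached = TRUE, too_similar = FALSE,
                         n_train = 5000, thr = 1000)  # FALSE
```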
Default values of ntot and maxit are controlled by the value of workflow_design (which itself has the default value shown in the usage section), and are distinct for the first vs. subsequent refine calls.
The target number of points in each iteration is also controlled differently for the first vs. subsequent refine calls. This design is motivated by the fact that the likelihood surface is typically poorly inferred in the first refine, so that the parameter points sampled then tend to be less relevant than those that can be sampled in later iterations. In the first refine call, the target number of points increases roughly as powers of two over iterations, so as to reach ntot cumulatively after maxit iterations. The default ntot is twice the size of the initial reference table, and the default maxit is 5. The example_reftable example illustrates this: the initial reference table holds 200 simulations, and the default target numbers of points to be added in the 5 iterations of the first refine call are 25, 25, 50, 100 and 200 (400 points in total). In later refine calls, the target number is ntot/maxit in each iteration.
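The figures quoted for example_reftable are consistent with weights 1, 1, 2, 4, 8 rescaled to sum to ntot. The sketch below assumes this pattern generalises to other values of maxit; since the text only says "roughly as powers of two", the exact internal rule may differ. Both helpers are hypothetical, not Infusion functions.

```r
# Hypothetical helpers (not Infusion functions) reproducing the target numbers
# of points per iteration described above.
# First refine: weights 1, 1, 2, 4, ... (powers of two), rescaled to sum to ntot
first_refine_targets <- function(ntot, maxit) {
  weights <- 2^pmax(seq_len(maxit) - 2, 0)
  ntot * weights / sum(weights)
}
# Later refines: an equal share of ntot in each iteration
later_refine_targets <- function(ntot, maxit) rep(ntot / maxit, maxit)

# example_reftable setting: initial table of 200, default ntot = 2 * 200, maxit = 5
first_refine_targets(ntot = 2 * 200, maxit = 5)   # 25 25 50 100 200
```

Rescaling the weights by their sum guarantees that the cumulative target reaches ntot exactly after maxit iterations, whatever maxit is.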
Independent control of parallelisation may be needed in the different steps, e.g. if the simulations are not easily parallelised whereas the projection method natively handles parallelisation. In the up-to-date workflow with the default ranger projection method, distinct parallelisation controls may be passed to add_reftable for sample simulations, to project methods when projections are updated, and to MSL for RMSE computations (in the primitive workflow, add_simulation, infer_logLs and MSL are called instead).
The most explicit way of specifying distinct controls is a list structured as
cluster_args=list(reftable=list(<makeCluster arguments>), RMSEs=list(<makeCluster arguments>))
A project=list(num.threads=<.>) element can be added to this list, providing control of the num.threads argument of ranger functions. However, this is retained mainly for backward compatibility, as the methodArgs argument can now be used to specify num.threads.
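A concrete instance of this explicit structure is sketched below. The worker counts are illustrative; spec and type are standard arguments of parallel::makeCluster(). The fit object slik_j is assumed to exist, so the refine() call itself is left as a comment.

```r
# Explicit per-step parallelisation controls, following the structure above.
# 'spec' and 'type' are arguments of parallel::makeCluster(); values are illustrative.
cluster_args <- list(
  reftable = list(spec = 4, type = "PSOCK"),  # 4 worker processes for simulations
  RMSEs    = list(spec = 2, type = "PSOCK")   # 2 worker processes for RMSE computations
)
# Threads for ranger projections, now preferably passed through methodArgs
methodArgs <- list(num.threads = 2)

# With an existing fit object `slik_j` (assumed here), the call would look like:
# slik_j <- refine(slik_j, cluster_args = cluster_args, methodArgs = methodArgs)
```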
Simpler arguments may be used and will be interpreted as follows. nb_cores, if given and not overridden by a spec argument in cluster_args (or in sublists of it), will control the simulation and projection steps (but not RMSE computation): that is, nb_cores then gives the number of parallel processes for sample simulation, with additional makeCluster arguments taken from cluster_args, while RMSE computations are performed serially. On the other hand, a spec argument in
cluster_args=list(spec=<.>, <other makeCluster arguments>)
will instead apply the same arguments to both reference table and RMSE computations, overriding the default effect of nb_cores in both of them.
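The interpretation rules above can be spelled out for the simple, top-level forms of nb_cores and cluster_args. In this hypothetical sketch, resolve_parallel_controls() is not an Infusion function, a NULL entry is this sketch's own convention for serial computation, and neither the effect of nb_cores on projections nor spec arguments inside sublists are modelled.

```r
# Hypothetical helper (not part of Infusion) applying the interpretation rules
# above to top-level nb_cores and cluster_args only.
# Convention of this sketch: a NULL entry means serial computation.
resolve_parallel_controls <- function(nb_cores = NULL, cluster_args = list()) {
  if (!is.null(cluster_args$spec)) {
    # top-level spec: the same makeCluster arguments for simulations and RMSEs
    return(list(reftable = cluster_args, RMSEs = cluster_args))
  }
  reftable <- cluster_args                           # other makeCluster arguments
  if (!is.null(nb_cores)) reftable$spec <- nb_cores  # nb_cores sets simulation workers
  list(reftable = reftable, RMSEs = NULL)            # RMSEs computed serially
}

# nb_cores alone: parallel simulations, serial RMSE computations
str(resolve_parallel_controls(nb_cores = 4, cluster_args = list(type = "PSOCK")))
```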
refine returns an updated SLik or SLik_j object, unless both the newsimuls and Simulate arguments are NULL, in which case a data frame of parameter points is returned.
See the workflow examples in example_reftable, example_raw_proj and example_raw (in order of decreasing relevance).
See get_workflow_design, the function that controls the default value of the workflow_design argument and can be used to provide non-default controls.
## see Note for links to examples.