A compendium of ways to make
secr.fit run faster.
Use an appropriate mask
Check the extent and spacing of the habitat mask that you are using.
Execution time is roughly proportional to the number of mask points
(nrow(mymask)). Default settings can lead to very large masks
for detector arrays that are elongated ‘north-south’ because the number
of points in the east-west direction is fixed. Compare results with a
much sparser mask (e.g., nx = 32 instead of nx = 64).
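As a rough illustration of the comparison suggested above, the following sketch builds two masks for a hypothetical elongated trap array and counts their points. The array dimensions, spacing and buffer are arbitrary assumptions chosen only for the example.

```r
library(secr)

# Hypothetical elongated array: 4 columns by 20 rows of traps, 30 m apart
elongated <- make.grid(nx = 4, ny = 20, spacing = 30)

# Default resolution (nx = 64) versus a sparser mask (nx = 32)
mask64 <- make.mask(elongated, buffer = 100, nx = 64)
mask32 <- make.mask(elongated, buffer = 100, nx = 32)

# The number of mask points drives execution time
c(nx64 = nrow(mask64), nx32 = nrow(mask32))
```

If estimates from models fitted with the two masks agree closely, the sparser mask is probably adequate.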
Use conditional likelihood
If you don't need to model variation in density over space or time then consider maximizing the conditional likelihood in secr.fit (CL = TRUE). This reduces the complexity of the optimization problem, especially where there are several sessions and you want session-specific density estimates (by default, derived() returns a separate estimate for each session even if the detection parameters are constant across sessions).
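A minimal sketch of this approach, assuming the demonstration dataset captdata supplied with secr and an arbitrary 100-m buffer:

```r
library(secr)

# Conditional-likelihood fit: density is not a parameter of the model
fitCL <- secr.fit(captdata, buffer = 100, CL = TRUE, trace = FALSE)

# Density is recovered afterwards as a derived parameter
derived(fitCL)
```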
Do you really need to fit all those complex models? Chasing down small decrements in AIC is so last-century. Remember that detection parameters are mostly nuisance parameters, and models with big differences in AIC may barely differ in their density estimates. This is a good topic for further research - we seem to need a ‘focussed information criterion’ (Claeskens and Hjort 2008) to discern the differences that matter. Be aware of the effects that can really make a difference: learned responses (b, bk etc.) and massive unmodelled heterogeneity.
Use score.test() to compare nested models. At each stage this requires only the simpler model to have been fitted in full; further processing is required to obtain a numerical estimate of the gradient of the likelihood surface for the more complex model, but this is much faster than maximizing the likelihood. The tradeoff is that the score test is only approximate, and you may later want to verify the results using a full AIC comparison.
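For example, a score test of a learned response might look like the sketch below. It assumes the fitted null model secrdemo.0 that is distributed with secr; the alternative model g0 ~ b is chosen only for illustration.

```r
library(secr)

# secrdemo.0 is a null model already fitted to captdata (supplied with secr).
# Test whether adding a learned response (g0 ~ b) is worthwhile,
# without maximizing the likelihood of the larger model.
st <- score.test(secrdemo.0, g0 ~ b)
st
```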
Break problem down
Suppose you are fitting models to multiple separate datasets that fit the general description of ‘sessions’. If you are fitting separate detection parameters to each session (i.e., you do not need to pool detection information), and you are not modelling trend in density across sessions, then it is much quicker to fit each session separately than to try to do it all at once. See Examples.
Mash replicated clusters of detectors
If your detectors are arranged in similar clusters (e.g., small square
grids) then try the function mash().
Reduce sparse ‘proximity’ data to ‘multi’
Full data from ‘proximity’ detectors has dimensions n x S x K (n is
number of individuals, S is number of occasions, K is number of
traps). If the data are sparse (i.e. multiple detections of an
individual on one occasion are rare) then it is efficient to treat
proximity data as multi-catch data (dimension n x S, maximum of one
detection per occasion). Use
reduce(proxCH, outputdetector = "multi").
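The conversion can be sketched with simulated data. The detector grid and the parameter values below are arbitrary assumptions, with a low g0 so that the data are sparse.

```r
library(secr)
set.seed(1)

# Simulated proximity data; the grid and parameter values are arbitrary
grid <- make.grid(nx = 6, ny = 6, spacing = 20, detector = "proximity")
proxCH <- sim.capthist(grid, popn = list(D = 5, buffer = 100),
    detectpar = list(g0 = 0.05, sigma = 25), noccasions = 5)

# Keep at most one detection per animal per occasion
multiCH <- reduce(proxCH, outputdetector = "multi")
detector(traps(multiCH))
```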
Use multiple cores when applicable
Some computations can be run in parallel on multiple processors (most desktops these days have multiple cores), but capability is limited. Check the ‘ncores’ argument of sim.secr() and secr.fit() and ?ncores. The speed gain is significant for parametric bootstrap computations in sim.secr. Parallelisation is also allowed for the session likelihood components of a multi-session model in secr.fit(), but gains there seem to be small or negative.
par.secr.fit is an alternative and more effective way to
take advantage of multiple cores when fitting several models.
Avoid covariates with many levels
Categorical (factor) covariates with many levels and continuous covariates that take many values are not handled efficiently in secr.fit, and can dramatically slow down analyses and increase memory requirements.
Model fitting is not needed to assess power. The precision of estimates
from secr.fit can be predicted without laboriously fitting models to
simulated datasets. Just use
method = "none" to obtain the asymptotic
variance at the known parameter values for which data have been
simulated (e.g. with sim.capthist()).
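A sketch of this shortcut follows. The detector layout and the 'true' parameter values are hypothetical, and the start values are given on the default link scales (log for D and sigma, logit for g0).

```r
library(secr)
set.seed(2)

# Hypothetical design and parameter values (illustrative only)
grid <- make.grid(nx = 6, ny = 6, spacing = 20, detector = "multi")
CH <- sim.capthist(grid, popn = list(D = 5, buffer = 100),
    detectpar = list(g0 = 0.2, sigma = 25), noccasions = 5)

# True values on the default link scales: log(D), logit(g0), log(sigma)
truebeta <- c(log(5), qlogis(0.2), log(25))

# method = "none" skips maximization; only the Hessian is computed at 'start'
fit0 <- secr.fit(CH, buffer = 100, start = truebeta, method = "none",
    trace = FALSE)
predict(fit0)
```

The standard errors reported by predict() then indicate the precision to expect from the design, at much less cost than a full fit.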
Suppress computation of standard errors by derived(). For a model fitted by conditional likelihood (CL = TRUE) the subsequent computation of derived density estimates can take appreciable time. If variances are not needed (e.g., when the aim is to predict the bias of the estimator across a large number of simulations) it is efficient to set se.D = FALSE in derived().
It is tempting to save a list with the entire ‘secr’ object from each simulated fit, and to later extract summary statistics as needed. Be aware that with large simulations the overheads associated with storage of the list can become very large. The solution is to anticipate the summary statistics you will want and save only these.
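The last two points can be combined in a simulation loop that keeps a single number per replicate. This is only a sketch: the function name one_rep, the design, the parameter values and the small number of replicates are all assumptions made for the example.

```r
library(secr)
set.seed(123)

# Hypothetical design (illustrative only)
grid <- make.grid(nx = 6, ny = 6, spacing = 20, detector = "multi")

# one_rep is a hypothetical helper: simulate, fit by conditional
# likelihood, and return only the derived density estimate
one_rep <- function(r) {
    CH <- sim.capthist(grid, popn = list(D = 5, buffer = 100),
        detectpar = list(g0 = 0.2, sigma = 25), noccasions = 5)
    fit <- secr.fit(CH, buffer = 100, CL = TRUE, trace = FALSE)
    # se.D = FALSE skips the variance computation
    derived(fit, se.D = FALSE)["D", "estimate"]
}

# Only a numeric vector is stored, not a list of 'secr' objects
Dhat <- sapply(1:3, one_rep)
Dhat
```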
Claeskens, G. and Hjort, N. L. (2008) Model Selection and Model Averaging. Cambridge: Cambridge University Press.
Examples
## Not run: 

## compare timing of combined model with separate single-session models
## for 5-session ovenbird mistnetting data: 2502/66 = 38-fold difference

system.time(fit1 <- secr.fit(ovenCH, buffer = 300,
    model = list(D ~ session, g0 ~ session, sigma ~ session)))
##    user  system elapsed
## 2470.99   20.62 2502.11

system.time(fit2 <- lapply(ovenCH, secr.fit, buffer = 300))
##  user  system elapsed
## 66.05    0.19   66.34

## ratio of density estimates
collate(fit1)[,1,1,"D"] / sapply(fit2, function(x) predict(x)["D","estimate"])
## session=2005 session=2006 session=2007 session=2008 session=2009
##    1.0000198    1.0000603    0.9999761    0.9999737    0.9999539

## End(Not run)