Use Pattern Oriented Modeling to fit unknown parameters

Description

This is very much in alpha condition. It has been tested on simple problems, as shown in the examples, with up to 2 parameters. It appears that DEoptim is the superior package for the stochastic problems. This should be used with caution as with all optimization routines. This function can nevertheless take optim or genoud as optimizers, using stats::optim or rgenoud::genoud, respectively. However, these latter approaches do not seem appropriate for stochastic problems, and have not been widely tested and are not supported within POM.

Usage

1
2
3
4
5
6
7
8
POM(sim, params, objects, objFn, cl, optimizer = "DEoptim", sterr = FALSE,
  ..., objFnCompare = "MAD", optimControl = NULL, NaNRetries = NA,
  logObjFnVals = FALSE, weights)

## S4 method for signature 'simList,character'
POM(sim, params, objects, objFn, cl,
  optimizer = "DEoptim", sterr = FALSE, ..., objFnCompare = "MAD",
  optimControl = NULL, NaNRetries = NA, logObjFnVals = FALSE, weights)

Arguments

sim

A simList simulation object, generally produced by simInit.

params

Character vector of parameter names that can be changed by the optimizer. These must be accessible with params(sim) internally.

objects

A optional named list (must be specified if objFn is not). The names of each list element must correspond to an object in the .GlobalEnv and the list elements must be objects or functions of objects that can be accessed in the ls(sim) internally. These will be used to create the objective function passed to the optimizer. See details and examples.

objFn

An optional objective function to be passed into optimizer. If missing, then POM will use objFnCompare and objects instead. If using POM with a SpaDES simulation, this objFn must contain a spades call internally, followed by a derivation of a value that can be minimized but the optimizer. It must have, as first argument, the values for the parameters. See example.

cl

A cluster object. Optional. This would generally be created using parallel::makeCluster or equivalent. This is an alternative way, instead of beginCluster(), to use parallelism for this function, allowing for more control over cluster use.

optimizer

The function to use to optimize. Default is "DEoptim". Currently it can also be "optim" or "rgenoud", which use stats::optim or rgenoud::genoud, respectively. The latter two do not seem optimal for stochastic problems and have not been widely tested.

sterr

Logical. If using optimizer = "optim", the hessian can be calculated. If this is TRUE, then the standard errors can be estimated using that hessian, assuming normality.

...

All objects needed in objFn

objFnCompare

Character string. Either, "MAD" or "RMSE" indicating that inside the objective function, data and prediction will be compared by Mean Absolute Deviation or Root Mean Squared Error. Default is "MAD".

optimControl

List of control arguments passed into the control of each optimization routine. Currently, only passed to DEoptim.control when optimizer is "DEoptim"

NaNRetries

Numeric. If greater than 1, then the function will retry the objective function for a total of that number of times if it results in an NaN. In general this should not be used as the objective function should be made so that it doesn't produce NaN. But, sometimes it is difficult to diagnose stochastic results.

logObjFnVals

Logical or Character string indicating a filename. Ignored if objFn is supplied. If TRUE (and there is no objFn supplied), then the value of the individual patterns will be output the console if being run interactively or to a tab delimited text file named ObjectiveFnValues.txt (or that passed by the user here) at each evaluation of the POM created objective function. See details.

weights

Numeric. If provided, this vector will be multiplied by the standardized mean absolute deviations as describe in objects. This has the effect of weighing each pattern to a user specified amount in the objective function.

Details

There are two ways to use this function, via 1) objFn or 2) objects.

  1. The user can pass the entire objective function to the objFn argument that will be passed directly to the optimizer. For this, the user will likely need to pass named objects as part of the ....

  2. The slightly simpler approach is to pass a list of 'actual data–simulated data' pairs as a named list in objects and specify how these objects should be compared via objFnCompare (whose default is Mean Absolute Deviation or "MAD").

Option 1 offers more control to the user, but may require more knowledge. Option 1 should likely contain a call to simInit(copy(simList)) and spades internally. See examples that show simple examples of each type, option 1 and option 2. In both cases, params is required to indicate which parameters can be varied in order to achieve the fit.

Currently, option 1 only exists when optimizer is "DEoptim", the default.

The upper and lower limits for parameter values are taken from the metadata in the module. Thus, if the module metadata does not define the upper and lower limits, or these are very wide, then the optimization may have troubles. Currently, there is no way to override these upper and lower limits; the module metadata should be changed if there needs to be different parameter limits for optimization.

objects is a named list of data–pattern pairs. Each of these pairs will be assessed against one another using the objFnCompare, after standardizing each independently. The standardization, which only occurs if the abs(data value < 1), is: mean(abs(derived value - data value))/mean(data value). If the data value is between -1 and 1, then there is no standardization. If there is more than one data–pattern pair, then they will simply be added together in the objective function. This gives equal weight to each pair. If the user wishes to put different weight on each pattern, a weights vector can be provided. This will be used to multiply the standardized values described above. Alternatively, the user may wish to weight them differently, in which case, their relative scales can be adjusted.

There are many options that can be passed to DEoptim, (the details of which are in the help), using optimControl. The defaults sent from POM to DEoptim are: steptol = 3 (meaning it will start assessing convergence after 3 iterations (WHICH MAY NOT BE SUFFICIENT FOR YOUR PROBLEM), NP = 10 * length(params) (meaning the population size is 10 x the number of parameters) and itermax = 200 (meaning it won't go past 200 iterations). These and others may need to be adjusted to obtain good values. NOTE: DEoptim does not provide a direct estimate of confidence intervals. Also, convergence may be unreliable, and may occur because itermax is reached. Even when convergence is indicated, the estimates are not guaranteed to be global optima. This is different than other optimizers that will normally indicate if convergence was not achieved at termination of the optimization.

Using this function with a parallel cluster currently requires that you pass optimControl = list(parallelType = 1), and possibly package and variable names (and does not yet accept the cl argument). See examples. This setting will use all available threads on your computer. Future versions of this will allow passing of a custom cluster object via cl argument. POM will automatically determine packages to load in the spawned cluster (via SpaDES::packages) and it will load all objects in the cluster that are necessary, by sending names(objects) to parVar in DEoptim.control.

Setting logObjFnVals to TRUE may help diagnosing some problems. Using the POM derived objective function, essentially all patterns are treated equally. This may not give the correct behavior for the objective function. Because the POM weighs the patterns equally, it may be useful to set all patterns so that their expected values are each close to the same value, e.g., 1. If they are all expected to be near that value, then their sum will give relatively equal weights to each of the patterns. This can, of course, be manipulated to give higher weight to some patterns over others, by setting their expected value to be scaled to 1, then multiplied by some weight. This would have to be done in the preparation of the patterns with the objects argument.

Value

A list with at least 2 elements. The first (or first several) will be the returned object from the optimizer. The second (or last if there are more than 2), named args is the set of arguments that were passed into the control of the optimizer.

Author(s)

Eliot McIntire

See Also

spades, makeCluster, simInit

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
if (interactive()) {
  set.seed(89462)
  library(parallel)
  mySim <- simInit(
    times = list(start = 0.0, end = 2.0, timeunit = "year"),
    params = list(
      .globals = list(stackName = "landscape", burnStats = "nPixelsBurned"),
      fireSpread = list(nFires = 5),
      randomLandscapes = list(nx = 300, ny = 300)
    ),
    modules = list("randomLandscapes", "fireSpread", "caribouMovement"),
    paths = list(modulePath = system.file("sampleModules", package = "SpaDES"))
  )

  # Since this is a made up example, we don't have real data
  #  to run POM against. Instead, we will run the model once,
  #  take the values at the end of the simulation as if they
  #  are real data, then rerun the POM function next,
  #  comparing these "data" with the simulated values
  #  using Mean Absolute Deviation
  outData <- spades(copy(mySim), .plotInitialTime = NA)

  # Extract the "true" data, in this case, the "proportion of cells burned"
  # Function defined that will use landscape$Fires map from simList,
  #  i.e., sim$landscape$Fires
  #  the return value being compared via MAD with propCellBurnedData
  propCellBurnedFn <- function(landscape) {
    sum(getValues(landscape$Fires)>0) / ncell(landscape$Fires)
  }
  # visualize the burned maps of true "data"
  propCellBurnedData <- propCellBurnedFn(outData$landscape)
  clearPlot()
  if(interactive()) {
    Fires <- outData$landscape$Fires # Plot doesn't do well with many nested layers
    Plot(Fires)
  }
  # Example 1 - 1 parameter
  # In words, this says, "find the best value of spreadprob such that
  #  the proportion of the area burned in the simulation
  #  is as close as possible to the proportion area burned in
  #  the "data", using \code{DEoptim()}.

  # Can use cluster if computer is multi-threaded (but not yet via cl arg, which is not
  #                                                implemented yet in DEoptim)
  # This example uses parallelType = 1 in DEoptim. For this, you must manually
  #  pass all packages and variables as character strings.
  # cl <- makeCluster(detectCores() - 1) # not implemented yet in DEoptim
  out1 <- POM(mySim, "spreadprob",
              list(propCellBurnedData = propCellBurnedFn), # data = pattern pair
              #optimControl = list(parallelType = 1),
              logObjFnVals = TRUE)

  ## Once cl arg is available from DEoptim, this will work:
  # out1 <- POM(mySim, "spreadprob", cl = cl,
  #            list(propCellBurnedData = propCellBurnedFn)) # data = pattern pair

  # Example 2 - 2 parameters
  # Function defined that will use caribou from sim$caribou, with
  #  the return value being compared via MAD with NPattern
  #  module, parameter N, is from 10 to 1000)
  caribouFn <- function(caribou) length(caribou)

  # Extract "data" from simList object (normally, this would be actual data)
  NPattern <- caribouFn(outData$caribou)

  aTime <- Sys.time()
  parsToVary <- c("spreadprob", "N")
  out2 <- POM(mySim, parsToVary,
              list(propCellBurnedData = propCellBurnedFn,
                   NPattern = caribouFn), logObjFnVals = TRUE)
                   #optimControl = list(parallelType = 1))
                   #cl = cl) # not yet implemented, waiting for DEoptim
  bTime <- Sys.time()
  # check that population overlaps known values (0.225 and 100)
  apply(out2$member$pop, 2, quantile, c(0.025, 0.975))
  hists <- apply(out2$member$pop, 2, hist, plot = FALSE)
  clearPlot()
  for (i in seq_along(hists)) Plot(hists[[i]], addTo = parsToVary[i],
                                   title = parsToVary[i], axes = TRUE)

  print(paste("DEoptim", format(bTime - aTime)))
  #stopCluster(cl) # not yet implemented, waiting for DEoptim

  # Example 3 - using objFn instead of objects

  # list all the parameters in the simList, from these, we select to vary
  params(mySim)

  # Objective Function Example:
  #   objective function must have several elements
  #   - first argument must be parameter vector, passed to and used by DEoptim
  #   - likely needs to take sim object, likely needs a copy
  #      because of pass-by-reference semantics of sim objects
  #   - pass data that will be used internally for objective function
  objFnEx <- function(pars, # param values
                      sim, # simList object
                      NPattern, propCellBurnedData, caribouFn, propCellBurnedFn) { # data

    # make a copy of simList because it will possibly be altered by spades call
    sim1 <- SpaDES::copy(sim)

    # take the parameters and assign them to simList
    params(sim1)$fireSpread$spreadprob <- pars[1]
    params(sim1)$caribouMovement$N <- pars[2]

    # run spades, without plotting
    out <- spades(sim1, .plotInitialTime = NA)

    # calculate outputs
    propCellBurnedOut <- propCellBurnedFn(out$landscape)
    NPattern_Out <- caribouFn(out$caribou)

    minimizeFn <- abs(NPattern_Out - NPattern) +
                  abs(propCellBurnedOut - propCellBurnedData)

    # have more info reported to console, if desired
    # cat(minimizeFn)
    # cat(" ")
    # cat(pars)
    # cat("\n")

    return(minimizeFn)
  }

  # Run DEoptim with custom objFn, identifying 2 parameters to allow
  #  to vary, and pass all necessary objects required for the
  #  objFn

  # choose 2 of them to vary. Need to identify them in params & inside objFn
  # Change optimization parameters to alter how convergence is achieved
  out5 <- POM(mySim, params = c("spreadprob", "N"),
              objFn = objFnEx,
              NPattern = NPattern,
              propCellBurnedData = propCellBurnedData,
              caribouFn = caribouFn,
              propCellBurnedFn = propCellBurnedFn,
              #cl = cl, # uncomment for cluster # not yet implemented, waiting for DEoptim
              # see ?DEoptim.control for explanation of these options
              optimControl = list(
                NP = 100, # run 100 populations, allowing quantiles to be calculated
                initialpop = matrix(c(runif(100, 0.2, 0.24), runif(100, 80, 120)), ncol = 2),
                parallelType = 1
              )
            )

  # Can also use an optimizer directly -- miss automatic parameter bounds,
  #  and automatic objective function using option 2
  library(DEoptim)
  out7 <- DEoptim(fn = objFnEx,
                 sim = mySim,
                 NPattern = NPattern,
                 propCellBurnedData = propCellBurnedData,
                 caribouFn = caribouFn,
                 propCellBurnedFn = propCellBurnedFn,
                 # cl = cl, # uncomment for cluster
                 # see ?DEoptim.control for explanation of these options
                 control = DEoptim.control(steptol = 3, # parallelType = 3,
                                           parallelType = 1,
                                           packages = list("raster", "SpaDES", "RColorBrewer"),
                                           parVar = list("objFnEx")),
                 initialpop = matrix(c(runif(40, 0.2, 0.24), runif(40, 80, 120)), ncol = 2),
                 lower = c(0.2, 80), upper = c(0.24, 120))
}

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.