anneal: Perform Simulated Annealing for Maximum Likelihood Estimation

View source: R/anneal.R

annealR Documentation

Perform Simulated Annealing for Maximum Likelihood Estimation

Description

Performs simulated annealing - a global optimization algorithm - for maximum likelihood estimation of model parameters. Bounded, unbounded, and mixed searches can all be performed. See the Simulated Annealing Algorithm help page for more on how simulated annealing is actually performed.

Usage

anneal(model, par, var, source_data, par_lo = NULL, par_hi = NULL, pdf, 
dep_var, initial_temp = 3, temp_red = 0.95, ns = 20, nt = 100, 
max_iter = 50000, min_change = 0, min_drops = 100, hessian = TRUE, 
delta = 100, slimit = 2, c = 2, note = "", show_display = TRUE, ...)

Arguments

model

Scientific model for whose parameters anneal will find maximum likelihood estimates. This is an R function.

par

List object of parameters for which to find maximum likelihood estimates using simulated annealing. The name of each component in par matches the name of an argument in one of the functions passed to anneal (either model, pdf, or any other function that you pass in). The value of each component is the initial value. All components in par must be numeric vectors. Vectors of length greater than one have each of their elements treated separately as individual parameters to estimate.

var

List object with the source for all other arguments and data used by model, pdf, and any other functions.

source_data

Data frame containing any needed source data. You can reference the data frame columns by name to anneal.

par_lo

List object with the lower search bounds for each parameter to estimate. The list component names and sizes should each match a component in par. Any individual component (up to and including the entire par_lo argument) is optional. For any component of par that is omitted, the lower search bound for that parameter is assumed to be negative infinity. (Infinity isn't quite infinity - see details section for more.)

par_hi

List object with the upper search bounds for each parameter to estimate. The list component names and sizes should each match a component in par. Any individual component (up to and including the entire par_hi argument) is optional. For any component of par that is omitted, the upper search bound for that parameter is assumed to be infinity. (Infinity isn't quite infinity - see details section for more.)

pdf

Probability density function to use in likelihood calculations. anneal depends on a log likelihood value, so you must instruct pdf to calculate the log of its result. This is an option with all of R's built-in PDFs.

dep_var

The name of the column in source_data, as a string, that contains the dependent variable (the “observed” value).

initial_temp

The temperature at which to start the annealing process.

temp_red

The rate of temperature reduction (a fractional number less than 1).

ns

Number of iterations between changes in parameter search ranges. One iteration varies all parameters one time.

nt

Controls number of iterations between drops in temperature. Temperature drops occur at nt * ns iterations. One iteration varies all parameters one time.

max_iter

Maximum number of iterations to perform. One iteration varies all parameters one time.

min_change

An alternate (and optional) way to specify quitting conditions for the run. This is the minimum amount of change in likelihood in min_drop number of temperature drops. If the change is less than min_change, execution stops even if max_iter number of iterations have not been performed.

min_drops

The companion to min_change for alternate quitting conditions. This is the number of temperature drops over which the likelihood must have changed more than min_change for execution to continue.

hessian

if TRUE, the Hessian matrix is used to calculate the standard error for each parameter and the parameter variance-covariance matrix. These are included in the output. If FALSE, this step is skipped.

delta

The number by which to divide each parameter maximum likelihood estimate value when searching for support limits. The bigger the number, the finer the search. See support_limits for more on how support limits are calculated.

slimit

When calculating support limits for the parameter maximum likelihood estimates, this is the number of likelihood units less than the optimum likelihood for which to search the parameter ranges. 2 units is standard. 1.92 units corresponds roughly to a 95 percent confidence interval.

c

Controls the reduction in parameter search range. A value of 0 would keep the search range permanently between the values set in par_lo and par_hi. A higher value will restrict the search more when range adjustments are made. A value of 2 is recommended by Goffe.

note

A note about the run. This can be any character string. This will be written to output files by write_results.

show_display

Whether or not to show the progress display.

...

Any other data needed by model, pdf, or any other function to be called by anneal. This is an alternative to providing the data in var; however, passing all values in var is strongly recommended.

Details

Simulated annealing is a search algorithm that attempts to find the global maximum of the likelihood surface produced by all possible values of the parameters being estimated. The value of the maximum that anneal finds is the maximum likelihood value, and the value of the parameters that produced it are their maximum likelihood estimates. See the Simulated Annealing Algorithm page for details on how the search is performed. See the Likelihood Calculation page for details on how likelihood is calculated. Simulated annealing is an algorithm that can search any function; but anneal specifically searches likelihood.

The model function is the scientific model, which generally takes as arguments the parameters for which to estimate maximum likelihood. It returns a predicted value of the dependent variable for each record in the source_data dataset, which is compared to the actual (observed) value when likelihood is calculated. Write model so that it returns a vector of numeric values, one for each record in the dataset.

The probability density function calculates the likelihood using the predicted and observed values of the dependent variable. You can provide your own function, but R has many built-in functions that you can use. You can read more about R's probability density functions in the help file “An Introduction to R”, but here is a brief list: dbeta (beta), dexp (exponential), dgamma (gamma), dlnorm (lognormal), dnbinom (negative binomial), dnorm (normal), and dpois (poisson). These all take a log argument which you should set to TRUE in var in order to calculate the log likelihood. If you write your own probability density function, it should return a vector of values, one for each record in the dataset.

If you wish, some of the arguments passed to model or pdf by anneal can be the results of other functions. anneal will make sure these functions are evaluated at each search iteration.

anneal handles all function calls and data. You tell anneal how to use your functions and data using par and var. Use par to give anneal the list of parameters for which to find maximum likelihood estimates. All values must be numeric vectors. The name of each list component must match the function argument where the value should go. For example, if your model function takes an argument called “a”, and you want the maximum likelihood estimate for a, there should be a par$a. If any component of par is a vector of length greater than one, each value is treated as a separate parameter to estimate. This is useful if, for example, you wish to estimate a parameter that has a different value for different sites or species.

Use var to tell anneal where all other functions and data come from. var is a list, and each component's name matches the function argument it should be used for (as with par). The value can be of any data type that makes sense to the function. To indicate that the source of a function argument is a column of data from a dataset, set that value of var to the name of the data frame's column, as a character string (for example, var$dbh<-"DBH"). Case matters! You will get the best results if all function arguments and column names are unique, so that there is no ambiguity. You are also free to reference values directly from the global environment in your functions if you prefer.

The reserved character string “predicted”, used in var, means the predicted value of the dependent variable, as calculated by model.

If you want anneal to pass the results of another function as an argument to the model or pdf functions, define the function and then set the appropriate argument in var to the name of the function. Then provide all arguments to the sub-function in var as well. For instance, if your model function takes an argument called x, and you wish x to be the result of function fun1, then set var$x <- fun1, and add any arguments to fun1 to var. anneal will ensure that all functions are evaluated in the proper order.

If the likelihood is calculated as infinity or NaN (which can easily happen), the likelihood is arbitrarily set to -1000000 to preserve the ability to graph results and compare values. If your best likelihood is -1000000, it is possible that no valid likelihood value was found.

The search ranges for parameters can be set to (or allowed to default to) negative and positive infinity. In practice, the search is bounded by the largest and smallest values the computer can work with. To find out what the actual limits are on your computer, use .Machine$double.xmax.

When looking at the examples provided in the demos that come with this package, check those for likeli as well, since the parameter setup techniques are the same.

Value

A list object with information on the annealing run. If you stop the run by pressing Esc, you will get this data structure with the results of the run at the point where you stopped it.

best_pars

The maximum likelihood estimates for each value in par.

var

A copy of the var argument, to help you keep track of your analysis. To save space, any data frames are removed.

source_data

A copy of the source_data data frame, with a column added for the predicted values calculated by model using the maximum likelihood estimates of the parameters.

pdf

The name of the pdf function.

model

The name of the model function.

iterations

The number of annealing iterations completed. One iteration varies all parameters one time. If the run does not complete, this may not be an integer.

max_likeli

The maximum likelihood value found.

aic_corr

The value of Akaike's Information Criterion, “corrected” for small sample size. See the Simulated Annealing Algorithm help page for more.

aic

The value of Akaike's Information Criterion. See the Simulated Annealing Algorithm help page for more.

slope

Slope of observed values linearly regressed on those predicted by model, using the parameter maximum likelihood estimates. The intercept is forced at zero.

R2

Proportion of variance explained by the model relative to that explained by the simple mean of the data.

likeli_hist

Data frame with the history of likelihood change throughout the run. All changes in likelihood are recorded, along with regular periodic checkpoints. The columns are: “temp”, the temperature at that point, “iter”, the number of iterations completed, and “likeli”, the maximum likelihood value.

par_lo

List object with the lower bounds for each of the parameters. If any value was omitted in the original arguments, it is recorded here as a value that approximates negative infinity.

par_hi

List object with upper bounds for varying parameters. If any value was omitted in the original arguments, it is recorded here as a value that approximates infinity.

par_step

List object with final size of the search range for each parameter.

note

The value of the note argument, above.

upper_limits

List object with upper support limits for each parameter. For more on support limits, see the support_limits function.

lower_limits

List object with lower support limits for each parameters. For more on support limits, see the support_limits function.

std_errs

If anneal was run with hessian = TRUE, this is a list object with the standard errors for each parameter.

var_covar_mat

If anneal was run with hessian = TRUE, this is the parameter variance / covariance matrix.

References

Goffe, W.L., G.D. Ferrier, and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60:65-99.

Examples

##
## Simulated annealing to maximize log
## likelihood for the following:
## Model: Radius = a + b * DBH
## Dataset: included crown_rad dataset
## We want to use simulated annealing to
## find maximum likelihood estimates of
## the parameters "a" and "b".
##
## Not run: 
library(likelihood)

## Set up our dataset
data(crown_rad)
dataset <- crown_rad

## Create our model function
modelfun <- function (a, b, DBH) {a + b * DBH}

## Create the list for the parameters to estimate and
## set initial values for a and b
par <- list(a = 0, b = 0)

## Create a place to put all the other data needed by
## the model and PDF, and indicate that DBH comes from 
## the column marked "DBH" in the dataset
var <- list(DBH = "DBH")

## Set bounds and initial search ranges within which to search for parameters
par_lo <- list(a = 0, b = 0)
par_hi <- list(a = 50, b = 50)

## We'll use the normal probability density function -
## add the options for it to our parameter list
## "x" value in PDF is observed value
var$x <- "Radius"

## Mean in normal PDF
var$mean <- "predicted"
var$sd <- 0.815585

## Have it calculate log likelihood
var$log <- TRUE

results<-anneal(model = modelfun, par = par, var = var,
  source_data = dataset, par_lo = par_lo, par_hi = par_hi,
  pdf = dnorm, dep_var = "Radius", max_iter = 20000)

## Alternately: reference crown_rad$DBH directly in the function without
## using var
modelfun <- function (a, b) {a + b * crown_rad$DBH}
var <- list(x = "Radius",
            mean = "predicted",
            sd = 0.815585,
            log = TRUE)
results<-anneal(model = modelfun, par = par, var = var,
  source_data = dataset, par_lo = par_lo, par_hi = par_hi,
  pdf = dnorm, dep_var = "Radius", max_iter = 20000)

## End(Not run)  

likelihood documentation built on March 31, 2023, 9:02 p.m.