R/e2e_optimize_eco.R
In StrathE2E2: End-to-End Marine Food Web Model

Documented in e2e_optimize_eco

#
# e2e_optimize_eco.R
#
#' Optimize StrathE2E ecology model parameters to maximise the likelihood of observed ecosystem target data.
#'
#' Launches a StrathE2E simulated annealing process to find the set of ecology model parameters
#' producing the maximum likelihood of observed target data on the state of the ecosystem,
#' given specified environmental driving data and fishing fleet parameters.
#'
#' Simulated annealing is is a probabilistic technique for approximating the global optimum of a given function. As implemented here the process searches the parameter space
#' of a model to locate the combination which maximises the likelihood of a set of observed
#' data corresponding to a suite of derived outputs. Parameter combinations which result in an improved likelihood
#' may be rejected according to a probability ('temperature') which decreases as the iterations progress. This is to  
#' avoid becoming stuck at local likelihood-maxima. The rate at which the 'temperature' decreases is set
#' by a 'cooling' parameter (fraction of previous temperature at each iteration, 0<value<1).
#'
#' Model configuration and initial values of the ecology model parameters need to be
#' assembled by a prior call of the e2e_read() function.
#' 
#' NOTE that the models.path argument in the e2e_read() function call needs to point to a user workspace folder, not the default
#' North Sea model provided with the package. This is because the annealing function needs write-access to the model /Param folder,
#' but the /extdata/Models folder in the package installation is read-only.
#' To use the annealing function on the North Sea model, use the e2e_copy() function to make a copy of the
#' North Sea model in the user workspace.
#'
#' The observational data to which the ecology parameters are optimized are loaded from the folder Modelname/Variantname/Target/annual_observed_*.csv as part of a e2e_read() function call and are built into the R-list object generated by e2e_read().
#' Column 3 of annual_observed_* (header: "Use1_0") is a flag to set whether any given row is used in calculating the likelihood of the observed data given the model setup and parameters. Un-used rows of data are omitted from calculations.
#'
#' The coefficients of variation for jiggling the ecology parameter can be varied in real-time
#' during the run by editing the file "optimize_ecology.csv" in the folder
#' /Param/control/ of the model version.
#'
#' The function produces a real-time graphical summary of the progress of the fitting procedure, displaying
#' the likelihoods of the proposed and accepted parameter sets at each iteration.
#' y-axis (likelihood of the target data) range of the real time plot can be varied during the run by 
#' editing the setup file "optimize_ecology.csv"
#'
#' At the end of the procedure, provided that csv.output=TRUE, new versions of the three ecology model 'fitted_parameters..' files are exported
#' to the folder /Param of the model version, with a user defined identifier specified by the model.ident argument
#' in the e2e_read() function. These data are also saved in the list object returned by the function.
#'
#' In order to use the new fitted parameter values in a subsequent run of the StrathE2E model (using the e2e_run() function)
#' it will be necessary to edit the MODEL_SETUP.csv file in the relevant /Models/variant folder to point to the new files.
#'
#' Also at the end of the procedure the histories of proposed and accepted ecology model parameter values and corresponding likleihoods from each iteration of the procedure are saved as CSV files 
#' in the results folder (provided that the argument csv.output=TRUE), and in a list object which is returned by the function. 
#' The two csv files generated by the procedure have names:  
#' annealing_par_proposalhistory-*, annealing_par_acceptedhistory-*, where * denotes the value
#' of model.ident defined in the preceding e2e_read() function call.
#' The returned list object contains three dataframes: parameter_proposal_history, parameter_accepted_history, new_parameter_data (a list of three).
#' The proposal and accepted histories can be further analysed with the function e2e_plot_opt_diagnostics() to assess the performance of the optimization process.
#'
#' @param model R-list object generated by the e2e_read() function which defined the model configuration.
#' @param nyears Number of years to run the model in each iteration (default=40).
#' @param n_iter Number of iterations of the model (default=500).
#' @param start_temperature Initial value of the simulated annealing temperature parameter (default=1). Suggested values in the range 0.0005 - 5. Higher values increase the probability of rejecting parameter combinations producing an improvement in likelihood.
#' @param cooling Rate at which the simulated annealing temperature declines with iterations (default=0.975). Suggested values in the range 0.9 - 0.985
#' @param toppredlock Logical. If TRUE then locks-down the uptake parameters of the birds pinnipeds and cetaceans as these are hard to fit alongside the other parameters (default=TRUE).
#' @param quiet Logical. If TRUE then suppress informational messages at the start of each iteration (default=TRUE).
#' @param csv.output Logical. If TRUE then enable writing of csv output files (default=FALSE).
#' @param runtime.plot Logical. If FALSE then disable runtime plotting of the progress of the run - useful for testing (default=TRUE)
#'
#' @return A list object containing the histories of proposed and accepted parameters and the final accepted parameter values. Optionally (by default), csv files of the histories and the final accepted parameter values. The latter are returned to the model parameter folder in a format to be read back into the model.
#'
#' @seealso \code{\link{e2e_ls}} , \code{\link{e2e_read}} , \code{\link{e2e_plot_opt_diagnostics}} , \code{\link{e2e_optimize_hr}} , \code{\link{e2e_optimize_act}}
#'
#' @importFrom stats runif
#' @importFrom grDevices dev.off
#'
#' @export
#'
#' @examples
#' \donttest{
#' # Load the 1970-1999 version of the North Sea model supplied with the package and generate a
#' # quick test data object with only 8 itereations and running the model for only 3 years.
#' # Also, the final parameter values are not saved back to the model Param folder.
#' # More realistic would be at least 500 iterations and running for 50 years.
#' # Even so this example will take a few minutes to run:
#'     model<-e2e_read(model.name="North_Sea",
#'                     model.variant="1970-1999",
#'                     model.ident="test")
#' # This model is already optimized to the observed ecosystem data supplied with the package
#' # so to illustrate the performance of the process we perturb the temperature driving to knock
#' # the model away from its maximum likelihood state relative to the target data:
#' # add 3 degC to upper layer offshore temperatures:
#'     model$data$physics.drivers$so_temp <- model$data$physics.drivers$so_temp+3
#' # add 3 degC to inshore temperatures:
#'     model$data$physics.drivers$si_temp <- model$data$physics.drivers$si_temp+3
#' # add 3 degC to lower layer offshore temperatures:
#'     model$data$physics.drivers$d_temp  <- model$data$physics.drivers$d_temp+3
#'     test_run  <- e2e_optimize_eco(model, nyears=3, n_iter=8, start_temperature=0.4,
#'                                   csv.output=FALSE)
#' # View the structure of the returned list:
#'     str(test_run,max.level=1)
#' # View the structure of the returned list element containing parameter objects:
#'     str(test_run$new_parameter_data,max.level=1)
#' # View the new preference matrix:
#'     test_run$new_parameter_data$new_preference_matrix
#' }
#'
#' # --------------------------------------------------------------------------
#'
#' # This is a dummy example to illustrate a realistic run in which optimised
#' # parameters are written back to the model Param folder. To try it out substitute 
#' # your own relative folder path in place of \Folder in the e2e_copy() function...
#' # WARNING - this will take about 26 hours to run...
#' # Copy the 1970-1999 version of the North Sea model supplied with the package into a
#' # user workspace relative to the current working directory (../Folder):
#' #    e2e_copy("North_Sea", "1970-1999",
#' #               dest.path="Folder")
#' # Load the copied version of the North Sea/1970-1999 model from the user workspace
#' # and assign a path for results data:
#' # (REPLACE "Folder/Models" and "Folder/results" with your own paths before running)
#' #    model<-e2e_read(model.name="North_Sea",
#' #                    model.variant="1970-1999",
#' #                    models.path="Folder/Models",
#' #                    results.path="Folder/results",
#' #                    model.ident="fittingrun")
#' # Launch the fitting process
#' #    fitting_data <- e2e_optimize_eco(model, nyears=50, n_iter=500, start_temperature=1,
#' #                                     csv.output=TRUE)
#'
#' # --------------------------------------------------------------------------
#'
#
# ---------------------------------------------------------------------
# |                                                                   |
# | Authors: Mike Heath, Ian Thurlbeck                                |
# | Department of Mathematics and Statistics                          |
# | University of Strathclyde, Glasgow                                |
# |                                                                   |
# | Date of this version: May 2020                                    |
# |                                                                   |
# ---------------------------------------------------------------------

e2e_optimize_eco <- function(model, nyears=40, n_iter=500, start_temperature=1, cooling=0.975, toppredlock=TRUE, quiet=TRUE, csv.output=FALSE, runtime.plot=TRUE) {

   oo <- options()
   on.exit(options(oo))

if(runtime.plot==TRUE){
start_par = par()$mfrow
dev.off()
on.exit(par(mfrow = start_par))
}

	pkg.env$csv.output <- csv.output	# controls writing of CSV files

	setup				<- elt(model, "setup")
	read.only			<- elt(setup, "read.only")
	model.path			<- elt(setup, "model.path")
	resultsdir			<- elt(setup, "resultsdir")
	identifier			<- elt(setup, "model.ident")

	data				<- elt(model, "data")
	fitted.parameters		<- elt(data, "fitted.parameters")
	HRscale_vector_multiplier	<- elt(data, "fleet.model", "HRscale_vector_multiplier")

	if (read.only & csv.output==TRUE) {
		message("Warning: cannot write fitted parameters back to the model input folders - model is read-only")
		message("Warning: to fix this, make a copy of the model using e2e_copy() into your own workspace.")
		stop("Model is not writable!")
	}

	print(date())

	if (length(which(HRscale_vector_multiplier<1 | HRscale_vector_multiplier>1)) > 0) {
		message("**************************************************************************")
		message("WARNING - one or more baseline Harvest Ratio scaling values differs from 1")
		message("**************************************************************************")
	}

	datastore <- c(
		fitted.parameters,
		annual_obj = 1e-60
	)

	perturbed       <- as.data.frame(datastore)
	parhistory	<- as.data.frame(datastore)		# these are all data frames
	proposalhistory	<- parhistory

	n_acceptances	<- 0					# Counter for the number of parameter acceptances which have occurred

	temperature<-start_temperature

	#ITERATE THE ANNEALING PROCESS.....

	for (kkk in 1:n_iter){

		annealing.parms <- read_SD_ecology(model.path)	# read every loop so control is available during a run

		if(kkk==1){
			axmin <- elt(annealing.parms, "axmin")
			axmax <- elt(annealing.parms, "axmax")
			axmin_p<-axmin
			axmax_p<-axmax
			newglims<-1
		}
		if(kkk>1){
			newglims<-0
			axmin <- elt(annealing.parms, "axmin")
				if((axmin==axmin_p)==FALSE){
					newglims<-1
				}
			axmax <- elt(annealing.parms, "axmax")
				if((axmax==axmax_p)==FALSE){
					newglims<-1
				}
			axmin_p<-axmin
			axmax_p<-axmax
		}
		
		if( axmax<=axmin ) {
		message("Warning: Plot axis max <= axis min in the optimize_ecology.csv parameter file - reset to defaults")
		axmax<-1
		axmin<-0
		newglims<-1
		}
		if( elt(annealing.parms, "Prefsd") < 0 ) {
		stop("Error: cv for preference parameter variations must be positive.")
		}
		if( elt(annealing.parms, "u_sd") < 0 ) {
		stop("Error: cv for maximum uptake rate parameter variations must be positive.")
		}
		if( elt(annealing.parms, "h_sd") < 0 ) {
		stop("Error: cv for uptake half saturation parameter variations must be positive.")
		}
		if( elt(annealing.parms, "biogeo_sd") < 0 ) {
		stop("Error: cv for biogeochemistry parameter variations must be positive.")
		}
		if( elt(annealing.parms, "mort_sd") < 0 ) {
		stop("Error: cv for mortality parameter variations must be positive.")
		}
		if( elt(annealing.parms, "ressd") < 0 ) {
		stop("Error: cv for other parameter variations must be positive.")
		}

		if (kkk>1) {

		# Jiggle the parameter values and load them into the vector which is passed to the model function
		perturbed <- perturb_parameters(datastore, annealing.parms, toppredlock)

		# perturbed parameters will be built into model.parameters for run:
		model$data$fitted.parameters <- perturbed[1:(length(perturbed)-1)]

		}

		# Build and run the model:
		build <- build_model(model, nyears)
		fit <- StrathE2E_fit(model, build, quiet)
		annual_obj <- elt(fit, "annual_obj")

		perturbed$annual_obj <- annual_obj

		temperature<-temperature*cooling

		if (kkk==1) lastbest <- annual_obj

		#Now the Metropolis algorithm.....

		lik_ratio<-exp(((log(annual_obj)) - log(lastbest))/temperature)
		rand<-runif(1,0,1)
		new_accepted<-" accepted: NO"
		if(lik_ratio>rand){
			n_acceptances<-n_acceptances+1
			datastore <- perturbed
			lastbest <- annual_obj
			new_accepted <- " accepted: YES"
		}

		#--------------------------------

		if(kkk==1) proposalhistory<-perturbed # first run through perturbed = the unaltered initial parameter set
		if(kkk>1) proposalhistory<-rbind(proposalhistory, perturbed)
		filename <- csvname(resultsdir, "annealing_par_proposalhistory", identifier)
		writecsv(proposalhistory, filename, row.names=FALSE)

		if(kkk==1) parhistory<-perturbed  # first run through perturbed = the unaltered initial parameter set
		if(kkk>1) parhistory<-rbind(parhistory, datastore)
		filename <- csvname(resultsdir, "annealing_par_acceptedhistory", identifier)
		writecsv(parhistory, filename, row.names=FALSE)

		message("Iteration: ",kkk,"; proposal likelihood: ",sprintf("%.7f",annual_obj),"; ",new_accepted)

#----------------------------------------------------------------------------------------------------------

		#Plot or update the time series of proposed and accepted likelihoods so far....
		if(runtime.plot==TRUE){
		if(kkk>1){

		if(kkk==2 | newglims==1){
		par(mfrow=c(1,1))
		plot(seq(1,nrow(proposalhistory)),proposalhistory$annual_obj,ylim=c(axmin,axmax),xlim=c(1,n_iter),xlab="Iterations",ylab="Target data likelihood",type="l",col="grey")
		points(seq(1,nrow(parhistory)),parhistory$annual_obj,type="l",col="black",lwd=3)
                legend("topleft",c("accepted","proposed"),bg="transparent",col=c("black","grey"),lty=c(1,1),lwd=c(3,1),pt.cex=c(1,1))
		} 

		if(newglims==0){ 
		stp<-c((kkk-1),kkk)
		points(stp,proposalhistory$annual_obj[stp],type="l",col="grey")
		points(stp,parhistory$annual_obj[stp],type="l",col="black",lwd=3)
		} 

		}
		}

#----------------------------------------------------------------------------------------------------------

	}



	#At the conclusion of the whole process, extract the final row of the parhistory into the format of model fitted parameter files and save to disc
	#The 'write_fitted_parameters' also returns a list object containing the new parameter file data is csv.output=TRUE

	new_parameters<-write_fitted_parameters(model, parhistory,csv.output)

	#Assemble a list object containing all the data generated by the annealing process
	output_data<-list(parameter_proposal_history = proposalhistory,
			  parameter_accepted_history = parhistory,
			  new_parameter_data = new_parameters)
	output_data

}