library(rmarkdown)
options(continue=" ")
options(width=60)
library(knitr)
library(glmtools)
library(GLM3r)
knitr::opts_knit$set(root.dir = tempdir())

Introduction

This vignette will give you a demonstration of the newly added automatic calibration method in glmtools. Currently there are two optimization algorithms included: the evolutionary Covariance Matrix Adaption - Evolution Strategy (CMA-ES) and the classical Nelder-Mead algorithm. At the moment, you also only do a minimization of the RMSE to optimize the parameter space of given values. To avoid changing files in the package folder, we will work here in a temporary folder that will automatically deleted after you have closed R.

Using example data built into the package

Find example data directory

library(glmtools)
GLM3r_folder = system.file('extdata', package = 'GLM3r')
glmtools_folder = system.file('extdata', package = 'glmtools')

List files that are included in the example data

dir(GLM3r_folder)
dir(glmtools_folder)

Move glmtools example files to a temporary folder

example.files <- dir(glmtools_folder)
temp_folder   <- tempdir()

file.copy(list.files(glmtools_folder,full.names = T), temp_folder, overwrite = TRUE)

Note that new files have been added to the temporary directory:

dir(temp_folder)

The uncalibrated setup

We can use glmtools functions to visualize and quantify the goodness of the fit of the uncalibrated setup, for instance by using the plotting and comparing functions when field data is available. But first, we need to declare the names of our output, configuration and field data variables.

out_file <- file.path(temp_folder, 'output/output.nc')
field_data <- file.path(temp_folder, 'LakeMendota_field_data_hours.csv')
file.copy('glm3_uncalibrated.nml', 'glm3.nml', overwrite = TRUE)
nml_data <- file.path(temp_folder, 'glm3.nml')

Running uncalibrated GLM setup

The contents of glmtools_folder (i.e., a glm3.nml file and driver data including .csv file for the meteorological drivers) are now located in a temporary folder (temp_folder). The run_glm() function will now return the results from the uncalibrated setup.

GLM3r::run_glm(sim_folder = temp_folder, verbose = F)

Visualizing the uncalibrated setup

Then we can plot the uncalibrated setup and quantify the root-mean squared error between observed and simulated data.

plot_var_nc(out_file, var_name = 'temp')
plot_var_compare(nc_file = out_file, field_file = field_data,var_name = 'temp', precision = 'hours')
temp_rmse <- compare_to_field(out_file, field_file = field_data,
                              metric = 'water.temperature', as_value = FALSE, precision= 'hours')

print(paste('Total time period (uncalibrated):',round(temp_rmse,2),'deg C RMSE'))

Calibrating GLM

Before we run the automatic calibration routine, we need to define certain input variables for the function. Here, we are trying to minimize the RMSE between observed and simulated water temperature data using the CMA-ES algorithm. To ensure that there is a reasonable covariance between variables, we are scaling all values on a space between 0 and 10. Further, to reduce computational times, we are stopping the algorithm either after reaching a RMSE of 1.0 degree Celsius, or after doing 100 iterations. Also, because we are working in a temporary directory, we are not saving any values. This approach will calibrate the wind_factor, the lw_factor, Kw and the sed_temp_mean values of two sediment zones with given lower and upper boundaries, as well as initial values.

We will also print out here the RMSE of every iteration, as well as diagnostic plots of the calibration, and contour maps of all time periods.

var = 'temp'         # variable to which we apply the calibration procedure
path = getwd()       # simulation path/folder
# obs = read_field_obs(field_data)  # observed field data
nml_file = nml_data  # path of the nml configuration file
glm_file = nml_data  # path of the glm3.nml file
calib_setup = get_calib_setup() # create a setup of variables for calibration
glmcmd = NULL        # command to be used, default applies the GLM3r function
# Optional variables
first.attempt = TRUE # if TRUE, deletes all local csv-files that stores the 
#outcome of previous calibration runs
period = get_calib_periods(nml_file, ratio = 1) # define a period for the calibration, 
# this supports a split-sample calibration (e.g. calibration and validation period)
scaling = TRUE       # scaling of the variables in a space of [0,10]; TRUE for CMA-ES
method = 'CMA-ES'    # optimization method, choose either `CMA-ES` or `Nelder-Mead`
metric = 'RMSE'      # objective function to be minimized, here the root-mean square error
target.fit = 1.0     # refers to a target fit of 1.0 degrees Celsius
target.iter = 100    # refers to a maximum run of 100 calibration iterations
plotting = FALSE      # if TRUE, script will automatically save the contour plots
output = out_file    # path of the output file
field_file = field_data # path of the field data
conversion.factor = 1 # conversion factor for water quality (AED2 assumes mmol/m3)

# main calibration function
calibrate_sim(var, path, field_file, nml_file, glm_file, calib_setup, glmcmd, first.attempt, period, scaling, method, metric, target.fit, target.iter, plotting, output, conversion.factor, verbose = FALSE)

You can see that the RMSE fit of the total period was improved compared to the uncalibrated setup. You can also notice that the calibration fit is in most cases better than the validation fit, which makes sense and helps us to avoid-overfitting. The table at the end gives you the final calibrated values of all variables



USGS-R/glmtools documentation built on March 26, 2024, 5:43 p.m.