knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette introduces a post-hoc calibration approach mainly aimed at pro_flux() models.
We will introduce the two steps required: (I) a parameter variation with alternate() and (II) the evaluation of the results with evaluate_models().
library(ConFluxPro)
Different input parameters of the flux-gradient method (FGM) affect the modeled effluxes to different extents.
Most importantly, the soil physical properties we defined in soilphys can carry considerable measurement and sampling uncertainties.
The derived DS values often require many input parameters and therefore have a high uncertainty. At the same time, DS has a large effect on the modeled efflux, leading to potential offsets and biases.
The following approach aims to quantify these effects and calibrate ConFluxPro models.
It is mostly intended to be used with inverse models from the pro_flux() method.
The basic idea is to modify the values of the input parameters within a given range and re-run the models to quantify the effect on the efflux.
In a second step, different parameters of model quality can be calculated to select the model runs with the lowest model error.
Programmatically, you will mainly interact with three functions: cfp_run_map() to generate a plan on how to modify each model for each run, alternate() to actually run all models, and evaluate_models() to calculate a model error for each run and select the best results.
Let's load in the basic model we generated in the last article.
data("base_dat", package = "ConFluxPro") mod_pf <- pro_flux(base_dat)
From a given model, we can generate a run_map, a plan on how to modify each model for each run, with the function cfp_run_map().
For now, we will only consider the method = "random" variant, where the parameter values in each run are randomly sampled.
This requires some input parameters.
First, we need to select the parameters we want to modify.
These parameters must come from the soilphys dataset.
Keep the number of parameters low, because too many degrees of freedom may prevent the approach from converging on a specific parameter set.
Furthermore, uncertainties of different parameters can often be balanced out by modifying a single one.
For example, an error in SWC can be compensated by adjusting TPS accordingly, and vice versa.
Here, we will choose to adapt only the TPS within each layer.
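As a minimal numeric sketch of this compensation (assuming AFPS is derived as TPS - SWC from these columns):

# Minimal sketch: an error of +0.02 in SWC and an adjustment of +0.02 in TPS
# cancel out, leaving AFPS (and with it DS) unchanged.
TPS <- 0.60
SWC <- 0.30
TPS - SWC                     # AFPS = 0.3
(TPS + 0.02) - (SWC + 0.02)   # still 0.3 -- the two changes compensate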
We also need to tell the function in what range we want to modify the parameter and how to modify it.
Here, we have different options.
We can either give a specified range in absolute values with type = "abs", subtract or add a certain amount from the original value with type = "addition", or multiply it with a given factor with type = "factor".
For now, let's select type = "factor" and let TPS vary between 90 and 110 % of its original value.
We provide this information as a named list to the params argument.
The names of that list represent the columns in soilphys and the limits are given as a vector of length 2.
The type of modification is given as type, a vector of the same length as params.
The last thing we have to provide is the number of model runs to generate, as an integer passed to n_runs.
tps_run_map <- cfp_run_map(
  mod_pf,
  params = list("TPS" = c(0.9, 1.1)),
  type = "factor",
  method = "random",
  n_runs = 100
)
With this we have a valid object of class cfp_run_map.
This object is a data.frame where each new run is identified by the column run_id.
head(tps_run_map)
We can see that there is a new row for each run, together with the factor that TPS will be multiplied with.
In this case, the TPS of the entire profile is modified with the same factor in each run.
We can also specify that the parameters within each layer should be modified independently of one another.
tps_run_map <- cfp_run_map(
  mod_pf,
  params = list("TPS" = c(0.9, 1.1)),
  type = "factor",
  method = "random",
  n_runs = 100,
  layers_different = TRUE
)
This generates a new TPS factor for each layer in layers_map.
If we look at the run_map again, we now see that there are multiple rows per run_id, with a different value for each layer.
head(tps_run_map)
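As a quick check (a sketch, assuming the value and run_id columns shown above), we can confirm that the sampled factors stay within the requested range and that each run now has one row per layer:

range(tps_run_map$value)         # all factors between 0.9 and 1.1
head(table(tps_run_map$run_id))  # number of rows (layers) per run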
We can extract the different parameters that are modified within the run_map with cfp_params_df().
This shows the mapping between each parameter and its identifier (param_id).
If layers_different = TRUE, each parameter of each layer has its own param_id.
cfp_params_df(tps_run_map)
We can also define these layers from a different source, either from soilphys or by providing a custom data.frame.
In the example dataset, for instance, TPS changes across four layers within soilphys.
Assuming these are measurements of TPS, it makes sense to vary TPS within these boundaries as well.
To do so, we set layers_from = "layers_altmap" and provide a new cfp_layers_map() that defines the layers for the parameter variation.
library(dplyr)

unique_tps <- cfp_soilphys(mod_pf) %>%
  group_by(site, TPS) %>%
  summarise(upper = max(upper), lower = min(lower)) %>%
  cfp_layers_map(
    id_cols = "site",
    gas = "CO2",
    lowlim = 0,
    highlim = 1000
  )

tps_run_map <- cfp_run_map(
  mod_pf,
  params = list("TPS" = c(0.9, 1.1)),
  type = "factor",
  method = "random",
  n_runs = 100,
  layers_different = TRUE,
  layers_from = "layers_altmap",
  layers_altmap = unique_tps
)
If we look at the result of cfp_params_df() again, we can see that we now have one parameter for each layer and each site, so in total we vary eight parameters.
cfp_params_df(tps_run_map)
By providing a layers_altmap, we can also specify distinct limits for each layer individually.
Let's say the measurement of TPS in the lower soil had a lower variation than in the topsoil.
We can then generate different limits within each layer:
unique_tps_with_limits <- unique_tps %>%
  mutate(TPS_delta = ifelse(lower <= -20, 0.02, 0.04)) %>%
  mutate(
    TPS_min = TPS - TPS_delta,
    TPS_max = TPS + TPS_delta
  )
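The resulting layers now carry their individual limits as the new columns TPS_min and TPS_max:

head(unique_tps_with_limits)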
All we have to do now is tell cfp_run_map() to use these columns in layers_altmap as the new limits.
We change the numeric limits c(0.9, 1.1) into a character vector specifying the columns where these values are stored: c("TPS_min", "TPS_max").
Depending on our setting in layers_from, the function will search for these columns in the respective dataset.
Because the new values we provided are actual TPS values, and not factors, we also have to change the type to "abs".
tps_run_map <- cfp_run_map(mod_pf, params = list("TPS" = c("TPS_min", "TPS_max")), type = "abs", method = "random", n_runs = 100, layers_different = TRUE, layers_from = "layers_altmap", layers_altmap = unique_tps_with_limits)
Whichever run_map you generated, the next step is to apply it to modify a model.
This happens within the function alternate(), which we will discuss now.
To apply a cfp_run_map() to a model, we use the function alternate().
We provide both the original model and the run_map we created above.
The last thing we need is a function that re-calculates the soilphys dataset.
Because the TPS value is adjusted within each run, we have to re-calculate the AFPS and the diffusion coefficient derived from it.
In our case, we can use the complete_soilphys() function.
Depending on how your parameters are named and how they relate to one another, you may have to write your own function for this.
mod_pf_alt <- alternate(
  mod_pf,
  f = function(x) complete_soilphys(x, DSD0_formula = "a*AFPS^b", quiet = TRUE),
  run_map = tps_run_map
)
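If your own parameters require a different recalculation, the function passed to f can be replaced. A hypothetical sketch (the helper name recalc_soilphys and the guard against TPS dropping below SWC are assumptions for illustration, not part of the package):

recalc_soilphys <- function(x) {
  # keep the modified TPS at or above SWC so that AFPS stays non-negative,
  # then recompute DSD0 and DS as before
  x$TPS <- pmax(x$TPS, x$SWC)
  complete_soilphys(x, DSD0_formula = "a*AFPS^b", quiet = TRUE)
}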
alternate() returns an object of class cfp_altres.
This is simply a list of model results, with which you can interact as usual.
For example, you can access the efflux of the third run like so:
efflux(mod_pf_alt[[3]]) |> head()
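Because a cfp_altres is simply a list, generic list tools also work, for example:

length(mod_pf_alt)  # number of runs (100 in our case)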
We can also extract the efflux for all runs at the same time.
This binds the results together and adds the column run_id.
EFFLUX_alt <- efflux(mod_pf_alt)
EFFLUX_alt |> head()
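Before plotting, a quick summary sketch (using the same columns as in the plot below) shows how far the effluxes spread across the runs:

# Range of modeled effluxes across all runs, per site and date
EFFLUX_alt %>%
  group_by(site, Date) %>%
  summarise(
    efflux_min = min(efflux),
    efflux_max = max(efflux),
    .groups = "drop"
  ) %>%
  head()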
Let's visualize this to see how much the efflux changed only by slightly varying the TPS of the model layers:
library(ggplot2)

EFFLUX_alt %>%
  ggplot(aes(x = Date, y = efflux, group = run_id, col = "modified models")) +
  geom_line(alpha = 0.2) +
  geom_line(data = efflux(mod_pf), aes(group = NA, col = "original model")) +
  facet_wrap(~site, ncol = 1) +
  theme_light() +
  theme(legend.position = "bottom")
The results stored in mod_pf_alt are model runs with different parameterizations.
Some may have improved the model, while others may have made it worse.
To select the best model runs, we have to quantify the model quality of each run.
ConFluxPro comes with two functions that estimate a specific error of the model: error_concentration() and error_efflux().
The first, error_concentration(), quantifies the difference between the measured and modeled concentration profiles and is only applicable to pro_flux() models.
The second, error_efflux(), compares the estimated effluxes to reference measurements (e.g. from the chamber method).
We could combine both errors and select the model run with the smallest overall error.
This whole process is streamlined within evaluate_models().
Here, you can provide a list of error parameters to calculate.
The function applies each of them to all model runs, scales and combines the returned error parameters, and returns a list of the 'best' model runs.
Let's do so with the model runs mod_pf_alt we calculated before.
Since we don't have actual reference measurements, let's pretend that the efflux estimates of the original, unaltered model were chamber measurements we conducted.
This is just to demonstrate the process.
EFFLUX_ref <- efflux(mod_pf)
With this, let's evaluate the models.
mod_pf_eval <- evaluate_models(
  mod_pf_alt,
  eval_funs = list(
    "error_concentration" = error_concentration,
    "error_efflux" = error_efflux
  ),
  eval_weights = c(1, 1),
  param_cols = c("site"),
  eval_cols = c("site"),
  n_best = 5,
  scaling_fun = scale_min_median,
  EFFLUX = EFFLUX_ref
)
We defined two types of errors in our call within the eval_funs argument.
The first is the difference between the modeled and the measured concentration profiles as calculated by error_concentration().
The second is the difference between modeled and measured efflux returned by error_efflux().
While we don't have a real measurement of efflux in this case, we provide the 'measured' efflux prepared above as an additional argument in EFFLUX = EFFLUX_ref.
These functions (and any we may write ourselves) must take the parameter param_cols, indicating for which parameters a distinct error will be calculated.
We set this to "site" because we want to calibrate each of our sites individually.
The function calculates an error parameter for each combination of eval_funs and param_cols.
You can define weights for each error parameter in eval_weights.
Each parameter is then scaled across the runs.
Here we set scaling_fun = scale_min_median, which means that the best run for a given parameter has a value of 0 and the best 50 % of runs are below 1.
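To illustrate the principle (a sketch, not necessarily the exact implementation of scale_min_median): the minimum is mapped to 0 and the median to 1, so the better half of the runs ends up below 1.

# Sketch of the min-median scaling principle
scale_min_median_sketch <- function(x) {
  (x - min(x)) / (median(x) - min(x))
}
scale_min_median_sketch(c(2, 4, 6, 8, 10))
#> [1] 0.0 0.5 1.0 1.5 2.0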
For each value in eval_cols, all scaled parameters are multiplied with their weighting factors and summed up to an overall estimate of the model quality.
Finally, we select the 5 best runs with the lowest model error by setting n_best = 5.
Let's have a look at the result.
evaluate_models() returns a list with different elements.
The runs with the lowest model error are stored in best_runs.
mod_pf_eval$best_runs
We can see the run_id of the 5 best model runs for each site and the corresponding overall model error ME.
The model error of all runs is stored in model_error ...
mod_pf_eval$model_error |> head()
... and the exact values of each parameter in models_evaluated.
mod_pf_eval$models_evaluated |> head()
If you have a lot of profiles (e.g. a long, hourly time series), it may be necessary to run the calibration only on a subset of profiles because of memory limitations.
In this case, you will want to run the entire model again once a good parameterization has been found.
To facilitate this, evaluate_models() prepares a cfp_run_map() with only the best runs, which you can then apply to the complete model in alternate().
It is stored in best_runs_runmap.
calibrated_run_map <- mod_pf_eval$best_runs_runmap
Let's use this run map to calculate the calibrated models.
mod_pf_cal_list <- alternate(
  mod_pf,
  f = function(x) complete_soilphys(x, DSD0_formula = "a*AFPS^b", quiet = TRUE),
  run_map = calibrated_run_map
)
Finally, we can combine the returned list into a single model frame.
This adds a new column cmb_id to the id_cols, which identifies each model run.
mod_pf_cal <- combine_models(mod_pf_cal_list)
mod_pf_cal
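As a quick sketch of further use, the combined model can be treated like any other pro_flux() model, so we can, for example, extract the calibrated effluxes, with cmb_id distinguishing the five best runs:

efflux(mod_pf_cal) |> head()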