options(width = 80) # For summary listings knitr::opts_chunk$set( comment = "", tidy = FALSE, cache = TRUE, fig.pos = "H", fig.align = "center" )
\clearpage
The purpose of this document is to demonstrate how nonlinear hierarchical models (NLHM) based on the parent degradation models SFO, FOMC, DFOP and HS can be fitted with the mkin package.
It was assembled in the course of work package 1.1 of Project Number 173340 (Application of nonlinear hierarchical models to the kinetic evaluation of chemical degradation data) of the German Environment Agency carried out in 2022 and 2023.
The mkin package is used in version r packageVersion("mkin")
. It contains the
test data and the functions used in the evaluations. The saemix
package is
used as a backend for fitting the NLHM, but is also loaded to make the
convergence plot function available.
This document is processed with the knitr
package, which also provides the
kable
function that is used to improve the display of tabular data in R
markdown documents. For parallel processing, the parallel
package is used.
library(mkin) library(knitr) library(saemix) library(parallel) n_cores <- detectCores() if (Sys.info()["sysname"] == "Windows") { cl <- makePSOCKcluster(n_cores) } else { cl <- makeForkCluster(n_cores) }
\clearpage
The test data are available in the mkin package as an object of class
mkindsg
(mkin dataset group) under the identifier dimethenamid_2018
. The
following preprocessing steps are still necessary:
Elliot 1
and Elliot 2
) are
combined, resulting in dimethenamid (DMTA) data from six soils.The following commented R code performs this preprocessing.
# Apply a function to each of the seven datasets in the mkindsg object to create a list dmta_ds <- lapply(1:7, function(i) { ds_i <- dimethenamid_2018$ds[[i]]$data # Get a dataset ds_i[ds_i$name == "DMTAP", "name"] <- "DMTA" # Rename DMTAP to DMTA ds_i <- subset(ds_i, name == "DMTA", c("name", "time", "value")) # Select data ds_i$time <- ds_i$time * dimethenamid_2018$f_time_norm[i] # Normalise time ds_i # Return the dataset }) # Use dataset titles as names for the list elements names(dmta_ds) <- sapply(dimethenamid_2018$ds, function(ds) ds$title) # Combine data for Elliot soil to obtain a named list with six elements dmta_ds[["Elliot"]] <- rbind(dmta_ds[["Elliot 1"]], dmta_ds[["Elliot 2"]]) # dmta_ds[["Elliot 1"]] <- NULL dmta_ds[["Elliot 2"]] <- NULL
\clearpage
The following tables show the r length(dmta_ds)
datasets.
for (ds_name in names(dmta_ds)) { print(kable(mkin_long_to_wide(dmta_ds[[ds_name]]), caption = paste("Dataset", ds_name), label = paste0("tab:", ds_name), booktabs = TRUE)) cat("\n\\clearpage\n") }
In order to obtain suitable starting parameters for the NLHM fits, separate
fits of the four models to the data for each soil are generated using the mmkin
function from the mkin
package. In a first step, constant variance is assumed.
Convergence is checked with the status
function.
deg_mods <- c("SFO", "FOMC", "DFOP", "HS") f_sep_const <- mmkin( deg_mods, dmta_ds, error_model = "const", quiet = TRUE) status(f_sep_const) |> kable()
In the table above, OK indicates convergence, and C indicates failure to converge. All separate fits with constant variance converged, with the sole exception of the HS fit to the BBA 2.2 data. To prepare for fitting NLHM using the two-component error model, the separate fits are updated assuming two-component error.
f_sep_tc <- update(f_sep_const, error_model = "tc") status(f_sep_tc) |> kable()
Using the two-component error model, the one fit that did not converge with constant variance did converge, but other non-SFO fits failed to converge.
\clearpage
The following code fits eight versions of hierarchical models to the data,
using SFO, FOMC, DFOP and HS for the parent compound, and using either constant
variance or two-component error for the error model. The default parameter
distribution model in mkin allows for variation of all degradation parameters
across the assumed population of soils. In other words, each degradation
parameter is associated with a random effect as a first step. The mhmkin
function makes it possible to fit all eight versions in parallel (given a
sufficient number of computing cores being available) to save execution time.
Convergence plots and summaries for these fits are shown in the appendix.
f_saem <- mhmkin(list(f_sep_const, f_sep_tc), transformations = "saemix")
The output of the status
function shows that all fits terminated
successfully.
status(f_saem) |> kable()
The AIC and BIC values show that the biphasic models DFOP and HS give the best fits.
anova(f_saem) |> kable(digits = 1)
The DFOP model is preferred here, as it has a better mechanistic basis for batch experiments with constant incubation conditions. Also, it shows the lowest AIC and BIC values in the first set of fits when combined with the two-component error model. Therefore, the DFOP model was selected for further refinements of the fits with the aim to make the model fully identifiable.
Using the illparms
function, ill-defined statistical model parameters such
as standard deviations of the degradation parameters in the population and
error model parameters can be found.
illparms(f_saem) |> kable()
According to the illparms
function, the fitted standard deviation of the
second kinetic rate constant k2
is ill-defined in both DFOP fits. This
suggests that different values would be obtained for this standard deviation
when using different starting values.
The thus identified overparameterisation is addressed by removing the random
effect for k2
from the parameter model.
f_saem_dfop_tc_no_ranef_k2 <- update(f_saem[["DFOP", "tc"]], no_random_effect = "k2")
For the resulting fit, it is checked whether there are still ill-defined parameters,
illparms(f_saem_dfop_tc_no_ranef_k2)
which is not the case. Below, the refined model is compared with the previous
best model. The model without random effect for k2
is a reduced version of
the previous model. Therefore, the models are nested and can be compared using
the likelihood ratio test. This is achieved with the argument test = TRUE
to the anova
function.
anova(f_saem[["DFOP", "tc"]], f_saem_dfop_tc_no_ranef_k2, test = TRUE) |> kable(format.args = list(digits = 4))
The AIC and BIC criteria are lower after removal of the ill-defined random
effect for k2
. The p value of the likelihood ratio test is much greater
than 0.05, indicating that the model with the higher likelihood (here
the model with random effects for all degradation parameters
f_saem[["DFOP", "tc"]]
) does not fit significantly better than the model
with the lower likelihood (the reduced model f_saem_dfop_tc_no_ranef_k2
).
Therefore, AIC, BIC and likelihood ratio test suggest the use of the reduced model.
The convergence of the fit is checked visually.
plot(f_saem_dfop_tc_no_ranef_k2$so, plot.type = "convergence")
All parameters appear to have converged to a satisfactory degree. The final fit is plotted using the plot method from the mkin package.
plot(f_saem_dfop_tc_no_ranef_k2)
Finally, a summary report of the fit is produced.
summary(f_saem_dfop_tc_no_ranef_k2)
\clearpage
The parameter check used in the illparms
function is based on a quadratic
approximation of the likelihood surface near its optimum, which is calculated
using the Fisher Information Matrix (FIM). An alternative way to check
parameter identifiability [@duchesne_2021] based on a multistart approach
has recently been implemented in mkin.
The graph below shows boxplots of the parameters obtained in 50 runs of the saem algorithm with different parameter combinations, sampled from the range of the parameters obtained for the individual datasets fitted separately using nonlinear regression.
f_saem_dfop_tc_multi <- multistart(f_saem[["DFOP", "tc"]], n = 50, cores = 15)
par(mar = c(6.1, 4.1, 2.1, 2.1)) parplot(f_saem_dfop_tc_multi, lpos = "bottomright", ylim = c(0.3, 10), las = 2)
The graph clearly confirms the lack of identifiability of the variance of k2
in
the full model. The overparameterisation of the model also indicates a lack of
identifiability of the variance of parameter g
.
The parameter boxplots of the multistart runs with the reduced model shown below indicate that all runs give similar results, regardless of the starting parameters.
f_saem_dfop_tc_no_ranef_k2_multi <- multistart(f_saem_dfop_tc_no_ranef_k2, n = 50, cores = 15)
par(mar = c(6.1, 4.1, 2.1, 2.1)) parplot(f_saem_dfop_tc_no_ranef_k2_multi, ylim = c(0.5, 2), las = 2, lpos = "bottomright")
When only the parameters of the top 25% of the fits are shown (based on a feature introduced in mkin 1.2.2 currently under development), the scatter is even less as shown below.
par(mar = c(6.1, 4.1, 2.1, 2.1)) parplot(f_saem_dfop_tc_no_ranef_k2_multi, ylim = c(0.5, 2), las = 2, llquant = 0.25, lpos = "bottomright")
\clearpage
Fitting the four parent degradation models SFO, FOMC, DFOP and HS as part of hierarchical model fits with two different error models and normal distributions of the transformed degradation parameters works without technical problems. The biphasic models DFOP and HS gave the best fit to the data, but the default parameter distribution model was not fully identifiable. Removing the random effect for the second kinetic rate constant of the DFOP model resulted in a reduced model that was fully identifiable and showed the lowest values for the model selection criteria AIC and BIC. The reliability of the identification of all model parameters was confirmed using multiple starting values.
The helpful comments by Janina Wöltjen of the German Environment Agency are gratefully acknowledged.
\vspace{1em}
::: {#refs} :::
\clearpage
for (deg_mod in deg_mods) { for (err_mod in c("const", "tc")) { caption <- paste("Hierarchical mkin fit of the", deg_mod, "model with error model", err_mod) summary_listing(f_saem[[deg_mod, err_mod]], caption) } }
plot(f_saem[["SFO", "const"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["SFO", "tc"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["FOMC", "const"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["FOMC", "tc"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["DFOP", "const"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["DFOP", "tc"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["HS", "const"]]$so, plot.type = "convergence")
\clearpage
plot(f_saem[["HS", "tc"]]$so, plot.type = "convergence")
\clearpage
parallel::stopCluster(cl) sessionInfo()
if(!inherits(try(cpuinfo <- readLines("/proc/cpuinfo")), "try-error")) { cat(gsub("model name\t: ", "CPU model: ", cpuinfo[grep("model name", cpuinfo)[1]])) } if(!inherits(try(meminfo <- readLines("/proc/meminfo")), "try-error")) { cat(gsub("model name\t: ", "System memory: ", meminfo[grep("MemTotal", meminfo)[1]])) }
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.