options(width = 80) # For summary listings
knitr::opts_chunk$set(
  comment = "", tidy = FALSE, cache = TRUE, fig.pos = "H", fig.align = "center"
)

\clearpage

Introduction

The purpose of this document is to demonstrate how nonlinear hierarchical models (NLHM) based on the parent degradation models SFO, FOMC, DFOP and HS can be fitted with the mkin package.

It was assembled in the course of work package 1.1 of Project Number 173340 (Application of nonlinear hierarchical models to the kinetic evaluation of chemical degradation data) of the German Environment Agency carried out in 2022 and 2023.

The mkin package is used in version r packageVersion("mkin"). It contains the test data and the functions used in the evaluations. The saemix package is used as a backend for fitting the NLHM, but is also loaded to make the convergence plot function available.

This document is processed with the knitr package, which also provides the kable function that is used to improve the display of tabular data in R markdown documents. For parallel processing, the parallel package is used.

library(mkin)
library(knitr)
library(saemix)
library(parallel)
n_cores <- detectCores()
if (Sys.info()["sysname"] == "Windows") {
  cl <- makePSOCKcluster(n_cores)
} else {
  cl <- makeForkCluster(n_cores)
}

\clearpage

Data

The test data are available in the mkin package as an object of class mkindsg (mkin dataset group) under the identifier dimethenamid_2018. The following preprocessing steps are still necessary:

The following commented R code performs this preprocessing.

# Apply a function to each of the seven datasets in the mkindsg object to create a list
dmta_ds <- lapply(1:7, function(i) {
  ds_i <- dimethenamid_2018$ds[[i]]$data                     # Get a dataset
  ds_i[ds_i$name == "DMTAP", "name"] <-  "DMTA"              # Rename DMTAP to DMTA
  ds_i <- subset(ds_i, name == "DMTA", c("name", "time", "value")) # Select data
  ds_i$time <- ds_i$time * dimethenamid_2018$f_time_norm[i]  # Normalise time
  ds_i                                                       # Return the dataset
})

# Use dataset titles as names for the list elements
names(dmta_ds) <- sapply(dimethenamid_2018$ds, function(ds) ds$title)

# Combine data for Elliot soil to obtain a named list with six elements
dmta_ds[["Elliot"]] <- rbind(dmta_ds[["Elliot 1"]], dmta_ds[["Elliot 2"]]) #
dmta_ds[["Elliot 1"]] <- NULL
dmta_ds[["Elliot 2"]] <- NULL

\clearpage

The following tables show the r length(dmta_ds) datasets.

for (ds_name in names(dmta_ds)) {
    print(kable(mkin_long_to_wide(dmta_ds[[ds_name]]),
      caption = paste("Dataset", ds_name),
      label = paste0("tab:", ds_name), booktabs = TRUE))
    cat("\n\\clearpage\n")
}

Separate evaluations

In order to obtain suitable starting parameters for the NLHM fits, separate fits of the four models to the data for each soil are generated using the mmkin function from the mkin package. In a first step, constant variance is assumed. Convergence is checked with the status function.

deg_mods <- c("SFO", "FOMC", "DFOP", "HS")
f_sep_const <- mmkin(
  deg_mods,
  dmta_ds,
  error_model = "const",
  quiet = TRUE)

status(f_sep_const) |> kable()

In the table above, OK indicates convergence, and C indicates failure to converge. All separate fits with constant variance converged, with the sole exception of the HS fit to the BBA 2.2 data. To prepare for fitting NLHM using the two-component error model, the separate fits are updated assuming two-component error.

f_sep_tc <- update(f_sep_const, error_model = "tc")
status(f_sep_tc) |> kable()

Using the two-component error model, the one fit that did not converge with constant variance did converge, but other non-SFO fits failed to converge.

\clearpage

Hierarchichal model fits

The following code fits eight versions of hierarchical models to the data, using SFO, FOMC, DFOP and HS for the parent compound, and using either constant variance or two-component error for the error model. The default parameter distribution model in mkin allows for variation of all degradation parameters across the assumed population of soils. In other words, each degradation parameter is associated with a random effect as a first step. The mhmkin function makes it possible to fit all eight versions in parallel (given a sufficient number of computing cores being available) to save execution time.

Convergence plots and summaries for these fits are shown in the appendix.

f_saem <- mhmkin(list(f_sep_const, f_sep_tc), transformations = "saemix")

The output of the status function shows that all fits terminated successfully.

status(f_saem) |> kable()

The AIC and BIC values show that the biphasic models DFOP and HS give the best fits.

anova(f_saem) |> kable(digits = 1)

The DFOP model is preferred here, as it has a better mechanistic basis for batch experiments with constant incubation conditions. Also, it shows the lowest AIC and BIC values in the first set of fits when combined with the two-component error model. Therefore, the DFOP model was selected for further refinements of the fits with the aim to make the model fully identifiable.

Parameter identifiability based on the Fisher Information Matrix

Using the illparms function, ill-defined statistical model parameters such as standard deviations of the degradation parameters in the population and error model parameters can be found.

illparms(f_saem) |> kable()

According to the illparms function, the fitted standard deviation of the second kinetic rate constant k2 is ill-defined in both DFOP fits. This suggests that different values would be obtained for this standard deviation when using different starting values.

The thus identified overparameterisation is addressed by removing the random effect for k2 from the parameter model.

f_saem_dfop_tc_no_ranef_k2 <- update(f_saem[["DFOP", "tc"]],
  no_random_effect = "k2")

For the resulting fit, it is checked whether there are still ill-defined parameters,

illparms(f_saem_dfop_tc_no_ranef_k2)

which is not the case. Below, the refined model is compared with the previous best model. The model without random effect for k2 is a reduced version of the previous model. Therefore, the models are nested and can be compared using the likelihood ratio test. This is achieved with the argument test = TRUE to the anova function.

anova(f_saem[["DFOP", "tc"]], f_saem_dfop_tc_no_ranef_k2, test = TRUE) |>
  kable(format.args = list(digits = 4))

The AIC and BIC criteria are lower after removal of the ill-defined random effect for k2. The p value of the likelihood ratio test is much greater than 0.05, indicating that the model with the higher likelihood (here the model with random effects for all degradation parameters f_saem[["DFOP", "tc"]]) does not fit significantly better than the model with the lower likelihood (the reduced model f_saem_dfop_tc_no_ranef_k2).

Therefore, AIC, BIC and likelihood ratio test suggest the use of the reduced model.

The convergence of the fit is checked visually.

plot(f_saem_dfop_tc_no_ranef_k2$so, plot.type = "convergence")

All parameters appear to have converged to a satisfactory degree. The final fit is plotted using the plot method from the mkin package.

plot(f_saem_dfop_tc_no_ranef_k2)

Finally, a summary report of the fit is produced.

summary(f_saem_dfop_tc_no_ranef_k2)

\clearpage

Alternative check of parameter identifiability

The parameter check used in the illparms function is based on a quadratic approximation of the likelihood surface near its optimum, which is calculated using the Fisher Information Matrix (FIM). An alternative way to check parameter identifiability [@duchesne_2021] based on a multistart approach has recently been implemented in mkin.

The graph below shows boxplots of the parameters obtained in 50 runs of the saem algorithm with different parameter combinations, sampled from the range of the parameters obtained for the individual datasets fitted separately using nonlinear regression.

f_saem_dfop_tc_multi <- multistart(f_saem[["DFOP", "tc"]], n = 50, cores = 15)
par(mar = c(6.1, 4.1, 2.1, 2.1))
parplot(f_saem_dfop_tc_multi, lpos = "bottomright", ylim = c(0.3, 10), las = 2)

The graph clearly confirms the lack of identifiability of the variance of k2 in the full model. The overparameterisation of the model also indicates a lack of identifiability of the variance of parameter g.

The parameter boxplots of the multistart runs with the reduced model shown below indicate that all runs give similar results, regardless of the starting parameters.

f_saem_dfop_tc_no_ranef_k2_multi <- multistart(f_saem_dfop_tc_no_ranef_k2,
  n = 50, cores = 15)
par(mar = c(6.1, 4.1, 2.1, 2.1))
parplot(f_saem_dfop_tc_no_ranef_k2_multi, ylim = c(0.5, 2), las = 2,
  lpos = "bottomright")

When only the parameters of the top 25% of the fits are shown (based on a feature introduced in mkin 1.2.2 currently under development), the scatter is even less as shown below.

par(mar = c(6.1, 4.1, 2.1, 2.1))
parplot(f_saem_dfop_tc_no_ranef_k2_multi, ylim = c(0.5, 2), las = 2, llquant = 0.25,
  lpos = "bottomright")

\clearpage

Conclusions

Fitting the four parent degradation models SFO, FOMC, DFOP and HS as part of hierarchical model fits with two different error models and normal distributions of the transformed degradation parameters works without technical problems. The biphasic models DFOP and HS gave the best fit to the data, but the default parameter distribution model was not fully identifiable. Removing the random effect for the second kinetic rate constant of the DFOP model resulted in a reduced model that was fully identifiable and showed the lowest values for the model selection criteria AIC and BIC. The reliability of the identification of all model parameters was confirmed using multiple starting values.

Acknowledgements

The helpful comments by Janina Wöltjen of the German Environment Agency are gratefully acknowledged.

References

\vspace{1em}

::: {#refs} :::

\clearpage

Appendix

Hierarchical model fit listings

for (deg_mod in deg_mods) {
  for (err_mod in c("const", "tc")) {
    caption <- paste("Hierarchical mkin fit of the", deg_mod, "model with error model", err_mod)
    summary_listing(f_saem[[deg_mod, err_mod]], caption)
  }
}

Hierarchical model convergence plots

plot(f_saem[["SFO", "const"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["SFO", "tc"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["FOMC", "const"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["FOMC", "tc"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["DFOP", "const"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["DFOP", "tc"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["HS", "const"]]$so, plot.type = "convergence")

\clearpage

plot(f_saem[["HS", "tc"]]$so, plot.type = "convergence")

\clearpage

Session info

parallel::stopCluster(cl)
sessionInfo()

Hardware info

if(!inherits(try(cpuinfo <- readLines("/proc/cpuinfo")), "try-error")) {
  cat(gsub("model name\t: ", "CPU model: ", cpuinfo[grep("model name", cpuinfo)[1]]))
}
if(!inherits(try(meminfo <- readLines("/proc/meminfo")), "try-error")) {
  cat(gsub("model name\t: ", "System memory: ", meminfo[grep("MemTotal", meminfo)[1]]))
}


Try the mkin package in your browser

Any scripts or data that you put into this service are public.

mkin documentation built on Nov. 23, 2023, 3:02 p.m.