options(width = 80) # For summary listings
knitr::opts_chunk$set(
  comment = "", tidy = FALSE, cache = TRUE,
  fig.pos = "H", fig.align = "center"
)
\clearpage
The purpose of this document is to demonstrate how nonlinear hierarchical models (NLHM) based on the parent degradation models SFO, FOMC, DFOP and HS, with serial formation of two or more metabolites, can be fitted with the mkin package.
It was assembled in the course of work package 1.2 of Project Number 173340 (Application of nonlinear hierarchical models to the kinetic evaluation of chemical degradation data) of the German Environment Agency carried out in 2022 and 2023.
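For orientation, the parent degradation models named above can be written as closed-form functions of time. The following is a minimal base R sketch, not the implementation used by mkin; the function names, argument names and parameter values are illustrative assumptions.

```r
# Closed-form solutions of four parent degradation models (base R only).
# All function and argument names are illustrative, not mkin's API.
sfo <- function(t, c0, k) c0 * exp(-k * t)
fomc <- function(t, c0, alpha, beta) c0 / (t / beta + 1)^alpha
dfop <- function(t, c0, k1, k2, g) {
  c0 * (g * exp(-k1 * t) + (1 - g) * exp(-k2 * t))
}
hs <- function(t, c0, k1, k2, tb) {
  # Rate constant k1 applies up to the breakpoint tb, k2 afterwards
  ifelse(t <= tb,
    c0 * exp(-k1 * t),
    c0 * exp(-k1 * tb) * exp(-k2 * (t - tb)))
}

# Hypothetical sampling times and parameter values
times <- c(0, 1, 3, 7, 14, 28, 56)
pred <- data.frame(
  time = times,
  SFO = sfo(times, c0 = 100, k = 0.05),
  FOMC = fomc(times, c0 = 100, alpha = 1.5, beta = 20),
  DFOP = dfop(times, c0 = 100, k1 = 0.2, k2 = 0.01, g = 0.6),
  HS = hs(times, c0 = 100, k1 = 0.1, k2 = 0.02, tb = 7)
)
print(round(pred, 1))
```

All four curves start at the same initial value and differ only in how the decline slows down over time, which is what the model comparisons in this document assess.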
The mkin package is used in version `r packageVersion("mkin")`, which is currently under development. The newly introduced functionality used here simplifies excluding random effects for a set of fits based on a related set of fits with a reduced model, and documents the starting parameters of each fit, so that all starting parameters of saem fits are now listed in the summary. The saemix package is used as a backend for fitting the NLHM, and is also loaded to make the convergence plot function available.
This document is processed with the knitr package, which also provides the kable function that is used to improve the display of tabular data in R markdown documents. For parallel processing, the parallel package is used.
library(mkin)
library(knitr)
library(saemix)
library(parallel)
n_cores <- detectCores()

# We need to start a new cluster after defining a compiled model that is
# saved as a DLL to the user directory, therefore we define a function
# This is used again after defining the pathway model
start_cluster <- function(n_cores) {
  if (Sys.info()["sysname"] == "Windows") {
    ret <- makePSOCKcluster(n_cores)
  } else {
    ret <- makeForkCluster(n_cores)
  }
  return(ret)
}
cl <- start_cluster(n_cores)
\clearpage
The example data are taken from the final addendum to the DAR from 2014 and are distributed with the mkin package. Residue data and time step normalisation factors are read in using the function read_spreadsheet from the mkin package. This function also performs the time step normalisation.
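What time step normalisation amounts to can be illustrated with a small base R sketch. It assumes that normalisation simply multiplies the sampling times of a dataset by a dataset-specific factor; the data, the factor and the variable names are hypothetical and do not come from the spreadsheet used here.

```r
# Hypothetical residue data from one soil, times in days at study conditions
ds <- data.frame(
  time = c(0, 3, 7, 14, 28),
  name = "parent",
  value = c(100, 88, 75, 58, 33)
)

# Hypothetical normalisation factor for this dataset
f_norm <- 0.7

# Normalised times are the study times multiplied by the factor;
# the observed residues are left unchanged
ds_norm <- transform(ds, time = time * f_norm)
print(ds_norm)
```

A factor below one compresses the time axis, so the same residue values correspond to faster degradation at the reference conditions.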
data_file <- system.file(
  "testdata", "cyantraniliprole_soil_efsa_2014.xlsx",
  package = "mkin")
cyan_ds <- read_spreadsheet(data_file, parent_only = FALSE)
The following tables show the covariate data and the `r length(cyan_ds)` datasets that were read in from the spreadsheet file.
pH <- attr(cyan_ds, "covariates")
kable(pH, caption = "Covariate data")
\clearpage
for (ds_name in names(cyan_ds)) {
  print(
    kable(mkin_long_to_wide(cyan_ds[[ds_name]]),
      caption = paste("Dataset", ds_name),
      booktabs = TRUE, row.names = FALSE))
  cat("\n\\clearpage\n")
}
\clearpage
As the pathway fits have very long run times, evaluations of the parent data are performed first, in order to determine for each hierarchical parent degradation model which random effects on the degradation model parameters are ill-defined.
cyan_sep_const <- mmkin(c("SFO", "FOMC", "DFOP", "SFORB", "HS"),
  cyan_ds, quiet = TRUE, cores = n_cores)
cyan_sep_tc <- update(cyan_sep_const, error_model = "tc")
cyan_saem_full <- mhmkin(list(cyan_sep_const, cyan_sep_tc))
status(cyan_saem_full) |> kable()
All fits converged successfully.
illparms(cyan_saem_full) |> kable()
In almost all models, the random effect for the initial concentration of the parent compound is ill-defined. For the biexponential models DFOP and SFORB, the random effect of one additional parameter is ill-defined when the two-component error model is used.
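The difference between constant variance and the two-component error model can be sketched in base R. The sketch assumes the commonly used form in which the standard deviation combines a constant part and a part proportional to the predicted value; the function name, argument names and values are illustrative.

```r
# Two-component error model: standard deviation as a function of the
# predicted value y. Names and values are illustrative assumptions.
sigma_tc <- function(y, sigma_low, rsd_high) {
  sqrt(sigma_low^2 + (y * rsd_high)^2)
}

y_pred <- c(0, 1, 10, 100)

# Constant variance: the same standard deviation for all predictions
sd_const <- rep(2, length(y_pred))

# Two-component error: small absolute error near zero,
# roughly proportional error for large predictions
sd_tc <- sigma_tc(y_pred, sigma_low = 0.5, rsd_high = 0.08)

print(data.frame(y_pred, sd_const, sd_tc = round(sd_tc, 3)))
```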
anova(cyan_saem_full) |> kable(digits = 1)
Model comparison based on AIC and BIC indicates that the two-component error model is preferable for all parent models with the exception of DFOP. The lowest AIC and BIC values are obtained with the FOMC model, followed by SFORB and DFOP.
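How AIC and BIC trade goodness of fit against the number of parameters can be shown with a small base R sketch. It uses two ordinary linear models on simulated data as stand-ins, not the hierarchical fits of this document; the helper name `ic` and all data are hypothetical.

```r
# Simulated log-concentrations with first-order decline (hypothetical data)
set.seed(1)
times <- rep(c(0, 1, 3, 7, 14, 28), each = 2)
log_conc <- log(100) - 0.05 * times + rnorm(length(times), sd = 0.1)

# Two candidate models of different complexity
fit_lin <- lm(log_conc ~ times)
fit_quad <- lm(log_conc ~ times + I(times^2))

# Information criteria computed by hand from the log-likelihood
ic <- function(fit) {
  ll <- logLik(fit)
  p <- attr(ll, "df") # number of estimated parameters
  n <- nobs(fit)      # number of observations
  c(AIC = -2 * as.numeric(ll) + 2 * p,
    BIC = -2 * as.numeric(ll) + log(n) * p)
}

comparison <- rbind(linear = ic(fit_lin), quadratic = ic(fit_quad))
print(round(comparison, 1))
```

Lower values are preferred for both criteria. BIC penalises each additional parameter more strongly than AIC as soon as there are more than seven observations.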
cyan_saem_reduced <- mhmkin(list(cyan_sep_const, cyan_sep_tc),
  no_random_effect = illparms(cyan_saem_full))
illparms(cyan_saem_reduced)
anova(cyan_saem_reduced) |> kable(digits = 1)
stopCluster(cl)
To test the technical feasibility of coupling the relevant parent degradation models with different transformation pathway models, a list of mkinmod models is set up below. As in the EU evaluation, parallel formation of metabolites JCZ38 and J9Z38 and secondary formation of metabolite JSE76 from JCZ38 are used.
if (!dir.exists("cyan_dlls")) dir.create("cyan_dlls")
cyan_path_1 <- list(
  sfo_path_1 = mkinmod(
    cyan = mkinsub("SFO", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO"), quiet = TRUE,
    name = "sfo_path_1", dll_dir = "cyan_dlls", overwrite = TRUE),
  fomc_path_1 = mkinmod(
    cyan = mkinsub("FOMC", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO"), quiet = TRUE,
    name = "fomc_path_1", dll_dir = "cyan_dlls", overwrite = TRUE),
  dfop_path_1 = mkinmod(
    cyan = mkinsub("DFOP", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO"), quiet = TRUE,
    name = "dfop_path_1", dll_dir = "cyan_dlls", overwrite = TRUE),
  sforb_path_1 = mkinmod(
    cyan = mkinsub("SFORB", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO"), quiet = TRUE,
    name = "sforb_path_1", dll_dir = "cyan_dlls", overwrite = TRUE),
  hs_path_1 = mkinmod(
    cyan = mkinsub("HS", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO"), quiet = TRUE,
    name = "hs_path_1", dll_dir = "cyan_dlls", overwrite = TRUE)
)
cl_path_1 <- start_cluster(n_cores)
To obtain suitable starting values for the NLHM fits, separate pathway fits are performed for all datasets.
f_sep_1_const <- mmkin(
  cyan_path_1,
  cyan_ds,
  error_model = "const",
  cluster = cl_path_1,
  quiet = TRUE)
status(f_sep_1_const) |> kable()
f_sep_1_tc <- update(f_sep_1_const, error_model = "tc")
status(f_sep_1_tc) |> kable()
Most separate fits converged successfully. The biggest convergence problems are seen when using the HS model with constant variance.
For the hierarchical pathway fits, those random effects that could not be quantified in the corresponding parent data analyses are excluded.
In the code below, the output of the illparms function for the parent only fits is passed as the no_random_effect argument to the mhmkin function. The possibility to do so was introduced in mkin version 1.2.2, which is currently under development.
f_saem_1 <- mhmkin(list(f_sep_1_const, f_sep_1_tc), no_random_effect = illparms(cyan_saem_full), cluster = cl_path_1)
status(f_saem_1) |> kable()
The status information from the individual fits shows that all fits completed successfully. The matrix entries Fth and FO indicate that the Fisher Information Matrix could not be inverted for the fixed effects (theta) and the random effects (Omega), respectively. For the affected fits, ill-defined parameters cannot be determined using the illparms function, because it relies on the Fisher Information Matrix.
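Why a Fisher Information Matrix that cannot be inverted leaves no standard errors or confidence intervals can be illustrated in base R. The matrices and the helper `se_from_fim` below are hypothetical and unrelated to the fits in this document; the sketch only assumes that standard errors are taken from the diagonal of the inverted matrix.

```r
# Hypothetical 2 x 2 information matrices for two parameters
fim_ok <- matrix(c(4, 1, 1, 3), nrow = 2)
fim_singular <- matrix(c(1, 1, 1, 1), nrow = 2) # perfectly collinear

# Standard errors are the square roots of the diagonal of the inverse;
# NULL is returned if the matrix cannot be inverted
se_from_fim <- function(fim) {
  covmat <- try(solve(fim), silent = TRUE)
  if (inherits(covmat, "try-error")) return(NULL)
  sqrt(diag(covmat))
}

se_ok <- se_from_fim(fim_ok)
se_singular <- se_from_fim(fim_singular)

print(se_ok)
print(is.null(se_singular))
```

In the singular case the data do not distinguish between the two parameters, so no uncertainty estimate can be derived for either of them.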
illparms(f_saem_1) |> kable()
The model comparison below suggests that the pathway fits using DFOP or SFORB for the parent compound provide the best fit.
anova(f_saem_1) |> kable(digits = 1)
For these two parent models, successful fits are shown below. Plots of the fits with the other parent models are shown in the Appendix.
plot(f_saem_1[["dfop_path_1", "tc"]])
\clearpage
plot(f_saem_1[["sforb_path_1", "tc"]])
A closer graphical analysis of these Figures shows that the residues of transformation product JCZ38 in the soils Tama and Nambsheim observed at later time points are strongly and systematically underestimated.
\clearpage
stopCluster(cl_path_1)
To improve the fit for JCZ38, a back-reaction from JSE76 to JCZ38 was introduced in an alternative version of the transformation pathway, in analogy to the back-reaction from K5A78 to K5A77. Both pairs of transformation products are pairs of an organic acid with its corresponding amide (Addendum 2014, p. 109). As FOMC provided the best fit for the parent, and the biexponential models DFOP and SFORB provided the best initial pathway fits, these three parent models are used in the alternative pathway fits.
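The alternative pathway can be written out as a system of differential equations for the three metabolites. The following sketch assumes first-order (SFO) kinetics for all metabolites and a parameterisation with formation fractions $f$; $r_{\mathrm{cyan}}(t)$ stands for the rate of decline of the parent compound, whose form depends on the parent model, and the symbols are chosen here for illustration.

```latex
\begin{align*}
\frac{d[\mathrm{JCZ38}]}{dt} &= f_{\mathrm{cyan \to JCZ38}} \, r_{\mathrm{cyan}}(t)
  - k_{\mathrm{JCZ38}} [\mathrm{JCZ38}]
  + f_{\mathrm{JSE76 \to JCZ38}} \, k_{\mathrm{JSE76}} [\mathrm{JSE76}] \\
\frac{d[\mathrm{J9Z38}]}{dt} &= f_{\mathrm{cyan \to J9Z38}} \, r_{\mathrm{cyan}}(t)
  - k_{\mathrm{J9Z38}} [\mathrm{J9Z38}] \\
\frac{d[\mathrm{JSE76}]}{dt} &= f_{\mathrm{JCZ38 \to JSE76}} \, k_{\mathrm{JCZ38}} [\mathrm{JCZ38}]
  - k_{\mathrm{JSE76}} [\mathrm{JSE76}]
\end{align*}
```

The only difference from the original pathway is the last term of the first equation, which returns part of the degraded JSE76 to JCZ38.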
cyan_path_2 <- list(
  fomc_path_2 = mkinmod(
    cyan = mkinsub("FOMC", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO", "JCZ38"),
    name = "fomc_path_2", quiet = TRUE,
    dll_dir = "cyan_dlls",
    overwrite = TRUE
  ),
  dfop_path_2 = mkinmod(
    cyan = mkinsub("DFOP", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO", "JCZ38"),
    name = "dfop_path_2", quiet = TRUE,
    dll_dir = "cyan_dlls",
    overwrite = TRUE
  ),
  sforb_path_2 = mkinmod(
    cyan = mkinsub("SFORB", c("JCZ38", "J9Z38")),
    JCZ38 = mkinsub("SFO", "JSE76"),
    J9Z38 = mkinsub("SFO"),
    JSE76 = mkinsub("SFO", "JCZ38"),
    name = "sforb_path_2", quiet = TRUE,
    dll_dir = "cyan_dlls",
    overwrite = TRUE
  )
)
cl_path_2 <- start_cluster(n_cores)
f_sep_2_const <- mmkin(
  cyan_path_2,
  cyan_ds,
  error_model = "const",
  cluster = cl_path_2,
  quiet = TRUE)
status(f_sep_2_const) |> kable()
Using constant variance, separate fits converge with the exception of the fits to the Sassafras soil data.
f_sep_2_tc <- update(f_sep_2_const, error_model = "tc")
status(f_sep_2_tc) |> kable()
Using the two-component error model, all separate fits converge with the exception of the alternative pathway fit with DFOP used for the parent and the Sassafras dataset.
f_saem_2 <- mhmkin(list(f_sep_2_const, f_sep_2_tc), no_random_effect = illparms(cyan_saem_full[2:4, ]), cluster = cl_path_2)
status(f_saem_2) |> kable()
The hierarchical fits for the alternative pathway completed successfully.
illparms(f_saem_2) |> kable()
In both fits, the random effects for the formation fractions for the pathways from JCZ38 to JSE76, and for the reverse pathway from JSE76 to JCZ38 are ill-defined.
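The parameter names ending in `_qlogis` that appear in this document suggest that these formation fractions are estimated on a logit scale. Assuming this, the following base R sketch shows how the transformation keeps a fraction between zero and one; the values are hypothetical.

```r
# Hypothetical formation fractions on the natural scale (between 0 and 1)
ff <- c(0.05, 0.3, 0.9)

# Logit transformation to an unbounded scale
ff_trans <- qlogis(ff)

# The back-transformation recovers the original fractions
ff_back <- plogis(ff_trans)

# Any real value on the transformed scale maps to a valid fraction
ff_extreme <- plogis(c(-10, 0, 10))

print(round(ff_trans, 3))
print(ff_back)
print(round(ff_extreme, 5))
```

A random effect on such a parameter describes variation between datasets on the transformed scale, so the back-transformed fractions remain valid for every dataset.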
anova(f_saem_2) |> kable(digits = 1)
The variants using the biexponential models DFOP and SFORB for the parent compound and the two-component error model give the lowest AIC and BIC values and are plotted below. Compared with the original pathway, the AIC and BIC values indicate a large improvement. This is confirmed by the plots, which show that the metabolite JCZ38 is fitted much better with this model.
\clearpage
plot(f_saem_2[["fomc_path_2", "tc"]])
\clearpage
plot(f_saem_2[["dfop_path_2", "tc"]])
\clearpage
plot(f_saem_2[["sforb_path_2", "tc"]])
\clearpage
All ill-defined random effects that were identified in the parent only fits and in the above pathway fits are excluded for the final evaluations below. For this purpose, a list of character vectors is created below that can be indexed by row and column indices, and which contains the degradation parameter names for which random effects should be excluded for each of the hierarchical fits contained in f_saem_2.
no_ranef <- matrix(list(), nrow = 3, ncol = 2, dimnames = dimnames(f_saem_2))
no_ranef[["fomc_path_2", "const"]] <- c("log_beta", "f_JCZ38_qlogis", "f_JSE76_qlogis")
no_ranef[["fomc_path_2", "tc"]] <- c("cyan_0", "f_JCZ38_qlogis", "f_JSE76_qlogis")
no_ranef[["dfop_path_2", "const"]] <- c("cyan_0", "f_JCZ38_qlogis", "f_JSE76_qlogis")
no_ranef[["dfop_path_2", "tc"]] <- c("cyan_0", "log_k1", "f_JCZ38_qlogis", "f_JSE76_qlogis")
no_ranef[["sforb_path_2", "const"]] <- c("cyan_free_0",
  "f_JCZ38_qlogis", "f_JSE76_qlogis")
no_ranef[["sforb_path_2", "tc"]] <- c("cyan_free_0", "log_k_cyan_free_bound",
  "f_JCZ38_qlogis", "f_JSE76_qlogis")
clusterExport(cl_path_2, "no_ranef")

f_saem_3 <- update(f_saem_2,
  no_random_effect = no_ranef,
  cluster = cl_path_2)
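The data structure used above, a list with matrix dimensions, may be unfamiliar. The following base R sketch shows how such an object behaves; the row names, column names and parameter names are hypothetical.

```r
# Hypothetical row and column names
fits <- c("model_a", "model_b")
errs <- c("const", "tc")

# A list with matrix dimensions: every cell can hold a vector of any length
excl <- matrix(list(), nrow = 2, ncol = 2, dimnames = list(fits, errs))

# Cells are filled and read with double brackets and two indices
excl[["model_a", "const"]] <- c("log_k1")
excl[["model_b", "tc"]] <- c("parent_0", "log_k2")

print(excl[["model_b", "tc"]])

# Cells that were never assigned contain NULL
print(is.null(excl[["model_a", "tc"]]))
```

Giving the object the same dimnames as the set of fits makes it possible to look up the parameter names for each fit by its row and column name.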
status(f_saem_3) |> kable()
With the exception of the FOMC pathway fit with constant variance, all updated fits completed successfully. However, the Fisher Information Matrix for the fixed effects (Fth) could not be inverted, so no confidence intervals for the optimised parameters are available.
illparms(f_saem_3) |> kable()
anova(f_saem_3) |> kable(digits = 1)
While the AIC and BIC values of the best fit (DFOP pathway fit with two-component error) are lower than in the previous fits with the alternative pathway, the practical value of these refined evaluations is limited as no confidence intervals are obtained.
stopCluster(cl_path_2)
\clearpage
It was demonstrated that a relatively complex transformation pathway with parallel formation of two primary metabolites and one secondary metabolite can be fitted even if the data in the individual datasets are quite different and partly only cover the formation phase.
The run times of the pathway fits were several hours, limiting the practical feasibility of iterative refinements based on ill-defined parameters and of alternative checks of parameter identifiability based on multistart runs.
The helpful comments by Janina Wöltjen of the German Environment Agency are gratefully acknowledged.
\clearpage
plot(f_saem_1[["sfo_path_1", "tc"]])
\clearpage
plot(f_saem_1[["fomc_path_1", "tc"]])
\clearpage
plot(f_saem_1[["sforb_path_1", "tc"]])
\clearpage
errmods <- c(const = "constant variance", tc = "two-component error")
degmods <- c(
  sfo_path_1 = "SFO path 1",
  fomc_path_1 = "FOMC path 1",
  dfop_path_1 = "DFOP path 1",
  sforb_path_1 = "SFORB path 1",
  hs_path_1 = "HS path 1")
for (deg_mod in rownames(f_saem_1)) {
  for (err_mod in c("const", "tc")) {
    fit <- f_saem_1[[deg_mod, err_mod]]
    if (!inherits(fit$so, "try-error")) {
      caption <- paste("Hierarchical", degmods[deg_mod], "fit with", errmods[err_mod])
      summary_listing(fit, caption)
    }
  }
}

degmods <- c(
  fomc_path_2 = "FOMC path 2",
  dfop_path_2 = "DFOP path 2",
  sforb_path_2 = "SFORB path 2")
for (deg_mod in rownames(f_saem_2)) {
  for (err_mod in c("const", "tc")) {
    fit <- f_saem_2[[deg_mod, err_mod]]
    if (!inherits(fit$so, "try-error")) {
      caption <- paste("Hierarchical", degmods[deg_mod], "fit with", errmods[err_mod])
      summary_listing(fit, caption)
    }
  }
}

degmods <- c(
  fomc_path_2 = "FOMC path 2",
  dfop_path_2 = "DFOP path 2",
  sforb_path_2 = "SFORB path 2")
for (deg_mod in rownames(f_saem_3)) {
  for (err_mod in c("const", "tc")) {
    fit <- f_saem_3[[deg_mod, err_mod]]
    if (!inherits(fit$so, "try-error")) {
      caption <- paste("Hierarchical", degmods[deg_mod],
        "fit with reduced random effects,", errmods[err_mod])
      summary_listing(fit, caption)
    }
  }
}
sessionInfo()
if (!inherits(try(cpuinfo <- readLines("/proc/cpuinfo")), "try-error")) {
  # Report the first CPU model listed
  cat(gsub("model name\t: ", "CPU model: ", cpuinfo[grep("model name", cpuinfo)[1]]))
}
if (!inherits(try(meminfo <- readLines("/proc/meminfo")), "try-error")) {
  # Replace the "MemTotal:" label and its padding with a readable label
  cat(gsub("MemTotal:\\s+", "System memory: ", meminfo[grep("MemTotal", meminfo)[1]]))
}