knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(chemdeg)
A key aspect in food science is the prediction of the change of food quality in time, and thus a quantitative analysis of the shelf life of foods and of their components is of paramount importance. chemdeg
is a package developed to assist food chemists in the objective selection of degradation kinetic models of foods and parameters estimation. The software implements a two-step method to this purpose. First, experimental data are analyzed in the so-called phase space which allows for the estimation of the order of the reaction; then the data are fitted with the appropriate standard kinetic model to estimate the reaction rate. The whole procedure is driven by rigorous statistical analyses of the results. The package also provides a routine to fit a non-standard kinetic degradation model named first-order multi-target model (FOMT).
The standard degradation kinetics of food and of their components are described within the following differential equation which formalizes the mass-action law: $$\frac{d y}{d t}=-k\,y^n \tag{1} $$ where $y$ is the concentration of the reactant (i.e. the concentration of the degrading food molecule), $t$ is the time, $k$ is the rate constant and $n$ is the order of reaction. The order $n$ can assume either integer (e.g. 0,1,2...) or fractional values as it may observed for complex degradation reactions like overall multistage reaction composed by single reactions with different reaction orders.
A logarithmic transformation of the eq.(1) yields: $$ \log{|\frac{d y}{d t}}|=\log{k}+n\,\log{y}$$ where $\log{}$ is the natural logarithm.
A plot of $\log{|\frac{dy}{dt}|}$ versus $\log{y}$ defines the phase space of the dynamical system, and standard n^th^ reaction order kinetic models are here represented as straight lines with slope equal to $n$ and intercept equal to the logarithm of $k$. Derivatives are calculated with the central finite difference method.
The slope can be easily determined by linear regression (weighted on experimental error if available) of log-transformed data. Both known and unknown uncertainties can affect this estimation. Thus a first decision about the appropriate order of the reaction is taken by the program on the basis of the following four criteria:
The general solution of eq.(1) is: $$ y(t)=\begin{cases} ((n-1)\,k\,t+y_0^{1-n}))^{\frac{1}{n-1}} & \text{if $n\ne 0$} \ y_0\,e^{-k\,t} & \text{if $n=0$} \end{cases} $$ Once the reaction order has been determined, non-linear least squares regression can be performed using the above solution with the rate constant $k$ as free parameter.
The function det_order()
performs all in one the analysis described in the [Introduction] section. The input of the function must be an at least two-columns data-frame with time data in the first column and concentration in the second column. The values of experimental error can be given in the optional third column. Different input types or data-frames with less than two or more than 3 columns will return an error. The output of the function is an ord_res
class object.
ord1 # simulated data from a first order kinetic model with error res <- det_order(ord1) class(res)
The function results()
shows a comprehensive summary of the results. These include:
results(res)
The function plot_ord()
allows the graphical visualization of the results obtained with the whole analysis. Function automatically plots the experimental error when provided. In these plots, black lines show the best regression curve, whereas green lines show the fits with the reaction order chosen according to the criteria given in the previous section.
plot_ord(res)
The functions phase_space()
and kin_regr()
allow to retrieve the linear regression from the phase space and the regression with the selected model, respectively:
linear_model_phase_space <- phase_space(res) linear_model_phase_space kinetic_regression <- kin_regr(res) kinetic_regression
Goodness-of-fit statistics include functions that are already accessible in the package stats
, like: Bayesian Information Criterion (stats::BIC()
), Akaike's Information Criterion (stats::AIC()
) and Root Mean Square Error (stats::sigma()
).
In the package chemdeg
two more measures are introduced, which are the reduced chi-squared ($\chi^2_{red}=\chi^2/df$, where $df$ are the degrees of freedom) and the Akaike's Information Criterion AICc corrected for finite samples. The values of both statistics are accessible with the functions chiquad_red()
and AICC()
, respectively.
chiquad_red(kinetic_regression) AICC(kinetic_regression)
The whole statistics can be accessed through the goodness_of_fit()
function.
goodness_of_fit(kinetic_regression)
To generate a formula
object containing the formula of an n^th^ kinetic reaction model, the function f_gen(n)
can be called, by giving n as input.
f_gen(1) f_gen(2)
The FOMT model is equivalent to the Single-Hit Multi-Target model (SHMT), a model developed in the field of radiation biology to describe the fraction of cell surviving radiation treatments. In radio-biology the problem is to calculate the probability of cell survival to a dose D of radiation whereas in the present context the probability that molecules survive within a given time interval. In both cases, the assumption is that the events that hit a cell or, as in this case, a food sample are random and they occur at a constant mean rate $k$ and independently of the time since the last event (the hits, therefore, follow the Poisson distribution). If a sample is composed of sub-units and at least m of them must be inactivated by at least one hit, the overall survival probability is given by the equation:
$$\frac{Y(t)}{Y_0}=1-(1-e^{-k\,t})^n $$ More information on model development can be found in this work.
In chemdeg
the FOMT model can be fitted by calling FOMT()
function, giving as input a data-frame with time, concentration and error (optional) in the first three columns.
The output is an nls
class object and thus all the functions from package stats
can be used to retrieve regression information. Even in this case the goodness-of-fit can be accessed through the goodness_of_fit()
function.
The non-linear fit with the FOMT model is quite sensitive to the initial value of the parameters and this is particularly true for the exponent m. The FOMT()
function implements a routine for the automatic selection of the approximate initial values of model parameters which is based on the analysis of actual experimental data (see par_est_FOMT()
). However, if FOMT()
fails to converge, it is possible to input manually the initial parameter values using the function FOMTm()
as RHS of the formula in the stats::nls()
function:
dat <- data.frame( time = c(0, 1, 2, 3, 4, 5), conc = c(1, 0.99, 0.98, 0.5, 0.24, 0.12) ) try(FOMT(dat)) nls(conc ~ FOMTm(time, k, n), data = list( conc = dat$conc, time = dat$time ), start = list(k = 1, n = 12) )
chemdeg
package contains some simulated and actual experimental data that can be used for training purposes.
urfa
The urfa
data concern the normalized degradation kinetics of ascorbic acid in Urfa peppers dehydrated with hot air at 55, 65 and 75°C (Dağhan et al., 2018). urfa
returns a data-frame with 4 columns. The first is time and the following are the degradation data measured at different temperatures. Data in this format are not readable from the det_order()
function (see [Package usage]):
The data-frame must be transformed into a data-frame with two columns (or three if the experimental error is available), the first containing the time data and the second the measured concentration of the compound (note that the data must not be necessarily normalized with respect to the concentration of the molecule at time 0):
try(det_order(urfa)) urfa1 <- data.frame(urfa$time_min, urfa$AA_55) ord.urfa.1 <- det_order(urfa1)
The function returns a message which informs the user about the best ordinary model that can explain the data, an order 2 degradation kinetic model in this case. the function results()
provides more information:
results(ord.urfa.1)
first it returns the estimate of the order n along with its confidence interval. We remind that the algorithm automatically chooses the integer value (if present within the confidence interval) that is the nearest to the estimate of n. For example, if the estimate is 1.6 and the CI is 0.8-2.4, the value taken would be 2. This might nonetheless be not the right value. For example the scientific literature or any precious information might suggest that a reaction model of order equal to 1 (which is inside the CI) could explain the data. In this case both alternative models should be used to fit the data and the results compared. The results()
function, however, returns an explanation why a certain value of n was chosen.
Once the order of reaction is estimated to be 2 non-linear regression of the data with the appropriate model is performed and the estimated value of the rate constant is k and of its confidence interval is given. Finally, goodness-of-fit statistics are showed for non-linear regressions (note that this holds for reaction orders $\ne 0$).
A call to the function plot_ord()
allows to visualize the results:
plot_ord(ord.urfa.1)
the plot on the left side represents the phase space of the system. On the right side, the kinetic data are plotted together with the model's regression curve.
The results show that a 2nd-order degradation model best explains the data. The following example, however, shows that it is not always possible to proceed straightforwardly as in the present case and that caution is recommended before drawing conclusions about the appropriate degradation kinetic model that best fit experimental data.
In this case we load data fomtdata
from the package, that concerns the degradation kinetics of a 1.2 mM solution of 5-caffeoylquinic acid (5-CQA) in the presence of 1.2 mM ascorbic acid (Yusaku and Kuniyo, 2013).
fomtdata
In this case the data are ready to be passed to the det_order()
function:
ord.cqa <- det_order(fomtdata)
Indeed, the results are:
results(ord.cqa)
In this case the software suggests that a 0-order model might actually fit the data. However a plot of the results indicates that a 0-order model is actually not suitable to explain the experimental data:
plot_ord(ord.cqa)
Please remind that the kin_regr()
function returns the kinetic regression on kinetic data. Information can be retrieved to eventually check if the regression is indeed reliable:
lin <- kin_regr(ord.cqa) summary(lin)
However, the data in the previous plot show a time-dependent trend that is reminiscent of the graph of the nonlinear FOMT function. The functionFOMT()
permits to fit the data with the same name model:
FOMT(fomtdata)
It is possible that the function FOMT()
could not be able to converge. This is because initial parameter values automatically chosen by the program are not close enough to the true ones (for more information see stats::nls()
from package stats
). To perform the non-linear regression a manual input of the initial parameter value is required using the functionFOMTm()
as RHS of the formula:
regr.FOMT <- nls(y ~ FOMTm(t, k, n), data = list(y = fomtdata$tCQA_AA, t = fomtdata$time_h), start = list(n = 10, k = 0.05) ) summary(regr.FOMT)
The following plot compares the results from both linear (in red color) and non-linear regression analysis of the experimental data:
plot(fomtdata$time_h, fomtdata$tCQA_AA, xlab = "time (h)", ylab = "C/C0" ) new_t <- seq(0, max(fomtdata$time_h), length.out = 100) lines(new_t, predict(regr.FOMT, newdata = list(t = new_t))) lines(fomtdata$time_h, predict(lin), col = "red")
The FOMT model fits the data much better than the linear model.
goodness_of_fit(regr.FOMT)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.