Predict concentration using a calibration line (or curve) and inverse regression.
an object of class
optional numeric vector of observed values for the machine signal.
The default value is
optional integer indicating the number of points at which to predict concentrations
(i.e., perform inverse regression). The default value is
optional logical scalar indicating whether to compute confidence intervals for
the predicted concentrations. The default value is
optional numeric scalar between 0 and 1 indicating the confidence level associated with
the confidence intervals for the predicted concentrations.
The default value is
optional logical scalar indicating whether to base the confidence intervals
for the predicted values on simultaneous or non-simultaneous prediction limits.
The default value is
optional logical scalar indicating whether to base the confidence intervals for the predicted values
on prediction limits for the mean (
optional logical scalar indicating whether to print out (trace) the progress of
the inverse prediction for each of the specified values of
A simple and frequently used calibration model is a straight line, $S = \beta_0 + \beta_1 C + \epsilon$, where the response variable S denotes the signal of the machine, the predictor variable C denotes the true concentration in the physical sample, and the error term $\epsilon$ is assumed to follow a normal distribution with mean 0. Note that the average value of the signal for a blank (C = 0) is the intercept $\beta_0$. Other possible calibration models include higher-order polynomial models such as a quadratic or cubic model.
In a typical setup, a small number of samples (e.g., n = 6) with known concentrations are measured and the signal is recorded. A sample with no chemical in it, called a blank, is also measured. (You have to be careful to define exactly what you mean by a “blank.” A blank could mean a container from the lab that has nothing in it but is prepared in the same way as containers with actual samples in them. Or it could mean a field blank: a container that was taken out to the field and subjected to the same process as all other containers, except that no physical sample of soil or water was placed in it.) Usually, replicate measurements are taken at the same known concentrations. (The term “replicate” must also be well defined, to distinguish between, for example, the same physical sample measured more than once and two different physical samples with the same known concentration.)
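The setup described above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the design (a blank plus five standards, each in triplicate) and the names `calib`, `fit.line`, and `fit.quad` are hypothetical, and `lm` stands in for the package's own fitting function.

```r
# Hypothetical calibration design: a blank plus five spiked standards,
# each measured in triplicate. All values are simulated, not real data.
set.seed(42)
conc   <- rep(c(0, 10, 20, 40, 60, 80), each = 3)       # known concentrations C
signal <- 2 + 1.5 * conc + rnorm(length(conc), sd = 3)  # S = b0 + b1 * C + error

calib <- data.frame(conc = conc, signal = signal)

# Straight-line calibration model
fit.line <- lm(signal ~ conc, data = calib)

# Quadratic alternative, for comparison
fit.quad <- lm(signal ~ conc + I(conc^2), data = calib)

coef(fit.line)             # the intercept estimates the mean signal for a blank
anova(fit.line, fit.quad)  # F-test for whether the quadratic term improves the fit
```

Because the replicates here are independent simulated draws, they behave like separate physical samples at the same concentration; repeated readings of a single sample would usually call for a different error model.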
The function calibrate first fits a linear calibration
line or curve. Once the calibration line is fit, samples with unknown
concentrations are measured and their signals are recorded. In order to
produce estimated concentrations, you have to use inverse regression to
map the signals to the estimated concentrations. We can quantify the
uncertainty in the estimated concentration by combining inverse regression
with prediction limits for the signal S.
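A minimal base-R sketch of this idea follows. It assumes a straight-line fit on simulated data and finds the confidence bounds for the concentration by numerically inverting the prediction limits for a single new signal. The data, the search range, and the helper `pred.limit` are hypothetical, and the numerical method may differ in detail from what inversePredictCalibrate does internally.

```r
# Simulated calibration data (hypothetical values, for illustration only)
set.seed(1)
conc   <- rep(c(0, 10, 20, 40, 60, 80), each = 3)
signal <- 2 + 1.5 * conc + rnorm(length(conc), sd = 3)
fit    <- lm(signal ~ conc)

obs.signal <- 60  # signal observed for a sample of unknown concentration

# Point estimate: solve obs.signal = b0 + b1 * C for C
b <- coef(fit)
pred.conc <- unname((obs.signal - b[1]) / b[2])

# Non-simultaneous 99% prediction limit for one new signal at concentration x;
# side is "lwr" or "upr"
pred.limit <- function(x, side) {
  predict(fit, newdata = data.frame(conc = x),
          interval = "prediction", level = 0.99)[, side]
}

# Assumed search range for the roots; it must bracket both bounds
search <- c(-50, 150)

# Lower bound: concentration at which the upper prediction limit equals the signal
lower.conc <- uniroot(function(x) pred.limit(x, "upr") - obs.signal,
                      interval = search)$root

# Upper bound: concentration at which the lower prediction limit equals the signal
upper.conc <- uniroot(function(x) pred.limit(x, "lwr") - obs.signal,
                      interval = search)$root

c(lower = lower.conc, estimate = pred.conc, upper = upper.conc)
```

With a positive slope, the upper prediction limit yields the lower bound for the concentration and the lower prediction limit yields the upper bound, which is why the two `uniroot` calls appear crossed.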
A numeric matrix containing the results of the inverse calibration.
The first two columns are labeled obs.y and pred.x and contain
the values of the argument obs.y and the predicted values of x
(the concentration), respectively. If intervals=TRUE, then the matrix also
contains the columns lpl.x and upl.x corresponding to the lower and
upper prediction limits for x. Also, if intervals=TRUE, then the
matrix has the attributes coverage (the value of the argument coverage)
and simultaneous (the value of the argument simultaneous).
Almost always the process of determining the concentration of a chemical in
a soil, water, or air sample involves using some kind of machine that
produces a signal, and this signal is related to the concentration of the
chemical in the physical sample. The process of relating the machine signal
to the concentration of the chemical is called calibration
(see calibrate). Once calibration has been performed,
estimated concentrations in physical samples with unknown concentrations
are computed using inverse regression. The uncertainty in the process used
to estimate the concentration may be quantified with decision, detection,
and quantitation limits.
In practice, only the point estimate of concentration is reported (along with a possible qualifier), without confidence bounds for the true concentration C. This is most unfortunate because it gives the impression that there is no error associated with the reported concentration. Indeed, both the International Organization for Standardization (ISO) and the International Union of Pure and Applied Chemistry (IUPAC) recommend always reporting both the estimated concentration and the uncertainty associated with this estimate (Currie, 1997).
Steven P. Millard (EnvStats@ProbStatInfo.com)
Currie, L.A. (1997). Detection: International Update, and Some Emerging Di-Lemmas Involving Calibration, the Blank, and Multiple Detection Decisions. Chemometrics and Intelligent Laboratory Systems 37, 151–181.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, Chapter 3 and p.335.
Hubaux, A., and G. Vos. (1970). Decision and Detection Limits for Linear Calibration Curves. Analytical Chemistry 42, 849–855.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.562–575.
# The data frame EPA.97.cadmium.111.df contains calibration data
# for cadmium at mass 111 (ng/L) that appeared in
# Gibbons et al. (1997b) and were provided to them by the U.S. EPA.
# Here we
# 1. Display a plot of these data along with the fitted calibration
#    line and 99% non-simultaneous prediction limits.
# 2. Then based on an observed signal of 60 from a sample with
#    unknown concentration, we use the calibration line to estimate
#    the true concentration and use the prediction limits to compute
#    confidence bounds for the true concentration.
# An observed signal of 60 results in an estimated value of cadmium
# of 59.97 ng/L and a confidence interval of [53.83, 66.15].
# See Millard and Neerchal (2001, pp.566-569) for more details on
# this example.

Cadmium <- EPA.97.cadmium.111.df$Cadmium
Spike <- EPA.97.cadmium.111.df$Spike

calibrate.list <- calibrate(Cadmium ~ Spike, data = EPA.97.cadmium.111.df)

newdata <- data.frame(Spike = seq(min(Spike), max(Spike), length.out = 100))

pred.list <- predict(calibrate.list, newdata = newdata, se.fit = TRUE)

pointwise.list <- pointwise(pred.list, coverage = 0.99, individual = TRUE)

plot(Spike, Cadmium,
  ylim = c(min(pointwise.list$lower), max(pointwise.list$upper)),
  xlab = "True Concentration (ng/L)",
  ylab = "Observed Concentration (ng/L)")

abline(calibrate.list, lwd = 2)

lines(newdata$Spike, pointwise.list$lower, lty = 8, lwd = 2)
lines(newdata$Spike, pointwise.list$upper, lty = 8, lwd = 2)

title(paste("Calibration Line and 99% Prediction Limits",
  "for US EPA Cadmium 111 Data", sep = "\n"))

# Now estimate the true concentration based on
# an observed signal of 60 ng/L.

inversePredictCalibrate(calibrate.list, obs.y = 60,
  intervals = TRUE, coverage = 0.99, individual = TRUE)

#     obs.y   pred.x   lpl.x    upl.x
#[1,]    60 59.97301 53.8301 66.15422
#attr(, "coverage"):
# 0.99
#attr(, "simultaneous"):
# FALSE

rm(Cadmium, Spike, calibrate.list, newdata, pred.list, pointwise.list)