nlsLoop is a simple R package that provides a more reproducible and reliable method of fitting non-linear regressions over levels of a factor. This is commonly done using nlme::nlsList, but that method uses only one set of start parameters, so some fits often fail to converge on the best parameter estimates simply because the starting values are too far from them. nlsLoop improves on nlme::nlsList by allowing multiple starting values for each parameter, thereby exploring more of parameter space when model fitting. The best model is chosen based on AIC score, ensuring that the same run of nlsLoop will always give the same set of parameters as long as the number of tries for each fit is large enough.

This document provides an introduction to this key use of nlsLoop, and also demonstrates its potential for producing predictions around each fit and plotting them with ggplot2.

An example non-linear model fit

This vignette will use a dataset of thermal response curves for photosynthesis and respiration of the aquatic phytoplankton Chlorella vulgaris (Padfield et al. 2016). These data represent the rates of photosynthesis and respiration at different short-term assay temperatures (16 ºC to 46 ºC), measured in triplicate at 5 growth temperatures (20 ºC, 23 ºC, 27 ºC, 30 ºC and 33 ºC) after both 10 (acclimation) and 100 (adaptation) generations of growth.

This gives 60 curves in total. These responses generally follow a unimodal shape, and various models have been used to describe them. A recent overview of these models is available, and the authors also released an R package which contains many of these model formulations (Low-Decarie et al. 2017).

I will demonstrate how the unimodal response can be fitted with the Sharpe-Schoolfield equation of the form:

$$log(rate) = lnc + E(\frac{1}{kT_{c}} - \frac{1}{kT}) - ln(1 + e^{E_h(\frac{1}{kT_h} - \frac{1}{kT})})$$

Where $k$ is Boltzmann's constant ($8.62 \times 10^{-5}$ eV K$^{-1}$) and $T$ is temperature in Kelvin. A detailed explanation of the other model parameters can be found in Padfield et al. (2016).

We begin with loading in the data that comes with the package.

# load in package
library(nlsLoop)

# load in data
data('Chlorella_TRC')

# look at column names
names(Chlorella_TRC)

We then need to specify the non-linear model to fit to the data.

# define the Sharpe-Schoolfield equation
schoolfield_high <- function(lnc, E, Eh, Th, temp, Tc) {
  Tc <- 273.15 + Tc
  k <- 8.62e-5
  boltzmann.term <- lnc + log(exp(E/k * (1/Tc - 1/temp)))
  inactivation.term <- log(1/(1 + exp(Eh/k * (1/Th - 1/temp))))
  return(boltzmann.term + inactivation.term)
}

We can then run nlsLoop, as long as there is a column in the dataframe that identifies each level of the factor. This column, id_col, can be either a character or a factor. It is useful if this column includes information about the data, such as the treatments, separated by '_'. The id_col links the fitted model objects back to the raw data, so being able to split this column back up into treatment columns is very handy.
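For instance, if id_col encoded the flux, growth temperature and replicate separated by '_' (a hypothetical format, for illustration only), tidyr::separate could split it back up into treatment columns:

```r
# hypothetical id_col format for illustration; separate() splits it
# back up into one column per piece of treatment information
library(tidyr)

d <- data.frame(curve_id = c('photosynthesis_20_1', 'respiration_20_2'))
separate(d, curve_id, into = c('flux', 'growth.temp', 'replicate'),
         sep = '_', remove = FALSE)
```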

When first running nlsLoop, it is advisable to run a single nls fit on the whole dataset to give your start parameters some ballpark boundaries. If you do not, you may suffer the same problems with nlsLoop as are experienced with nlsList.
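A single pooled fit might look something like the sketch below, using nlsLM from minpack.lm (the optimiser nlsLoop uses internally). The start values here are rough guesses for illustration, not recommended values:

```r
# a rough single fit on the pooled data to get ballpark parameter values;
# the start values below are guesses for illustration only
library(minpack.lm)

pooled_fit <- nlsLM(ln.rate ~ schoolfield_high(lnc, E, Eh, Th, temp = K, Tc = 20),
                    data = Chlorella_TRC,
                    start = c(lnc = -1, E = 0.6, Eh = 2, Th = 315),
                    na.action = na.omit)

# coefficients give a feel for sensible parameter boundaries
coef(pooled_fit)
```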

fits <- nlsLoop(ln.rate ~ schoolfield_high(lnc, E, Eh, Th, temp = K, Tc = 20),
                     data = Chlorella_TRC,
                     tries = 500,
                     id_col = 'curve_id',
                     param_bds = c(-10, 10, 0.1, 2, 0.5, 10, 285, 330),
                     r2 = 'Y',
                     supp_errors = 'Y',
                     AICc = 'Y',
                     na.action = na.omit,
                     lower = c(lnc = -10, E = 0, Eh = 0, Th = 0))

The documentation for nlsLoop can be found using ?nlsLoop. The argument param_bds sets lower and upper limits for each parameter, from which random values are picked based on a uniform distribution. The vector of values gives the lower and upper boundary for each parameter in turn (i.e. $lnc_{lower}$, $lnc_{upper}$, $E_{lower}$, $E_{upper}$ ...). A dataframe of start parameters is then created with nrow = tries rows of random combinations of these parameter values. The function then loops through a non-linear regression fitted with nlsLM for each row of starting parameters. Each time the AIC score is lower for an individual fit (indicating a better relative fit), the parameter dataframe is updated.
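The way the start parameters are drawn can be sketched as follows. This is an illustration of the idea only, not the package's internal code:

```r
# sketch: draw 'tries' random start values for each parameter from a
# uniform distribution between its lower and upper bounds in param_bds
param_bds <- c(-10, 10, 0.1, 2, 0.5, 10, 285, 330)
tries <- 500

start_vals <- data.frame(lnc = runif(tries, param_bds[1], param_bds[2]),
                         E   = runif(tries, param_bds[3], param_bds[4]),
                         Eh  = runif(tries, param_bds[5], param_bds[6]),
                         Th  = runif(tries, param_bds[7], param_bds[8]))

# one row of candidate start parameters per model fit
head(start_vals)
```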

When running through the iterations for each level of the factor, if the AIC score does not improve over 100 consecutive calls of nlsLM using different start parameters, nlsLoop moves on to the next level of the factor.
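This early-stopping rule can be sketched as below; the random numbers stand in for the AIC scores of successive fits, and this is not the package's internal code:

```r
# sketch of the stopping rule: after 100 consecutive tries without an
# AIC improvement, give up on the current level of the factor
best_aic <- Inf
stalled <- 0

for (i in 1:500) {
  candidate_aic <- rnorm(1, mean = 100)  # stand-in for the AIC of a new nlsLM fit
  if (candidate_aic < best_aic) {
    best_aic <- candidate_aic  # better fit found; keep it and reset the counter
    stalled <- 0
  } else {
    stalled <- stalled + 1
  }
  if (stalled >= 100) break  # move on to the next level of the factor
}
```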

nlsLoop returns an nlsLoop object that works in the same way as a list. It is made up of:

# look at parameter values
head(fits$params)

# look at predictions
head(fits$predictions)

Having all of the elements in a single nlsLoop object allows a simple plotting method for assessing how well each model fits the data. Although AIC scores pick the best model, this is relative to all the other fits and gives no information about how well the model actually fits the data. The package includes a method of calculating a quasi-R² score, but use this at your own risk: non-linear versions of R² do not necessarily mean the same thing as R² values from linear models (Spiess & Neumeyer 2010). Another, and better, way of evaluating model fit is to plot the predictions alongside the raw data.

nlsLoop provides an easy wrapper around ggplot2 to do this. First, let's have a look at a single level of curve_id, using plot_id_nlsLoop.

plot_id_nlsLoop(data = Chlorella_TRC, param_data = fits, id = '1')

Further to this, plot_all_nlsLoop produces a pdf with each plot on a new page.

plot_all_nlsLoop('path/of/where/you/want/to/save/me.pdf', data = Chlorella_TRC, param_data = fits)

Instead of using the plotting functions within the package, we can easily create plots using ggplot and the predictions part of the nlsLoop object. For example, we can plot all the curve fits split by growth temperature and acclimation vs. adaptation.

# get distinct values of process, flux and growth.temp for each value of curve_id
d_treatment <- Chlorella_TRC[,c('curve_id','process', 'growth.temp', 'flux')]
d_treatment <- d_treatment[!duplicated(d_treatment),]

# merge with predictions by curve_id
fits$predictions <- merge(fits$predictions, d_treatment, by = 'curve_id')

# plot every curve
library(ggplot2)
ggplot() +
  geom_point(aes(K - 273.15, ln.rate, col = flux), size = 2, data = Chlorella_TRC) +
  geom_line(aes(K - 273.15, ln.rate, col = flux, group = curve_id), alpha = 0.5, data = fits$predictions) +
  facet_wrap(~ growth.temp + process, labeller = labeller(.multi_line = FALSE)) +
  scale_colour_manual(values = c('green4', 'black')) +
  theme_bw(base_size = 12, base_family = 'Helvetica') +
  ylab('log Metabolic rate') +
  xlab('Assay temperature (ºC)') +
  theme(legend.position = c(0.9, 0.15))

We can also start having a quick look at any differences between parameters by accessing the params part of the nlsLoop object. Here we will plot them pooled between fluxes.

# merge params with d_treatment by curve_id
fits$params <- merge(fits$params, d_treatment, by = 'curve_id')

library(tidyr)

gather(fits$params, 'parameter', 'value', c(lnc, E, Eh, Th)) %>%
  ggplot(., aes(flux, value, col = flux)) +
  geom_boxplot(fill = 'white', outlier.shape = NA) +
  geom_point(position = position_jitter(height = 0, width = 0.1)) +
  facet_wrap(~ parameter, scales = 'free_y') +
  scale_color_manual(values = c('green4', 'black')) +
  scale_shape_manual(values = c(21, 24)) +
  theme_bw(base_size = 12, base_family = 'Helvetica') +
  theme(legend.position = 'top')

The confidence intervals of each fit can also be calculated using confint_nlsLoop.

# calculate confidence intervals for each fit
CIs <- confint_nlsLoop(Chlorella_TRC, fits)

# bind with factors dataframe
CIs <- merge(CIs, d_treatment, by = 'curve_id')

# plot
ggplot(CIs, aes(col = flux)) +
  geom_point(aes(curve_id, mean)) +
  facet_wrap(~ param, scales = 'free_x', ncol = 4) +
  geom_linerange(aes(curve_id, ymin = CI_lwr, ymax = CI_upr)) +
  coord_flip() +
  scale_color_manual(values = c('green4', 'black')) +
  theme_bw(base_size = 12, base_family = 'Helvetica') +
  theme(legend.position = 'top') +
  xlab('curve') +
  ylab('parameter estimate')

When you are happy with your starting parameter values, nlsLoop2 can be used to speed up your model fitting; it uses nls2 to generate multiple starting values.

nlsLoop provides a more reliable and reproducible way of fitting individual non-linear regressions over levels of a factor in a dataframe. I hope this vignette has provided a good overview of its benefits and usage.

References:



padpadpadpad/nlsTools documentation built on May 24, 2019, 5:59 p.m.