nlsLoop is a simple R package that provides a more reproducible and reliable method of fitting non-linear regression over levels of a factor. This procedure is commonly done using nlme::nlsList, but this method only uses one set of start parameters. Because of this, often some fits do not converge on the best estimated parameters simply because the starting values are too far away from the best estimated parameter. nlsLoop improves on nlme::nlsList by allowing multiple starting values for each parameter, thereby exploring more parameter space when model fitting. The best model is chosen based on AIC score, ensuring that the same run of nlsLoop will always give the same set of parameters as long as the number of tries for each fit is large enough.
This document provides an introduction into this one key use of nlsLoop, and also demonstrates its potential for producing predictions around each fit and plotting with ggplot2.
This vignette will use a dataset of thermal response curves for photosynthesis and respiration of the aquatic phytoplankton Chlorella vulgaris (Padfield et. al 2016). This data represents the rate of photosynthesis and respiration at different short-term, assay temperatures (16 ºC to 46 ºC) and was done in triplicate at 5 growth temperatures (20 ºC, 23 ºC, 27 ºC, 30 ºC and 33 ºC) after both 10 (acclimation) and 100 (adaptation) generations of growth.
This gives 60 curves in total. These responses generally follow a unimodal response and there are various models that have been used to model the data. A very recent overview of these can be found here and the authors also released an R package which contains many of these formulations models (Low-Decarie et al. 2017).
I will demonstrate how the unimodal resposne can be fitted with the Sharpe-Schoolfield equation of the form:
$$log(rate) = lnc + E(\frac{1}{T_{c}} - \frac{1}{kT}) - ln(1 + e^{E_h(\frac{1}{kT_h} - \frac{1}{kT})})$$
Where $k$ is Boltzmann's constant $8.62e^{-5}$ and $T$ is temperature in Kelvin. A detailed explanation of the other parameters model can be found here(Padfield et. al 2016).
We begin with loading in the data that comes with the package.
# load in package library(nlsLoop) # load in data data('Chlorella_TRC') # look at column names names(Chlorella_TRC)
We then need to specify the non-linear model to fit to the data.
# define the Sharpe-Schoolfield equation schoolfield_high <- function(lnc, E, Eh, Th, temp, Tc) { Tc <- 273.15 + Tc k <- 8.62e-5 boltzmann.term <- lnc + log(exp(E/k*(1/Tc - 1/temp))) inactivation.term <- log(1/(1 + exp(Eh/k*(1/Th - 1/temp)))) return(boltzmann.term + inactivation.term) }
We can then run nlsLoop, as long as there is a column in the dataframe that splits each level of the factor. This column, id_col, can be either a character or a factor. It will be useful if this column includes information about the data, possibly the treatments, separated by '_'. The id_col can link the fitted model objects to the raw data, so being able to split this column back up into treatment columns is a great option.
When first running nlsLoop, it is advisable to run a single nls fit on the whole dataset to give your start parameters some ball park boundaries. If you do not then you may suffer the same problems with nlsLoop as are experienced with nlsList.
fits <- nlsLoop(ln.rate ~ schoolfield_high(lnc, E, Eh, Th, temp = K, Tc = 20), data = Chlorella_TRC, tries = 500, id_col = 'curve_id', param_bds = c(-10, 10, 0.1, 2, 0.5, 10, 285, 330), r2 = 'Y', supp_errors = 'Y', AICc = 'Y', na.action = na.omit, lower = c(lnc = -10, E = 0, Eh = 0, Th = 0))
The documentation for nlsLoop can be found using ?nlsLoop
. The argument param_bds
sets upper and lower limits for each parameter, from which random values between these are picked based on a uniform distribution. The list of values goes through the lower and upper boundary for each parameter in turn (i.e. $lnc_{lower}$, $lnc_{lower}$, $E_{lower}$, $E_{upper}$ ...). A dataframe of start parameters is then created with nrow = tries
of random draws of combinations from all of the parameter values. The model then loops through non-linear regression fitted with nlsLM for each row of starting parameters. Each time the AIC score is lower for an individual fit (indicating a better relative fit), the parameter dataframe is updated.
When running through the iterations for each level of the factor, if the AIC score does not get lower for 100 different calls of nlsLM using different start parameters, nlsLoop moves onto the next level of the factor.
nlsLoop returns an nlsLoop
object that works the same way as a list. It is made up of:
# look at parameter values head(fits$params) # look at fits head(fits$params)
Having all of the elements in a single nlsLoop object allows a simple plotting method to assess how good each fit is to the data. Although AIC scores pick the best model, this is relative to all the other fits and does not give any information about how well the model actually fits the data. The package includes a method of calculating a quasi-rsquared score, but use this at your own risk. Non-linear versions of rsquared do not necessarily mean the same thing as linear rsquared values do (Spiess & Neumeyer 2010). Another, and better, way of evaluating model fit is by plotting the predictions alongside the raw data.
nlsLoop provides an easy wrapper to ggplot2 to do this. Firstly lets have a look at a single level of curve_id
, using plot_id_nlsLoop.
plot_id_nlsLoop(data = Chlorella_TRC, param_data = fits, id = '1')
Further to this, plot_all_nlsLoop produces a pdf with each plot on a new sheet.
plot_all_nlsLoop('path/of/where/you/want/to/save/me.pdf', data = Chlorella_TRC, param_data = fits)
Instead of using the plotting functions within the package, we can easily create plots using ggplot and the predictions
part of the nlsLoop object. For example, we can plot all the curve fits split by growth temperature and acclimation vs. adaptation.
# get distinct values of process, flux and growth.temp for each value of curve_id d_treatment <- Chlorella_TRC[,c('curve_id','process', 'growth.temp', 'flux')] d_treatment <- d_treatment[!duplicated(d_treatment),] # merge with predictions by curve_id fits$predictions <- merge(fits$predictions, d_treatment, by = 'curve_id') # plot every curve library(ggplot2) ggplot() + geom_point(aes(K - 273.15, ln.rate, col = flux), size = 2, Chlorella_TRC) + geom_line(aes(K - 273.15, ln.rate, col = flux, group = curve_id), alpha = 0.5, fits$predictions) + facet_wrap(~ growth.temp + process, labeller = labeller(.multi_line = F)) + scale_colour_manual(values = c('green4', 'black')) + theme_bw(base_size = 12, base_family = 'Helvetica') + ylab('log Metabolic rate') + xlab('Assay temperature (ºC)') + theme(legend.position = c(0.9, 0.15))
We can also start having a quick look at any differences between parameters by accessing the params
part of the nlsLoop object. Here we will plot them pooled between fluxes.
# merge params with d_treatment by curve_id fits$params <- merge(fits$params, d_treatment, by = 'curve_id') library(tidyr) gather(fits$params, 'parameter', 'value', c(lnc, E, Eh, Th)) %>% ggplot(., aes(flux, value, col = flux)) + geom_boxplot(fill = 'white', outlier.shape = NA) + geom_point(position = position_jitter(height = 0, width = 0.1)) + facet_wrap(~ parameter, scales = 'free_y') + scale_color_manual(values = c('green4', 'black')) + scale_shape_manual(values = c(21, 24)) + theme_bw(base_size = 12, base_family = 'Helvetica') + theme(legend.position = 'top')
The confidence intervals of each fit can also be calculated using confint_nlsLoop.
# calculate confidence intervals for each fit CIs <- confint_nlsLoop(Chlorella_TRC, fits) # bind with factors dataframe CIs <- merge(CIs, d_treatment, by = 'curve_id') # plot ggplot(CIs, aes(col = flux)) + geom_point(aes(curve_id, mean)) + facet_wrap(~ param, scale = 'free_x', ncol = 4) + geom_linerange(aes(curve_id, ymin = CI_lwr, ymax = CI_upr)) + coord_flip() + scale_color_manual(values = c('green4', 'black')) + theme_bw(base_size = 12, base_family = 'Helvetica') + theme(legend.position = 'top') + xlab('curve') + ylab('parameter estimate')
When you are happy with your starting parameter values, nlsLoop2 can be used to speed up your model fitting by using nls2 to generate multiple starting values.
nlsLoop provides a more reliable and reproducible way of getting individual non-linear fits over levels of a factor of a dataframe. I hope this vignette has provided a good overview of its benefits and usage.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.