qfa.fit2: Growth curve modelling

View source: R/ModelFits.R

qfa.fit2    R Documentation

Growth curve modelling

Description

This function fits the specified growth model to timecourse observations by least squares, using either the L-BFGS-B algorithm in R's optim function or the stochastic global optimisation (differential evolution) package DEoptim. The input needs to be in the correct format, preferably created with either the colonyzer.read or the GC2QFA function. The user can select one of the following three growth models with the argument Model: standard logistic model (Slog), generalized logistic model (Glog) or Gompertz growth model (Gmp). qfa.fit2 also calculates a numerical Area Under Curve (nAUC) fitness measure by integrating under a loess-smoothed version of the dataset if there are sufficient observations, or under a linear interpolation between observations if observations are too infrequent.
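As a rough illustration of least-squares fitting with L-BFGS-B, the sketch below fits a standard logistic curve to synthetic data with base R's optim. The closed form in slog, the helper names (slog, sumsq), the synthetic values and the parameter bounds are all assumptions made for this example; none of it is taken from the qfa.fit2 source.

```r
# Standard logistic curve: K = carrying capacity, r = rate, g = inoculum density.
# This closed form is an assumption for illustration, not the package's own code.
slog <- function(t, K, r, g) {
  K * g * exp(r * t) / (K + g * (exp(r * t) - 1))
}

# Objective: sum of squared residuals between observations and model
sumsq <- function(p, t, y) {
  sum((y - slog(t, K = p[1], r = p[2], g = p[3]))^2)
}

# Synthetic, noise-free timecourse (hypothetical values)
t_obs <- seq(0, 5, by = 0.25)
y_obs <- slog(t_obs, K = 0.8, r = 3, g = 0.001)

# Box-constrained least squares; parscale rescales g, which is much
# smaller than K and r, so finite-difference steps are sensible for it
fit <- optim(
  par = c(K = 0.5, r = 2, g = 0.002),
  fn = sumsq, t = t_obs, y = y_obs,
  method = "L-BFGS-B",
  lower = c(0.01, 0.1, 1e-5),
  upper = c(2, 10, 0.1),
  control = list(parscale = c(1, 1, 1e-3))
)
fit$par    # fitted K, r, g
fit$value  # sum of squares at the selected optimum
```

The lower and upper vectors play the same role as the lowK/upK-style arguments described under Arguments: they restrict the region searched by the optimiser.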

Usage

qfa.fit2(
  d,
  Model = "Gmp",
  TimeFormat = "d",
  inocguess = NULL,
  fmt = "%Y-%m-%d_%H-%M-%S",
  minK = 0.025,
  detectThresh = 5e-04,
  globalOpt = FALSE,
  logTransform = FALSE,
  fixG = TRUE,
  AUCLim = NA,
  STP = 20,
  nCores = 1,
  modelFit = TRUE,
  checkSlow = F,
  nrate = T,
  lowK = NA,
  upK = NA,
  lowr = NA,
  upr = NA,
  lowg = NA,
  upg = NA,
  lowv = NA,
  upv = NA,
  lowb = NA,
  upb = NA,
  ...
)

Arguments

d

The data.frame containing the timecourse data for each colony (returned from colonyzer.read or GC2QFA).

Model

Either "Slog" for the standard logistic model, "Glog" for the generalized logistic model or "Gmp" for the Gompertz model. The parameters returned by qfa.fit2 depend on the chosen model. The Gmp variant used is formula (22) from Tjorve and Tjorve 2017 (PLOS ONE), a unified version parameterised by an absolute growth rate.

TimeFormat

Either hours ("h") or days ("d"). Only defines the time units used for the output growth rates and for subsequent plotting.

inocguess

Only relevant for Slog and Glog. Should be either numerical or NULL. The best guess for the starting density of viable cells in each colony; this is the g parameter in Slog and Glog. Typically, for dilute-inoculum, 384-format spotted cultures, this value cannot be observed directly by photography. inocguess should be in the same units as the values in the Growth column in d. If fixG=TRUE, only values of g between 0.9*inocguess and 1.1*inocguess are assessed during optimisation; otherwise values between 1e-10*inocguess and 1e+10*inocguess are tried. Without a sensible independent estimate of inoculum density, the best we can do is to estimate it from the observed data, which happens if inocguess is set to NULL. Estimating inoculum density only works well if the inoculum density is high enough to be measurable (e.g. pinned cultures or concentrated spotted cultures) and is clearly observed, meaning, for example, that there is no condensation on plates immediately after they are placed in the incubator. If we are making an independent estimate of inoculum density, then we should also reset the time at which the experiment "begins": this experiment start time should be the time at which the inoculum density is observed.

fmt

The date-time format in which the inoculation time (Inoc.Time) and measurement times (Date.Time) are stored. Defaults to "%Y-%m-%d_%H-%M-%S", which is the output format of Colonyzer.
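To show what this format string means in practice, the following sketch parses two timestamps with base R and computes the elapsed time in days and in hours. The timestamps and variable names are made up for illustration; how qfa.fit2 handles times internally is not shown here.

```r
fmt <- "%Y-%m-%d_%H-%M-%S"

# Hypothetical inoculation and photograph timestamps in the default format
inoc_time  <- as.POSIXct(strptime("2023-02-17_10-00-00", fmt, tz = "UTC"))
photo_time <- as.POSIXct(strptime("2023-02-18_22-00-00", fmt, tz = "UTC"))

# Time since inoculation, in the two units offered by TimeFormat
elapsed_d <- as.numeric(difftime(photo_time, inoc_time, units = "days"))
elapsed_h <- as.numeric(difftime(photo_time, inoc_time, units = "hours"))
```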

minK

The minimum value of K above which a strain is said to be alive. Strains with K optimised to lie below this value will be classified as dead, by setting r to be zero.
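The dead-strain rule described above can be sketched as a simple post-processing step on fitted parameters. The data frame and its values are hypothetical, and this is an illustration of the rule rather than the package's own code.

```r
# Hypothetical fitted parameters for three cultures
fits <- data.frame(K = c(0.30, 0.01, 0.12), r = c(2.5, 4.0, 1.1))
minK <- 0.025

# Cultures whose fitted K lies below minK are classified as dead: r is set to zero
fits$r[fits$K < minK] <- 0
```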

detectThresh

The minimum detectable cell density (or Growth value) which reliably identifies the presence of cells. Cell densities below this value are classified as noise and replaced with the detectThresh value. Can also be set to 0, in which case the software estimates a threshold as the mean of the two smallest Growth values per position.
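A minimal sketch of this thresholding for a single culture position is given below. The function name clean_growth and the example values are hypothetical, and the code only mirrors the behaviour described in the text.

```r
# Replace sub-threshold observations for one culture position.
# If detectThresh is 0, estimate it as the mean of the two smallest values.
clean_growth <- function(growth, detectThresh) {
  if (detectThresh == 0) {
    detectThresh <- mean(sort(growth)[1:2])
  }
  # Values below the threshold are treated as noise and raised to the threshold
  pmax(growth, detectThresh)
}

fixed_thresh <- clean_growth(c(0.0001, 0.001, 0.01), detectThresh = 5e-04)
auto_thresh  <- clean_growth(c(0.002, 0.004, 0.01), detectThresh = 0)
```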

globalOpt

logical. Indicates whether qfa.fit2 should use the slower, but more robust DEoptim global optimisation functions to fit the growth model to the data, or the quicker optim function.

logTransform

logical. Indicates whether data should be log-transformed before model fitting. Not recommended.

fixG

logical. Only relevant for Slog and Glog. Indicates whether the g parameter is allowed to vary over a wide range (1e-10*inocguess to 1e+10*inocguess) or a narrow range (0.9*inocguess to 1.1*inocguess) during optimisation. fixG=TRUE corresponds to the narrow constraints on g.
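The two ranges for g can be written out as a small helper. The function g_bounds is hypothetical and exists only to make the arithmetic concrete.

```r
# Lower and upper limits for g implied by inocguess and fixG (illustrative helper)
g_bounds <- function(inocguess, fixG = TRUE) {
  if (fixG) {
    c(lower = 0.9 * inocguess, upper = 1.1 * inocguess)      # narrow range
  } else {
    c(lower = 1e-10 * inocguess, upper = 1e+10 * inocguess)  # wide range
  }
}

narrow <- g_bounds(0.001, fixG = TRUE)
wide   <- g_bounds(0.001, fixG = FALSE)
```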

AUCLim

Numerical AUC (nAUC) is calculated as the integral of an approximation of the growth curve between time 0 and AUCLim. If set to NA (default), AUCLim will be set to the maximum time in the dataset.
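For the linear-interpolation case, the integral between time 0 and AUCLim can be sketched with the trapezoid rule, which is exact for a piecewise-linear curve. The function nauc_linear is hypothetical, and extending the curve as a constant outside the observed time range (rule = 2) is an assumption of this sketch; the loess-smoothed variant is not shown.

```r
# Area under a linear interpolation of (t, y) between 0 and AUCLim
nauc_linear <- function(t, y, AUCLim = NA) {
  if (is.na(AUCLim)) AUCLim <- max(t)   # default: integrate to the last timepoint
  f <- approxfun(t, y, rule = 2)        # rule = 2: hold end values outside the data
  # Integration grid: both limits plus every observation time strictly between them
  grid <- sort(unique(c(0, t[t > 0 & t < AUCLim], AUCLim)))
  vals <- f(grid)
  # Trapezoid rule over consecutive grid points
  sum(diff(grid) * (vals[-1] + vals[-length(vals)]) / 2)
}

auc_full <- nauc_linear(0:4, 0:4)              # whole timecourse
auc_part <- nauc_linear(0:4, 0:4, AUCLim = 2)  # truncated at AUCLim
```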

STP

Time to use for "Single Timepoint" fitness estimate. Defaults to 20 days (very late in growth curve) which is like carrying capacity. Untested functionality of the first QFA package.

nCores

Can attempt to split model fitting load across multiple parallel cores. Experimental, probably best to leave this value set to default (1).

modelFit

logical. Specifies whether to carry out any model fitting at all. When set to FALSE, only numerical fitness estimates such as nr, nMDP and nAUC are generated.

checkSlow

logical. Specifies whether to re-optimise curve fitting for slow-growing strains. If TRUE, slow-growing or dead strains are identified heuristically and a second round of curve fitting using global (but slower) optimisation is carried out. Heuristic identification of slow-growing strains is currently experimental: it seems we have over-tuned the heuristics to datasets we capture at Newcastle. If you notice a banding pattern in your MDR or r fitness distributions, please set checkSlow to FALSE.

nrate

logical. Specifies whether to include numerical fitness estimates such as the maximum growth rate and the time taken to reach this maximum growth rate. These estimates are derived from model-free numerical integration of the data.

lowK, upK, etc

Set the lower and upper boundaries used by the optimisation algorithm for the corresponding model parameters. Possible parameters: K, r, g, b, v.

...

Extra arguments passed to optim

Value

R data.frame, similar to that returned by the colonyzer.read function. The major difference is that instead of a row for every cell density observation for every culture, this object summarises all timecourse density observations for each culture with fitted growth model parameters and numerical fitness estimates.

  • Barcode - Unique plate identifier

  • Row - Row number (counting from top of image) of culture in rectangular gridded array

  • Col - Column number (counting from left of image) of culture in rectangular gridded array

  • ScreenID - Unique identifier for this QFA screen

  • Treatment - Conditions applied externally to plates (e.g. temperature(s) at which cultures were grown, UV irradiation applied, etc.)

  • Medium - Nutrients/drugs in plate agar

  • ORF - Systematic, unique identifier for genotype in this position in arrayed library

  • Screen.Name - Name of screen (identifies biological repeats, and experiment)

  • Library.Name - Name of library, specifying particular culture location

  • MasterPlate Number - Library plate identifier

  • Timeseries order - Sequential photograph number

  • Inoc.Time - User specified date and time of inoculation (specified in ExptDescription.txt file)

  • TileX - Culture tile width (pixels)

  • TileY - Culture tile height (pixels)

  • XOffset - x-coordinate of top left corner of rectangular tile bounding culture (number of pixels from left of image)

  • YOffset - y-coordinate of top left corner of rectangular tile bounding culture (number of pixels from top of image)

  • Threshold - Global pixel intensity threshold used for image segmentation (after lighting correction)

  • EdgeLength - Number of culture pixels classified as being microcolony edge pixels (useful for classifying contaminants in cultures grown from dilute inoculum)

  • EdgePixels - Number of pixels classified as culture on edge of square tile

  • RepQuad - Integer identifying which of the quadrants of a 1536 plate were used to inoculate the current 384 plate (set equal to 1 for all cultures for 1536 format for example)

  • K - carrying capacity (upper asymptote) for all models

  • r - Rate parameters (Slog and Glog), maximum absolute growth rate (Gmp)

  • g - For Slog and Glog: inoculum density (lower asymptote) (referred to in vignette as g_0). For Gmp: Calculated based on the three Gmp parameters

  • v - Only for Slog and Glog: Generalised logistic model shape parameter (=1 for Slog)

  • b - Only for Gmp: Time to reach max growth rate (r).

  • yshift - Only for Gmp: Min growth (data is shifted down by that amount for Gmp fit)

  • objval - Objective function value (sum of squares) at selected optimum.

  • tshift - Only for Slog and Glog: Shift applied to observation times before fitting logistic model (need to apply same shift before overlaying curve on expt. obs.). Set to the first timepoint at which the growth value is equal to or above detectThresh

  • t0 - Time of first detectable cell density observation (i.e. above detectThresh)

  • d0 - Normalised cell density of first observation (be careful about condensation on plates when using this). Note this is not necessarily the density at t0.

  • nAUC - Numerical Area Under Curve. This is a model-free fitness estimate.

  • nSTP - Single Time Point fitness. Cell density at time STP, as estimated with approximating function. This is a model-free fitness estimate.

  • nr - Numerical estimate of intrinsic growth rate. Growth rate estimated by fitting smoothing function to log of data, calculating numerical slope estimate across range of data and selecting the maximum estimate (should occur during exponential phase).

  • nr_t - Time at which maximum slope of log observations occurs

  • maxslp - Numerical estimate of maximum slope of growth curve. Slope estimated by fitting smoothing function to untransformed data and calculating numerical slope estimate of smoothed version of data and selecting the maximum estimate (should occur approximately half way through growth). This fitness measure will be affected by both rate of growth and final colony size. Final colony size is expected to be strongly affected by competition between cultures.

  • maxslp_t - Time at which maximum slope of observations occurs

  • ExptDate - A representative/approximate date for the experiment (note that genome-wide QFA screens typically take weeks to complete)

  • User - Person who actually carried out screen

  • PI - Principal investigator leading project that screen is part of

  • Condition - The most important defining characteristic of screen, as specified by user (e.g. the temperature screen was carried out at if screen is part of multi-temperature set of screens, or the query mutation if part of a set of screens comparing query mutations, or the drugs present in the medium if part of a set of drug screens)

  • Inoc - Qualitative identifier of inoculation type (e.g. "DIL" for dilute inoculum, "CONC" for concentrated). Used to distinguish between experiments carried out with different methods of inoculation.

  • Gene - Identifier for genotype at a particular location on an agar plate. Typically prefer unambiguous, systematic gene names here.

  • TrtMed - Combination of treatment and medium identifiers, specifying the environment in which the cells have grown
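The numerical slope estimates in the list above (nr, nr_t, maxslp, maxslp_t) can be illustrated with plain finite differences. This sketch deliberately omits the smoothing step described for nr and maxslp, and the helper max_slope and the synthetic data are hypothetical, so it shows the idea rather than the package's computation.

```r
# Maximum finite-difference slope of y against t, and the interval midpoint where it occurs
max_slope <- function(t, y) {
  slopes <- diff(y) / diff(t)
  mids <- (t[-1] + t[-length(t)]) / 2
  i <- which.max(slopes)
  c(slope = slopes[i], time = mids[i])
}

# Synthetic, purely exponential growth with rate 1.5 (hypothetical values)
t_obs <- seq(0, 4, by = 0.5)
y_obs <- 0.001 * exp(1.5 * t_obs)

nr_est     <- max_slope(t_obs, log(y_obs))  # slope of log data, analogous to nr and nr_t
maxslp_est <- max_slope(t_obs, y_obs)       # slope of raw data, analogous to maxslp and maxslp_t
```

On real, noisy timecourses a smoothing step before differencing matters, since raw finite differences amplify measurement noise.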

Examples

data(qfa.testdata)
# Strip non-experimental edge cultures (rows 1 and 8, columns 1 and 12)
keep <- (qfa.testdata$Row != 1) & (qfa.testdata$Col != 1) &
  (qfa.testdata$Row != 8) & (qfa.testdata$Col != 12)
qfa.testdata <- qfa.testdata[keep, ]
# Define which measure of cell density to use
qfa.testdata$Growth <- qfa.testdata$Intensity
# Fit the Gompertz model; detectThresh = 0 estimates the detection threshold from the data
GmpFit <- qfa.fit2(qfa.testdata, inocguess = NULL, detectThresh = 0, globalOpt = FALSE,
  AUCLim = NA, TimeFormat = "h", Model = "Gmp")
# Construct fitness measures
GmpFit <- makeFitness2(GmpFit, AUCLim = NA, plotFitness = "All", filename = "Example_Gmp_fitness.pdf")
# Create plot
qfa.plot2("Example_Gmp_GrowthCurves.pdf", GmpFit, qfa.testdata, maxt = 30)

JulBaer/baQFA documentation built on Feb. 19, 2023, 10:32 p.m.