fit_gam: Fit RT Projection Model With GAMs

View source: R/fit_model.R

fit_gamR Documentation

Fit RT Projection Model With GAMs

Description

Fits a (penalized) basis splines curve through a set of ordered pair retention times, modeling one set of retention times (rty) as a function on the other set (rtx). Outlier filtering iterations are performed first, then with the remaining points, the best value of parameter k is selected through 10-fold cross validation.

Usage

fit_gam(
  object,
  useID = FALSE,
  k = seq(10, 20, 2),
  iterFilter = 2,
  outlier = c("MAD", "boxplot"),
  coef = 2,
  prop = 0.5,
  weights = 1,
  bs = c("bs", "ps"),
  m = c(3, 2),
  family = c("scat", "gaussian"),
  method = "REML",
  rtx = c("min", "max"),
  rty = c("min", "max"),
  optimizer = "newton",
  message = TRUE,
  ...
)

Arguments

object

a metabCombiner object.

useID

logical. If set to TRUE, matched ID anchors detected from previous step will never be flagged as outliers.

k

integer k values controlling the dimension of the basis of the GAM fit (see: ?mgcv::s). Best value chosen by 10-fold cross validation.

iterFilter

integer number of outlier filtering iterations to perform

outlier

Thresholding method for outlier dection. If "MAD", the threshold is the mean absolute deviation (MAD) times coef; if "boxplot", the threshold is coef times IQR plus 3rd quartile of a model's absolute residual values.

coef

numeric (> 1) multiplier for determining thresholds for outliers (see outlier argument)

prop

numeric. A point is excluded if deemed a residual in more than this proportion of fits. Must be between 0 & 1.

weights

Optional user supplied weights for each ordered pair. Must be of length equal to number of anchors (n) or a divisor of (n + 2).

bs

character. Choice of spline method from mgcv, either "bs" (basis splines) or "ps" (penalized basis splines)

m

integer. Basis and penalty order for GAM; see ?mgcv::s

family

character. Choice of mgcv family; see: ?mgcv::family.mgcv

method

character smoothing parameter estimation method; see: ?mgcv::gam

rtx

ordered pair of endpoints for rtx; if "max" or "min", gives the maximum or minimum rtx, respectively, as model endpoints for rtx

rty

ordered pair of endpoints for rty; if "max" or "min", gives the maximum or minimum rtx, respectively, as model endpoints for rty

optimizer

character. Method to optimize smoothing parameter; see: ?mgcv::gam

message

Option to print message indicating function progress

...

Other arguments passed to mgcv::gam.

Details

A set of ordered pair retention times must be previously computed using selectAnchors(). The minimum and maximum retention times from both input datasets are included in the set as ordered pairs (min_rtx, min_rty) & (max_rtx, max_rty). The weights argument initially determines the contribution of each point to the model fits; they are equally weighed by default, but can be changed using an n+2 length vector, where n is the number of ordered pairs and the first and last of the weights determines the contribution of the min and max ordered pairs; by default, all weights are initially set to 1 for equal contribution of each point.

The model complexity is determined by k. Multiple values of k are allowed, with the best value chosen by 10 fold cross validation. Before this happens, certain ordered pairs are removed based on the model errors. In each iteration, a GAM is fit using each selected value of k. Depending on the outlier argument, a point is "removed" from the model (i.e. its corresponding weight set to 0) if its residual is above the threshold for a proportion of fitted models, as determined by prop. If an anchor is an "identity" (idx = idy, detected in the selectAnchors by setting useID to TRUE), then setting useID here prevents its removal.

Other arguments, e.g. family, m, optimizer, bs, and method are GAM specific parameters from the mgcv R package. The family option is currently limited to the "scat" (scaled t) and "gaussian" families; scat family model fits are more robust to outliers than gaussian fits, but compute much slower. Type of splines are currently limited to basis splines ("bs" or "ps").

Value

metabCombiner with a fitted GAM model object

See Also

selectAnchors,fit_loess,

Examples

data(plasma30)
data(plasma20)

p30 <- metabData(plasma30, samples = "CHEAR")
p20 <- metabData(plasma20, samples = "Red", rtmax = 17.25)
p.comb = metabCombiner(xdata = p30, ydata = p20, binGap = 0.0075)

p.comb = selectAnchors(p.comb, tolmz = 0.003, tolQ = 0.3, windy = 0.02)
anchors = getAnchors(p.comb)

#version 1: using faster, but less robust, gaussian family
p.comb = fit_gam(p.comb, k = c(10,12,15,17,20), prop = 0.5,
    family = "gaussian", outlier = "MAD", coef = 2)


#version 2: using slower, but more robust, scat family
p.comb = fit_gam(p.comb, k = seq(12,20,2), family = "scat",
                     iterFilter = 1, coef = 3, method = "GCV.Cp")

#version 3 (with identities)
p.comb = selectAnchors(p.comb, useID = TRUE)
anchors = getAnchors(p.comb)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2), iterFilter = 1)

#version 4 (using identities and weights)
weights = ifelse(anchors$labels == "I", 2, 1)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2),
                     iterFilter = 1, weights = weights)

#version 5 (using boxplot-based outlier detection
p.comb = fit_gam(p.comb, k = seq(12,20,2), outlier = "boxplot", coef = 1.5)

#to preview result of fit_gam
plot(p.comb, pch = 19, outlier = "h", xlab = "CHEAR Plasma (30 min)",
     ylab = "Red-Cross Plasma (20 min)", main = "Example GAM Fit")



hhabra/metabCombiner documentation built on Jan. 26, 2024, 12:23 p.m.