fit_gam: Fit RT Projection Model With GAMs

View source: R/fit_model.R

Description

Fits a (penalized) basis spline curve through a set of ordered-pair retention times, modeling one set of retention times (rty) as a function of the other set (rtx). Iterative filtering of high-residual points is performed first. Multiple acceptable values of k can be supplied, with one value selected through 10-fold cross validation.

Usage

fit_gam(
  object,
  useID = FALSE,
  k = seq(10, 20, by = 2),
  iterFilter = 2,
  ratio = 2,
  frac = 0.5,
  bs = c("bs", "ps"),
  family = c("scat", "gaussian"),
  weights = 1,
  m = c(3, 2),
  method = "REML",
  optimizer = "newton",
  ...
)

Arguments

object

a metabCombiner object.

useID

logical. Option to use matched IDs to inform fit

k

integer vector of values controlling the number of basis functions for GAM construction. The best value is chosen by 10-fold cross validation.

iterFilter

integer number of residual filtering iterations to perform

ratio

numeric. A point is an outlier if the ratio of residual to mean residual of a fit exceeds this value. Must be greater than 1.

frac

numeric. A point is excluded if deemed an outlier in more than this fraction of the fits. Must be between 0 and 1.

bs

character. Choice of spline method from mgcv, either "bs" (basis splines) or "ps" (penalized basis splines)

family

character. Choice of mgcv family; see: ?mgcv::family.mgcv

weights

Optional user supplied weights for each ordered pair. Must be of length equal to number of anchors (n) or a divisor of (n + 2).

m

integer. Basis and penalty order for GAM; see ?mgcv::s

method

character. Smoothing parameter estimation method; see: ?mgcv::gam

optimizer

character. Method to optimize smoothing parameter; see: ?mgcv::gam

...

Other arguments passed to mgcv::gam.

Details

A set of ordered-pair retention times must be previously computed using selectAnchors(). The minimum and maximum retention times from both input datasets are included in the set as the ordered pairs (min_rtx, min_rty) and (max_rtx, max_rty).

The weights argument initially determines the contribution of each point to the model fits. Points are equally weighted by default, but this can be changed using a vector of length n+2, where n is the number of ordered pairs and the first and last weights determine the contribution of the min and max ordered pairs.

Model complexity is determined by k. Multiple values of k are allowed, with the best value chosen by 10-fold cross validation. Before this happens, certain ordered pairs are removed based on the model errors. In each filtering iteration, a GAM is fit for each supplied value of k. A point is "removed" (its corresponding weight set to 0) if its residual exceeds ratio times the mean residual in more than a fraction frac of the fitted models. If an ordered pair is an "identity" (discovered in selectAnchors() by setting useID to TRUE), then setting useID to TRUE here will prevent its removal.
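
The filtering rule described above can be sketched in base R as follows. This is a minimal illustration, not metabCombiner's internal code: the helper name flag_outliers, its protect argument (standing in for identity protection via useID), the use of absolute residuals, and the toy residual matrix are all assumptions made for the example.

```r
# Hypothetical helper (not part of metabCombiner): decides which ordered
# pairs to drop, given a matrix of residuals with one row per ordered pair
# and one column per fitted model (one fit per value of k).
flag_outliers <- function(resid_mat, ratio = 2, frac = 0.5, protect = NULL) {
  stopifnot(is.matrix(resid_mat), ratio > 1, frac > 0, frac < 1)
  abs_res <- abs(resid_mat)              # assumption: residual magnitude is used
  mean_res <- colMeans(abs_res)          # mean residual of each fit
  # TRUE where a point's residual exceeds ratio * mean residual of that fit
  is_out <- sweep(abs_res, 2, ratio * mean_res, FUN = ">")
  # Drop a point if it is an outlier in more than `frac` of the fits
  remove <- rowMeans(is_out) > frac
  # Protected rows (e.g. identities) are never dropped
  if (!is.null(protect)) remove[protect] <- FALSE
  remove
}

# Toy residuals: 5 ordered pairs, 3 fits; pair 4 is far off in two fits
resid_mat <- cbind(c(0.1, -0.2, 0.1, 3.0, -0.1),
                   c(0.2, -0.1, 0.1, 2.5, -0.2),
                   c(0.1, -0.1, 0.2, 0.3, -0.1))

# "Removal" means setting the corresponding weight to 0
weights <- rep(1, nrow(resid_mat))
weights[flag_outliers(resid_mat, ratio = 2, frac = 0.5)] <- 0
```

In this toy case only pair 4 exceeds the threshold in more than half of the fits, so only its weight is zeroed; passing protect = 4 would keep it in the fit.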

Other arguments, e.g. family, m, optimizer, bs, and method, are GAM-specific parameters. The family option is currently limited to the "scat" (scaled t) and "gaussian" families; scat family model fits are more robust to outliers than gaussian fits, but are much slower to compute. The spline type is currently limited to basis splines (bs = "bs") or penalized basis splines (bs = "ps").
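
As a rough illustration of how these parameters map onto mgcv, and of how one value of k might be picked by 10-fold cross validation, the sketch below fits gaussian-family GAMs to synthetic data. It is a stand-in under stated assumptions, not the package's internal procedure: the simulated rtx/rty values, the random fold assignment, and the mean squared prediction error criterion are all invented for the example.

```r
library(mgcv)  # recommended package shipped with standard R installations

set.seed(1)
# Synthetic ordered pairs standing in for anchors (assumption, not real data)
rtx <- sort(runif(200, 0.5, 30))
rty <- 0.6 * rtx + 0.8 * sin(rtx / 4) + rnorm(200, sd = 0.1)
dat <- data.frame(rtx = rtx, rty = rty)

k_values <- seq(10, 20, by = 2)
# Randomly assign each ordered pair to one of 10 folds
folds <- sample(rep(1:10, length.out = nrow(dat)))

# Mean held-out squared error for each candidate k
cv_error <- sapply(k_values, function(k) {
  fold_err <- sapply(1:10, function(f) {
    train <- dat[folds != f, ]
    test  <- dat[folds == f, ]
    fit <- gam(rty ~ s(rtx, k = k, bs = "bs", m = c(3, 2)),
               data = train, family = gaussian(), method = "REML")
    pred <- as.numeric(predict(fit, newdata = test))
    mean((test$rty - pred)^2)
  })
  mean(fold_err)
})

# The candidate with the lowest cross-validated error is kept
best_k <- k_values[which.min(cv_error)]
```

Swapping family = gaussian() for family = scat() in the gam() call gives the more outlier-robust but slower variant, and bs = "ps" selects penalized basis splines instead.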

Value

A metabCombiner object with a fitted GAM model object.

See Also

selectAnchors, fit_loess

Examples

data(plasma30)
data(plasma20)

p30 <- metabData(plasma30, samples = "CHEAR")
p20 <- metabData(plasma20, samples = "Red", rtmax = 17.25)
p.comb = metabCombiner(xdata = p30, ydata = p20, binGap = 0.0075)

p.comb = selectAnchors(p.comb, tolmz = 0.003, tolQ = 0.3, windy = 0.02)
anchors = getAnchors(p.comb)

#version 1: using faster, but less robust, gaussian family
p.comb = fit_gam(p.comb, k = c(10,12,15,17,20), frac = 0.5,
    family = "gaussian")


#version 2: using slower, but more robust, scat family
p.comb = fit_gam(p.comb, k = seq(12,20,2), family = "scat",
                     iterFilter = 1, ratio = 3, method = "GCV.Cp")

#version 3 (with identities)
p.comb = selectAnchors(p.comb, useID = TRUE)
anchors = getAnchors(p.comb)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2), iterFilter = 1)

#version 4 (using identities and weights)
weights = ifelse(anchors$labels == "I", 2, 1)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2),
                     iterFilter = 1, weights = weights)

#version 5 (assigning weights to the boundary points)
weights = c(2, rep(1, nrow(anchors)), 2)
p.comb = fit_gam(p.comb, k = seq(12,20,2), weights = weights)

#to preview result of fit_gam
plot(p.comb, xlab = "CHEAR Plasma (30 min)",
     ylab = "Red-Cross Plasma (20 min)", pch = 19,
     main = "Example fit_gam Result Fit")

metabCombiner documentation built on Dec. 10, 2020, 2 a.m.