| fit_gam | R Documentation |
Fits a (penalized) basis splines curve through a set of ordered pair
retention times, modeling one set of retention times (rty) as a function
on the other set (rtx). Outlier filtering iterations are performed first,
then with the remaining points, the best value of parameter k is
selected through 10-fold cross validation.
fit_gam(
object,
useID = FALSE,
k = seq(10, 20, 2),
iterFilter = 2,
outlier = c("MAD", "boxplot"),
coef = 2,
prop = 0.5,
weights = 1,
bs = c("bs", "ps"),
m = c(3, 2),
family = c("scat", "gaussian"),
method = "REML",
rtx = c("min", "max"),
rty = c("min", "max"),
optimizer = "newton",
message = TRUE,
...
)
object |
a |
useID |
logical. If set to TRUE, matched ID anchors detected from previous step will never be flagged as outliers. |
k |
integer k values controlling the dimension of the basis of the GAM fit (see: ?mgcv::s). Best value chosen by 10-fold cross validation. |
iterFilter |
integer number of outlier filtering iterations to perform |
outlier |
Thresholding method for outlier dection. If "MAD", the
threshold is the mean absolute deviation (MAD) times |
coef |
numeric (> 1) multiplier for determining thresholds for outliers
(see |
prop |
numeric. A point is excluded if deemed a residual in more than this proportion of fits. Must be between 0 & 1. |
weights |
Optional user supplied weights for each ordered pair. Must be of length equal to number of anchors (n) or a divisor of (n + 2). |
bs |
character. Choice of spline method from mgcv, either "bs" (basis splines) or "ps" (penalized basis splines) |
m |
integer. Basis and penalty order for GAM; see ?mgcv::s |
family |
character. Choice of mgcv family; see: ?mgcv::family.mgcv |
method |
character smoothing parameter estimation method; see: ?mgcv::gam |
rtx |
ordered pair of endpoints for rtx; if "max" or "min", gives the maximum or minimum rtx, respectively, as model endpoints for rtx |
rty |
ordered pair of endpoints for rty; if "max" or "min", gives the maximum or minimum rtx, respectively, as model endpoints for rty |
optimizer |
character. Method to optimize smoothing parameter; see: ?mgcv::gam |
message |
Option to print message indicating function progress |
... |
Other arguments passed to |
A set of ordered pair retention times must be previously computed using
selectAnchors(). The minimum and maximum retention times from both
input datasets are included in the set as ordered pairs (min_rtx, min_rty)
& (max_rtx, max_rty). The weights argument initially determines the
contribution of each point to the model fits; they are equally weighed by
default, but can be changed using an n+2 length vector, where n is
the number of ordered pairs and the first and last of the weights determines
the contribution of the min and max ordered pairs; by default, all weights
are initially set to 1 for equal contribution of each point.
The model complexity is determined by k. Multiple values of k are
allowed, with the best value chosen by 10 fold cross validation. Before
this happens, certain ordered pairs are removed based on the model errors.
In each iteration, a GAM is fit using each selected value of k. Depending on
the outlier argument, a point is "removed" from the model (i.e. its
corresponding weight set to 0) if its residual is above the threshold
for a proportion of fitted models, as determined by prop. If an anchor
is an "identity" (idx = idy, detected in the selectAnchors by setting
useID to TRUE), then setting useID here prevents its removal.
Other arguments, e.g. family, m, optimizer, bs,
and method are GAM specific parameters from the mgcv R package.
The family option is currently limited to the "scat" (scaled t) and
"gaussian" families; scat family model fits are more robust to outliers than
gaussian fits, but compute much slower. Type of splines are currently limited
to basis splines ("bs" or "ps").
metabCombiner with a fitted GAM model object
selectAnchors,fit_loess,
data(plasma30)
data(plasma20)
p30 <- metabData(plasma30, samples = "CHEAR")
p20 <- metabData(plasma20, samples = "Red", rtmax = 17.25)
p.comb = metabCombiner(xdata = p30, ydata = p20, binGap = 0.0075)
p.comb = selectAnchors(p.comb, tolmz = 0.003, tolQ = 0.3, windy = 0.02)
anchors = getAnchors(p.comb)
#version 1: using faster, but less robust, gaussian family
p.comb = fit_gam(p.comb, k = c(10,12,15,17,20), prop = 0.5,
family = "gaussian", outlier = "MAD", coef = 2)
#version 2: using slower, but more robust, scat family
p.comb = fit_gam(p.comb, k = seq(12,20,2), family = "scat",
iterFilter = 1, coef = 3, method = "GCV.Cp")
#version 3 (with identities)
p.comb = selectAnchors(p.comb, useID = TRUE)
anchors = getAnchors(p.comb)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2), iterFilter = 1)
#version 4 (using identities and weights)
weights = ifelse(anchors$labels == "I", 2, 1)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2),
iterFilter = 1, weights = weights)
#version 5 (using boxplot-based outlier detection
p.comb = fit_gam(p.comb, k = seq(12,20,2), outlier = "boxplot", coef = 1.5)
#to preview result of fit_gam
plot(p.comb, pch = 19, outlier = "h", xlab = "CHEAR Plasma (30 min)",
ylab = "Red-Cross Plasma (20 min)", main = "Example GAM Fit")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.