fit_gam | R Documentation |
Fits a (penalized) basis spline curve through a set of ordered-pair
retention times, modeling one set of retention times (rty) as a function
of the other set (rtx). Outlier filtering iterations are performed first;
then, using the remaining points, the best value of the parameter k is
selected through 10-fold cross validation.
fit_gam(
  object,
  useID = FALSE,
  k = seq(10, 20, 2),
  iterFilter = 2,
  outlier = c("MAD", "boxplot"),
  coef = 2,
  prop = 0.5,
  weights = 1,
  bs = c("bs", "ps"),
  m = c(3, 2),
  family = c("scat", "gaussian"),
  method = "REML",
  rtx = c("min", "max"),
  rty = c("min", "max"),
  optimizer = "newton",
  message = TRUE,
  ...
)
object |
a |
useID |
logical. If set to TRUE, matched ID anchors detected from previous step will never be flagged as outliers. |
k |
integer k values controlling the dimension of the basis of the GAM fit (see: ?mgcv::s). Best value chosen by 10-fold cross validation. |
iterFilter |
integer number of outlier filtering iterations to perform |
outlier |
Thresholding method for outlier detection. If "MAD", the
threshold is the mean absolute deviation (MAD) times |
coef |
numeric (> 1) multiplier for determining thresholds for outliers
(see |
prop |
numeric. A point is excluded if deemed an outlier in more than this proportion of fits. Must be between 0 and 1. |
weights |
Optional user supplied weights for each ordered pair. Must be of length equal to number of anchors (n) or a divisor of (n + 2). |
bs |
character. Choice of spline method from mgcv, either "bs" (basis splines) or "ps" (penalized basis splines) |
m |
integer. Basis and penalty order for GAM; see ?mgcv::s |
family |
character. Choice of mgcv family; see: ?mgcv::family.mgcv |
method |
character smoothing parameter estimation method; see: ?mgcv::gam |
rtx |
ordered pair of endpoints for rtx; if "max" or "min", gives the maximum or minimum rtx, respectively, as model endpoints for rtx |
rty |
ordered pair of endpoints for rty; if "max" or "min", gives the maximum or minimum rty, respectively, as model endpoints for rty |
optimizer |
character. Method to optimize smoothing parameter; see: ?mgcv::gam |
message |
logical. Option to print messages indicating function progress |
... |
Other arguments passed to |
A set of ordered pair retention times must be previously computed using
selectAnchors(). The minimum and maximum retention times from both
input datasets are included in the set as ordered pairs (min_rtx, min_rty)
and (max_rtx, max_rty). The weights argument initially determines the
contribution of each point to the model fits. By default, all weights are
set to 1, so every point contributes equally; this can be changed by
supplying a vector of length n+2, where n is the number of ordered pairs
and the first and last weights determine the contribution of the min and
max ordered pairs.
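As a minimal sketch of the n+2 weight layout described above, the following builds a weight vector that emphasizes the two endpoint pairs. The anchor count n and the weight value 5 are made-up illustrative numbers, not package defaults; in practice n would come from the anchor table of the object being fitted.

```r
# Hypothetical number of anchors (assumption for illustration only)
n <- 50

# Start from equal weights: n anchors plus the two endpoint pairs
weights <- rep(1, n + 2)

# The first and last entries correspond to (min_rtx, min_rty) and
# (max_rtx, max_rty); up-weight them so the curve stays close to the endpoints
weights[1] <- 5
weights[n + 2] <- 5

length(weights)
```

A vector built this way could then be supplied through the weights argument of fit_gam.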
The model complexity is determined by k. Multiple values of k are
allowed, with the best value chosen by 10-fold cross validation. Before
this happens, certain ordered pairs are removed based on the model errors.
In each iteration, a GAM is fit using each selected value of k. A point is
"removed" from the model (i.e. its corresponding weight is set to 0) if its
residual is above the threshold, as defined by the outlier argument, in more
than a proportion of the fitted models given by prop. If an anchor
is an "identity" (idx = idy, detected in selectAnchors by setting
useID to TRUE), then setting useID here prevents its removal.
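The filtering rule above can be sketched in base R on simulated residuals. This is an illustration under stated assumptions, not the package's internal code: the helper flag_outliers is hypothetical, the "MAD" threshold is assumed to be coef times the mean absolute residual, and the "boxplot" threshold is assumed to be an upper fence of the third quartile plus coef times the IQR of the absolute residuals.

```r
set.seed(42)
n_points <- 100
k_values <- seq(10, 20, 2)

# Simulated residuals, one column per fitted model (stand-in for GAM residuals)
res <- matrix(rnorm(n_points * length(k_values), sd = 0.05),
              nrow = n_points, ncol = length(k_values))
res[c(7, 58), ] <- res[c(7, 58), ] + 1   # two points that fit badly everywhere

# Hypothetical helper: flags residuals above a per-model threshold
flag_outliers <- function(res, outlier = c("MAD", "boxplot"), coef = 2) {
  outlier <- match.arg(outlier)
  abs_res <- abs(res)
  thresholds <- apply(abs_res, 2, function(r) {
    if (outlier == "MAD") {
      coef * mean(r)                      # assumed: coef x mean absolute residual
    } else {
      quantile(r, 0.75) + coef * IQR(r)   # assumed: boxplot-style upper fence
    }
  })
  sweep(abs_res, 2, thresholds, ">")      # logical matrix: points x models
}

flags <- flag_outliers(res, outlier = "MAD", coef = 2)

# A point is dropped when it is flagged in more than `prop` of the models
prop <- 0.5
removed <- which(rowMeans(flags) > prop)

# "Removal" means setting the point's weight to 0 for the next round of fits
weights <- rep(1, n_points)
weights[removed] <- 0
```

With iterFilter greater than 1, the same flag-and-reweight step would be repeated on models refit with the updated weights.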
Other arguments, e.g. family, m, optimizer, bs, and method, are
GAM-specific parameters from the mgcv R package.
The family option is currently limited to the "scat" (scaled t) and
"gaussian" families; scat family model fits are more robust to outliers than
gaussian fits, but are much slower to compute. The type of spline is currently
limited to basis splines ("bs" or "ps").
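To illustrate how the mgcv settings above interact with the choice of k, here is a rough sketch of 10-fold cross validation over candidate k values using mgcv::gam directly on simulated retention times. The data, the fold assignment, and the decision to keep the first and last points in every training set are assumptions made for this example; fit_gam's own procedure may differ in its details.

```r
library(mgcv)

set.seed(1)
n <- 200
rtx <- sort(runif(n, 0.5, 30))
rty <- 0.6 * rtx + 0.8 * sin(rtx / 5) + rnorm(n, sd = 0.1)   # simulated drift
dat <- data.frame(rtx = rtx, rty = rty)

k_values <- seq(10, 20, 2)

# Assign only interior points to folds, so the two endpoints always stay in
# the training data and no held-out point lies outside the fitted range
interior <- 2:(n - 1)
folds <- sample(rep(1:10, length.out = length(interior)))

cv_error <- sapply(k_values, function(k) {
  fold_mse <- sapply(1:10, function(f) {
    test_idx <- interior[folds == f]
    fit <- gam(rty ~ s(rtx, k = k, bs = "bs", m = c(3, 2)),
               data = dat[-test_idx, ], family = gaussian(), method = "REML")
    pred <- predict(fit, newdata = dat[test_idx, ])
    mean((dat$rty[test_idx] - pred)^2)   # mean squared prediction error
  })
  mean(fold_mse)
})

# Candidate k with the lowest average held-out error
best_k <- k_values[which.min(cv_error)]
```

Swapping family = gaussian() for family = scat() gives the more robust but slower variant described above.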
metabCombiner with a fitted GAM model object
selectAnchors, fit_loess
data(plasma30)
data(plasma20)
p30 <- metabData(plasma30, samples = "CHEAR")
p20 <- metabData(plasma20, samples = "Red", rtmax = 17.25)
p.comb = metabCombiner(xdata = p30, ydata = p20, binGap = 0.0075)
p.comb = selectAnchors(p.comb, tolmz = 0.003, tolQ = 0.3, windy = 0.02)
anchors = getAnchors(p.comb)
#version 1: using faster, but less robust, gaussian family
p.comb = fit_gam(p.comb, k = c(10,12,15,17,20), prop = 0.5,
                 family = "gaussian", outlier = "MAD", coef = 2)
#version 2: using slower, but more robust, scat family
p.comb = fit_gam(p.comb, k = seq(12,20,2), family = "scat",
                 iterFilter = 1, coef = 3, method = "GCV.Cp")
#version 3 (with identities)
p.comb = selectAnchors(p.comb, useID = TRUE)
anchors = getAnchors(p.comb)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2), iterFilter = 1)
#version 4 (using identities and weights)
weights = ifelse(anchors$labels == "I", 2, 1)
p.comb = fit_gam(p.comb, useID = TRUE, k = seq(12,20,2),
                 iterFilter = 1, weights = weights)
#version 5 (using boxplot-based outlier detection)
p.comb = fit_gam(p.comb, k = seq(12,20,2), outlier = "boxplot", coef = 1.5)
#to preview result of fit_gam
plot(p.comb, pch = 19, outlier = "h", xlab = "CHEAR Plasma (30 min)",
     ylab = "Red-Cross Plasma (20 min)", main = "Example GAM Fit")