# s: Defining Smooths in VGAM Formulas

In VGAM: Vector Generalized Linear and Additive Models

## Description

`s` is used in the definition of (vector) smooth terms within `vgam` formulas. This corresponds to 1st-generation VGAMs that use backfitting for their estimation. The effective degrees of freedom is prespecified.

## Usage

```
s(x, df = 4, spar = 0, ...)
```

## Arguments

`x`: covariate (abscissae) to be smoothed. Note that `x` must be a single variable and not a function of a variable. For example, `s(x)` is fine but `s(log(x))` will fail. In this case, let `logx <- log(x)` (in the data frame), say, and then use `s(logx)`. At this stage bivariate smoothers (`x` would be a two-column matrix) are not implemented.

`df`: numerical vector of length r. Effective degrees of freedom: must lie between 1 (linear fit) and n (interpolation). Thus one could say that `df-1` is the effective nonlinear degrees of freedom (ENDF) of the smooth. Recycling of values will be used if `df` is not of length r. If `spar` is positive then this argument is ignored. Thus `s()` means that the effective degrees of freedom is prespecified. If it is known that the component function(s) are more wiggly than usual then try increasing the value of this argument.

`spar`: numerical vector of length r. Positive smoothing parameters (after scaling). Larger values mean more smoothing so that the solution approaches a linear fit for that component function. A zero value means that `df` is used. Recycling of values will be used if `spar` is not of length r.

`...`: Ignored for now.
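The `logx` workaround and the role of `df` described above can be sketched as follows. This is a minimal illustration on artificial data; the names `adata`, `fit.low` and `fit.high`, the Poisson response and the seed are assumptions made for the example and do not come from the package documentation.

```r
library(VGAM)

set.seed(1)  # artificial data, purely illustrative
adata <- data.frame(x = runif(200, min = 1, max = 10))
adata <- transform(adata,
                   logx = log(x),  # precompute the transformed covariate
                   y = rpois(200, lambda = exp(1 + sin(log(x)))))

# s() is given the precomputed column rather than s(log(x))
fit.low  <- vgam(y ~ s(logx, df = 2), poissonff, data = adata)  # smoother fit
fit.high <- vgam(y ~ s(logx, df = 5), poissonff, data = adata)  # wigglier fit
```

Comparing plots of the two fits should show how a larger `df` lets the component function bend more.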

## Details

In this help file M is the number of additive predictors and r is the number of component functions to be estimated (so that r is an element from the set {1,2,...,M}). Also, if n is the number of distinct abscissae, then `s` will fail if n < 7.
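Since `s` fails when there are fewer than 7 distinct abscissae, it may be worth checking a covariate before smoothing it. The helper below is hypothetical (it is not part of VGAM) and uses base R only.

```r
# Hypothetical helper (not part of VGAM): does the covariate have at
# least `minimum` distinct values?
enough_abscissae <- function(x, minimum = 7) {
  length(unique(x)) >= minimum
}

enough_abscissae(c(1, 2, 2, 3, 3, 3))  # FALSE: only 3 distinct values
enough_abscissae(seq(0, 1, by = 0.1))  # TRUE: 11 distinct values
```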

`s`, which is symbolic and does not perform any smoothing itself, only handles a single covariate. Note that `s` works in `vgam` only. It has no effect in `vglm` (actually, it is similar to the identity function `I` so that `s(x2)` is the same as `x2` in the LM model matrix). It differs from the `s()` of the gam package and the `s` of the mgcv package; they should not be mixed together. Also, terms involving `s` should be simple additive terms that do not involve interactions, nesting, etc. For example, `myfactor:s(x2)` is not a good idea.
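The statement that `s` has no effect in `vglm` can be checked with a small sketch. The data frame `idata`, the Poisson response and the seed are assumptions for illustration; if `s(x2)` behaves like `x2` in the model matrix, as described above, the two sets of coefficients should agree.

```r
library(VGAM)

set.seed(2)  # artificial data, purely illustrative
idata <- data.frame(x2 = runif(100))
idata <- transform(idata, y = rpois(100, lambda = exp(1 + x2)))

fit.s   <- vglm(y ~ s(x2), poissonff, data = idata)  # s() wraps x2
fit.lin <- vglm(y ~ x2,    poissonff, data = idata)  # plain linear term

# Side-by-side coefficients; only the term labels are expected to differ
cbind(with.s = unname(coef(fit.s)), plain = unname(coef(fit.lin)))
```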

## Value

A vector with attributes that are (only) used by `vgam`.

## Note

The vector cubic smoothing spline which `s()` represents is computationally demanding for large M. The cost is approximately O(n M^3) where n is the number of unique abscissae.

Currently a bug relating to the use of `s()` is that only constraint matrices whose columns are orthogonal are handled correctly. If any `s()` term has a constraint matrix that does not satisfy this condition then a warning is issued. See `is.buggy` for more information.
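As a brief sketch, a fitted `vgam` object can be passed to `is.buggy` to see whether any of its `s()` terms are affected. The model is the nonparametric logistic regression from the Examples section; the `each.term` argument is recalled from the `is.buggy` help page and should be checked there.

```r
library(VGAM)

fit1 <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua)

is.buggy(fit1)                    # overall check for the fitted object
is.buggy(fit1, each.term = TRUE)  # one result per term
```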

A more modern alternative to using `s` with `vgam` is to use `sm.os` or `sm.ps`. Models fitted with these are called Generation-2 VGAMs. They do not require backfitting and allow automatic smoothing parameter selection. However, this alternative should only be used when the sample size is reasonably large (> 500, say).
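A minimal sketch of the `sm.os` and `sm.ps` alternative is given below, assuming artificial Poisson data with a sample size above 500. The names `sdata`, `fit.os` and `fit.ps` are illustrative, and both smoothers are called with their default arguments; their help pages describe the available options.

```r
library(VGAM)

set.seed(3)  # artificial data, purely illustrative
sdata <- data.frame(x2 = runif(1000))
sdata <- transform(sdata, y = rpois(1000, lambda = exp(1 + sin(2 * pi * x2))))

# The smoothing parameter is selected automatically; df is not prespecified
fit.os <- vgam(y ~ sm.os(x2), poissonff, data = sdata)  # O-spline term
fit.ps <- vgam(y ~ sm.ps(x2), poissonff, data = sdata)  # P-spline term
```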

Another alternative to using `s` with `vgam` is to use `bs` and/or `ns` with `vglm`. `vglm` implements half-stepping, which is helpful if convergence is difficult.
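For instance, the nonparametric logistic regression from the Examples section could be refitted with a regression spline, as in this sketch. The choice of `ns` with `df = 2` is an assumption made to mirror that example, not a recommendation from the package.

```r
library(VGAM)
library(splines)  # provides bs() and ns()

# Natural cubic spline basis in a vglm() fit; no backfitting is involved
fit.ns <- vglm(agaaus ~ ns(altitude, df = 2), binomialff, data = hunua)
coef(fit.ns, matrix = TRUE)
```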

## Author(s)

Thomas W. Yee

## References

Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.

## See Also

`vgam`, `is.buggy`, `sm.os`, `sm.ps`, `vsmooth.spline`.

## Examples

```
library(VGAM)  # attach the package that provides vgam() and s()

# Nonparametric logistic regression
fit1 <- vgam(agaaus ~ s(altitude, df = 2), binomialff, data = hunua)
## Not run: plot(fit1, se = TRUE)

# Bivariate logistic model with artificial data
nn <- 300
bdata <- data.frame(x1 = runif(nn), x2 = runif(nn))
bdata <- transform(bdata,
    y1 = rbinom(nn, size = 1, prob = logitlink(sin(2 * x2), inverse = TRUE)),
    y2 = rbinom(nn, size = 1, prob = logitlink(sin(2 * x2), inverse = TRUE)))
fit2 <- vgam(cbind(y1, y2) ~ x1 + s(x2, 3), trace = TRUE,
             binom2.or(exchangeable = TRUE), data = bdata)
coef(fit2, matrix = TRUE)  # Hard to interpret
## Not run: plot(fit2, se = TRUE, which.term = 2, scol = "blue")
```

### Example output

```
Loading required package: stats4
VGAM  s.vam  loop  1 :  deviance = 762.05425
VGAM  s.vam  loop  2 :  deviance = 758.08667
VGAM  s.vam  loop  3 :  deviance = 758.07384
VGAM  s.vam  loop  4 :  deviance = 758.07371
VGAM  s.vam  loop  5 :  deviance = 758.07371