sm.os: Defining O'Sullivan Spline Smooths in VGAM Formulas

Description Usage Arguments Details Value Warning Note Author(s) References See Also Examples

View source: R/sm.os.R

Description

This function represents an O-spline smooth term in a vgam formula and confers automatic smoothing parameter selection.

Usage

1
2
3
4
sm.os(x, ..., niknots = 6, spar = -1, o.order = 2,
      alg.niknots = c("s", ".nknots.smspl")[1], all.knots = FALSE,
      ridge.adj = 1e-5, spillover = 0.01, maxspar = 1e12,
      outer.ok = FALSE, fixspar = FALSE)

Arguments

x

covariate (abscissae) to be smoothed. Also called the regressor. If the xij facility is used then these covariates are inputted via the ... argument.

...

Used to accommodate the other M-1 covariates when the xij facility is used. See Section 3.4.4 of Yee (2015) for something very similar. This argument, found in the second argument, means that the other argument names must be fully specified if used, e.g., outer.ok and not outer. See the example below. In the example below, the term in the main formula is sm.os(gcost.air, gcost.trn, gcost.bus) and one might be tempted to use something like sm.os(gcost) to represent that xij term. However, this is not recommended because sm.os(gcost) might not have the same number of columns as sm.os(gcost.air, gcost.trn, gcost.bus) etc. That is, it is best to select one of the diagonal elements of the block matrix to represent that term.

niknots

numeric, the number of interior knots, called K below. The default is to use this value. If you want alg.niknots to operate then assign NULL to this argument.

alg.niknots

character. The algorithm used to determine the number of interior knots. Only used when all.knots = FALSE and niknots = NULL. Note that ".nknots.smspl" corresponds to the default of smooth.spline. The value "s" corresponds to the same algorithm as s.

all.knots

logical. If TRUE then all distinct points in x are used as the interior knots. If FALSE (default) then a subset of x[] is used, specifically x[j] where the niknots indices are quantiles that are evenly spaced with respect to the argument probs—see quantile. If all.knots = FALSE and niknots = NULL then the argument alg.niknots is used to compute niknots.

spar, maxspar

spar is a vector of smoothing parameters. Negative values mean that magic will choose initial values in order to do the optimization at each P-IRLS iteration. Positive values mean that they are used as initial values for magic. If fixspar = TRUE then spar should be assigned a vector of positive values (but having values less than maxspar); then the smoothing parameters will be fixed and magic will not be used.

o.order

The order of the O'Sullivan penalzed spline. Any one value from 1:4 is acceptable. The degree of the spline is 2 * o.order - 1, so that cubic splines are the default. Setting o.order = 1 results in a linear spline which is a piecewise linear function.

ridge.adj

small positive number to stabilize linear dependencies among B-spline bases.

spillover

small and positive proportion of the range used on the outside of the boundary values. This defines the endpoints a and b that cover the data x_i, i.e., we are interested in the interval [a,b] which contains all the abscissae. The interior knots are strictly inside (a,b).

outer.ok

Fed into the argument (by the same name) of splineDesign.

fixspar

logical. If TRUE then spar should be a vector with positive values and the smoothing parameters are fixed at those values. If FALSE then spar contains the initial values for the smoothing parameters, and magic is called to determine (hopefully) some good values for the smoothing parameters.

Details

This function is currently used by vgam to allow automatic smoothing parameter selection based on O-splines to minimize an UBRE quantity. In contrast, s operates by having a prespecified amount of smoothing, e.g., its df argument. When the sample size is reasonably large this function is recommended over s also because backfitting is not required. This function therefore allows 2nd-generation VGAMs to be fitted (called G2-VGAMs, or Penalized-VGAMs).

This function should only be used with vgam. This function uses quantile to choose the knots, whereas sm.ps chooses equally-spaced knots. As Wand and Ormerod (2008) write, in most situations the differences will be minor, but it is possible for problems to arise for either strategy by constructing certain regression functions and predictor variable distributions. Any differences between O-splines and P-splines tend to be at the boundaries. O-splines have natural boundary constraints so that the solution is linear beyond the boundary knots.

Some arguments in decreasing order of precedence are: all.knots, niknots, alg.niknots.

Unlike s, which is symbolic and does not perform any smoothing itself, this function does compute the penalized spline when used by vgam—it creates the appropriate columns of the model matrix. When this function is used within vgam, automatic smoothing parameter selection is implemented by calling magic after the necessary link-ups are done.

By default this function centres the component function. This function is also smart; it can be used for smart prediction (Section 18.6 of Yee (2015)). Automatic smoothing parameter selection is performed using performance-oriented iteration whereby an optimization problem is solved at each IRLS iteration.

This function works better when the sample size is large, e.g., when in the hundreds, say.

Value

A matrix with attributes that are (only) used by vgam. The number of rows of the matrix is length(x). The number of columns is a function of the number of interior knots K and the order of the O-spline m: K+2m-1. In code, this is niknots + 2 * o.order - 1, or using sm.ps-like arguments, ps.int + degree - 1 (where ps.int should be more generally interpreted as the number of intervals. The formula is the same as sm.ps.). It transpires then that sm.os and sm.ps are very similar.

Warning

Being introduced into VGAM for the first time, this function (and those associated with it) should be used cautiously. Not all options are fully working or have been tested yet, and there are bound to be some bugs lurking around.

Note

This function is currently under development and may change in the future.

One might try using this function with vglm so as to fit a regression spline, however, the default value of niknots will probably be too high for most data sets.

Author(s)

T. W. Yee, with some of the essential R code coming from the appendix of Wand and Ormerod (2008).

References

Wand, M. P. and Ormerod, J. T. (2008). On semiparametric regression with O'Sullivan penalized splines. Australian and New Zealand Journal of Statistics, 50(2): 179–198.

See Also

vgam, sm.ps, s, smartpred, is.smart, summarypvgam, smooth.spline, splineDesign, bs, magic.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
sm.os(runif(20))

## Not run: 
data("TravelMode", package = "AER")  # Need to install "AER" first
air.df <- subset(TravelMode, mode == "air")  # Form 4 smaller data frames
bus.df <- subset(TravelMode, mode == "bus")
trn.df <- subset(TravelMode, mode == "train")
car.df <- subset(TravelMode, mode == "car")
TravelMode2 <- data.frame(income     = air.df$income,
                          wait.air   = air.df$wait  - car.df$wait,
                          wait.trn   = trn.df$wait  - car.df$wait,
                          wait.bus   = bus.df$wait  - car.df$wait,
                          gcost.air  = air.df$gcost - car.df$gcost,
                          gcost.trn  = trn.df$gcost - car.df$gcost,
                          gcost.bus  = bus.df$gcost - car.df$gcost,
                          wait       = air.df$wait)  # Value is unimportant
TravelMode2$mode <- subset(TravelMode, choice == "yes")$mode  # The response
TravelMode2 <- transform(TravelMode2, incom.air = income, incom.trn = 0,
                                      incom.bus = 0)
set.seed(1)
TravelMode2 <- transform(TravelMode2,
                         junkx2 = runif(nrow(TravelMode2)))

tfit2 <-
  vgam(mode ~ sm.os(gcost.air, gcost.trn, gcost.bus) + ns(junkx2, 4) +
              sm.os(incom.air, incom.trn, incom.bus) + wait ,
       crit = "coef",
       multinomial(parallel = FALSE ~ 1), data = TravelMode2,
       xij = list(sm.os(gcost.air, gcost.trn, gcost.bus) ~
                  sm.os(gcost.air, gcost.trn, gcost.bus) +
                  sm.os(gcost.trn, gcost.bus, gcost.air) +
                  sm.os(gcost.bus, gcost.air, gcost.trn),
                  sm.os(incom.air, incom.trn, incom.bus) ~
                  sm.os(incom.air, incom.trn, incom.bus) +
                  sm.os(incom.trn, incom.bus, incom.air) +
                  sm.os(incom.bus, incom.air, incom.trn),
                  wait   ~  wait.air +  wait.trn +  wait.bus),
       form2 = ~  sm.os(gcost.air, gcost.trn, gcost.bus) +
                  sm.os(gcost.trn, gcost.bus, gcost.air) +
                  sm.os(gcost.bus, gcost.air, gcost.trn) +
                  wait +
                  sm.os(incom.air, incom.trn, incom.bus) +
                  sm.os(incom.trn, incom.bus, incom.air) +
                  sm.os(incom.bus, incom.air, incom.trn) +
                  junkx2 + ns(junkx2, 4) +
                  incom.air + incom.trn + incom.bus +
                  gcost.air + gcost.trn + gcost.bus +
                  wait.air +  wait.trn +  wait.bus)
par(mfrow = c(2, 2))
plot(tfit2, se = TRUE, lcol = "orange", scol = "blue", ylim = c(-4, 4))
summary(tfit2)

## End(Not run)

Example output

Loading required package: stats4
Loading required package: splines
             2           3           4            5           6           7
1  -0.01656762  0.43252517  0.32575319 -0.084077740 -0.15894771 -0.13643665
2  -0.04681092  0.06230370  0.55956536  0.068887409 -0.16418883 -0.14158821
3  -0.06502947 -0.13984066 -0.16413028 -0.069357542  0.42556946  0.09201237
4  -0.06145523 -0.13215453 -0.15420686  0.117554437  0.47413473 -0.11275119
5  -0.04748395 -0.10211041 -0.11984648 -0.090961492 -0.16699146  0.33667878
6  -0.01120457  0.44562478  0.30941139 -0.084391124 -0.15810856 -0.13571635
7   0.45370982  0.05248179 -0.14446276 -0.118941086 -0.21878839 -0.18780237
8   0.02149982 -0.27507193 -0.33026309 -0.250705338 -0.46116460 -0.39585191
9  -0.05665249 -0.12182664 -0.14298732 -0.108525005 -0.11483812  0.55532685
10 -0.04693858 -0.10082127  0.03123261  0.531977338  0.06288777 -0.14197436
11 -0.04076164 -0.08765463 -0.10287980 -0.078084062 -0.14363318  0.17187892
12 -0.06454345 -0.13879550 -0.16290358 -0.105539868  0.31155539  0.23877383
13 -0.04532700 -0.09463709  0.14421262  0.515711810 -0.02371182 -0.13709984
14 -0.06284616 -0.13514561 -0.15851980  0.061186514  0.49003604 -0.08325511
15 -0.06398530 -0.13759526 -0.16149487  0.008819871  0.48983550 -0.04023013
16 -0.01702033 -0.03660086 -0.04295824 -0.032604596 -0.05997513 -0.05146671
17 -0.03197186 -0.06875292 -0.08069496 -0.061246140 -0.11266035  0.04019202
18 -0.04915730 -0.10570883 -0.12406993 -0.094167021 -0.17088309  0.38693196
19  0.29863002  0.44531752  0.01550046 -0.073274107 -0.13478542 -0.11569636
20 -0.04608377  0.23846318  0.50374233 -0.052262257 -0.16534222 -0.14192554
             8            9          10
1  -0.08870885 -0.029978491 -0.03960344
2  -0.09205831 -0.031110414 -0.04109878
3  -0.12652352 -0.043218420 -0.05709421
4  -0.12085780 -0.040842986 -0.05395611
5   0.36922509  0.025203283 -0.04168968
6  -0.08824052 -0.029820223 -0.03939436
7  -0.12210599 -0.041264803 -0.05451336
8  -0.25737635 -0.086978408 -0.11490386
9   0.07524878 -0.035785557 -0.04973943
10 -0.09230938 -0.031195262 -0.04121087
11  0.45740853  0.138761369 -0.03437942
12 -0.11801903 -0.042895409 -0.05666749
13 -0.08914004 -0.030124209 -0.03979594
14 -0.12359319 -0.041767393 -0.05517731
15 -0.12583313 -0.042524469 -0.05617745
16 -0.03126974  0.080694724  0.89083336
17  0.41539215  0.333009265  0.00250706
18  0.32626277  0.006443274 -0.04315885
19 -0.07522386 -0.025421338 -0.03358316
20 -0.09227763 -0.031184534 -0.04119670
attr(,"S.arg")
            [,1]         [,2]         [,3]        [,4]         [,5]
 [1,]  2.6776722 -0.387915915  0.414024219  0.48846902  0.758750234
 [2,] -0.3879159  0.711285218 -0.415548192  0.03708059 -0.009573487
 [3,]  0.4140242 -0.415548192  0.712763637 -0.32361925  0.009379756
 [4,]  0.4884690  0.037080586 -0.323619252  0.98642898 -0.389528299
 [5,]  0.7587502 -0.009573487  0.009379756 -0.38952830  1.257808380
 [6,]  0.6736763 -0.006218384  0.141289848  0.27298549 -0.517584035
 [7,]  0.4353891 -0.009685752  0.048989266  0.15499163  0.130059314
 [8,]  0.1530538  0.009451552  0.031490565  0.05161064  0.081084646
 [9,]  0.1905902 -0.012466000  0.012314847  0.04595315  0.030540007
              [,6]         [,7]         [,8]        [,9]
 [1,]  0.673676253  0.435389150  0.153053846  0.19059024
 [2,] -0.006218384 -0.009685752  0.009451552 -0.01246600
 [3,]  0.141289848  0.048989266  0.031490565  0.01231485
 [4,]  0.272985490  0.154991628  0.051610637  0.04595315
 [5,] -0.517584035  0.130059314  0.081084646  0.03054001
 [6,]  0.959901484 -0.061277656  0.007606614  0.08426061
 [7,] -0.061277656  0.305242137 -0.111997770  0.07278408
 [8,]  0.007606614 -0.111997770  0.459337090 -0.25126673
 [9,]  0.084260608  0.072784081 -0.251266726  0.19106473
attr(,"knots")
                                             14.28571%  28.57143%  42.85714% 
0.04005607 0.04005607 0.04005607 0.04005607 0.20915288 0.28019348 0.42615974 
 57.14286%  71.42857%  85.71429%                                             
0.51113933 0.59790162 0.71421387 1.00607858 1.00607858 1.00607858 1.00607858 
attr(,"intKnots")
14.28571% 28.57143% 42.85714% 57.14286% 71.42857% 85.71429% 
0.2091529 0.2801935 0.4261597 0.5111393 0.5979016 0.7142139 
attr(,"spar")
[1] -1
attr(,"o.order")
[1] 2
attr(,"ps.int")
[1] NA
attr(,"all.knots")
[1] FALSE
attr(,"alg.niknots")
[1] "s"
attr(,"ridge.adj")
[1] 1e-05
attr(,"outer.ok")
[1] FALSE
attr(,"fixspar")
[1] FALSE

Call:
vgam(formula = mode ~ sm.os(gcost.air, gcost.trn, gcost.bus) + 
    ns(junkx2, 4) + sm.os(incom.air, incom.trn, incom.bus) + 
    wait, family = multinomial(parallel = FALSE ~ 1), data = TravelMode2, 
    form2 = ~sm.os(gcost.air, gcost.trn, gcost.bus) + sm.os(gcost.trn, 
        gcost.bus, gcost.air) + sm.os(gcost.bus, gcost.air, gcost.trn) + 
        wait + sm.os(incom.air, incom.trn, incom.bus) + sm.os(incom.trn, 
        incom.bus, incom.air) + sm.os(incom.bus, incom.air, incom.trn) + 
        junkx2 + ns(junkx2, 4) + incom.air + incom.trn + incom.bus + 
        gcost.air + gcost.trn + gcost.bus + wait.air + wait.trn + 
        wait.bus, crit = "coef", xij = list(sm.os(gcost.air, 
        gcost.trn, gcost.bus) ~ sm.os(gcost.air, gcost.trn, gcost.bus) + 
        sm.os(gcost.trn, gcost.bus, gcost.air) + sm.os(gcost.bus, 
        gcost.air, gcost.trn), sm.os(incom.air, incom.trn, incom.bus) ~ 
        sm.os(incom.air, incom.trn, incom.bus) + sm.os(incom.trn, 
            incom.bus, incom.air) + sm.os(incom.bus, incom.air, 
            incom.trn), wait ~ wait.air + wait.trn + wait.bus))


Pearson residuals:
                       Min       1Q    Median       3Q     Max
log(mu[,1]/mu[,4]) -1.2325 -0.55206 -0.072039  0.18172  9.0524
log(mu[,2]/mu[,4]) -1.8653 -0.36282 -0.102028  0.23232 21.3633
log(mu[,3]/mu[,4]) -1.6411 -0.35994 -0.078049 -0.01836  7.3034

Parametric coefficients: 
                Estimate Std. Error z value  Pr(>|z|)    
(Intercept):1   5.405167   1.050605  5.1448 2.678e-07 ***
(Intercept):2   3.212128   0.925386  3.4711 0.0005183 ***
(Intercept):3   2.760685   0.926564  2.9795 0.0028873 ** 
ns(junkx2, 4)1  0.249437   0.925644  0.2695 0.7875650    
ns(junkx2, 4)2 -0.355766   0.934698 -0.3806 0.7034840    
ns(junkx2, 4)3 -0.374398   2.021587 -0.1852 0.8530720    
ns(junkx2, 4)4  0.646703   0.877728  0.7368 0.4612492    
wait           -0.092753   0.011068 -8.3800 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                                          edf Est.rank  Chi.sq   p-value    
sm.os(gcost.air, gcost.trn, gcost.bus) 4.9025        9 43.0531 2.108e-06 ***
sm.os(incom.air, incom.trn, incom.bus) 5.0866        9  3.2333    0.9543    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of linear/additive predictors:    3 

Names of linear/additive predictors: log(mu[,1]/mu[,4]), log(mu[,2]/mu[,4]), log(mu[,3]/mu[,4]) 

Dispersion Parameter for multinomial family:   1

Residual deviance:  343.4858 on 640 degrees of freedom

Log-likelihood: -171.7429 on 640 degrees of freedom

Number of outer iterations:  7 

Number of IRLS iterations at final outer iteration:  2 

VGAM documentation built on Jan. 16, 2021, 5:21 p.m.