ocat: GAM ordered categorical family

Description Usage Arguments Details Value Author(s) References Examples

View source: R/efam.r

Description

Family for use with gam or bam, implementing regression for ordered categorical data. A linear predictor provides the expected value of a latent variable following a logistic distribution. The probability of this latent variable lying between certain cut-points provides the probability of the ordered categorical variable being of the corresponding category. The cut-points are estimated along side the model smoothing parameters (using the same criterion). The observed categories are coded 1, 2, 3, ... up to the number of categories.

Usage

1
ocat(theta=NULL,link="identity",R=NULL)

Arguments

theta

cut point parameter vector (dimension R-2). If supplied and all positive, then taken to be the cut point increments (first cut point is fixed at -1). If any are negative then absolute values are taken as starting values for cutpoint increments.

link

The link function: only "identity" allowed at present (possibly for ever).

R

the number of catergories.

Details

Such cumulative threshold models are only identifiable up to an intercept, or one of the cut points. Rather than remove the intercept, ocat simply sets the first cut point to -1. Use predict.gam with type="response" to get the predicted probabilities in each category.

Value

An object of class extended.family.

Author(s)

Simon N. Wood [email protected]

References

Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 http://dx.doi.org/10.1080/01621459.2016.1180986

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
library(mgcv)
## Simulate some ordered categorical data...
set.seed(3);n<-400
dat <- gamSim(1,n=n)
dat$f <- dat$f - mean(dat$f)

alpha <- c(-Inf,-1,0,5,Inf)
R <- length(alpha)-1
y <- dat$f
u <- runif(n)
u <- dat$f + log(u/(1-u)) 
for (i in 1:R) {
  y[u > alpha[i]&u <= alpha[i+1]] <- i
}
dat$y <- y

## plot the data...
par(mfrow=c(2,2))
with(dat,plot(x0,y));with(dat,plot(x1,y))
with(dat,plot(x2,y));with(dat,plot(x3,y))

## fit ocat model to data...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=ocat(R=R),data=dat)
b
plot(b,pages=1)
gam.check(b)
summary(b)
b$family$getTheta(TRUE) ## the estimated cut points

## predict probabilities of being in each category
predict(b,dat[1:2,],type="response",se=TRUE)

Example output

Loading required package: nlme
This is mgcv 1.8-20. For overview type 'help("mgcv-package")'.
Gu & Wahba 4 term additive model

Family: Ordered Categorical(-1,0.07,5.15) 
Link function: identity 

Formula:
y ~ s(x0) + s(x1) + s(x2) + s(x3)

Estimated degrees of freedom:
3.32 3.11 7.81 1.59  total = 16.84 

REML score: 283.8245     

Method: REML   Optimizer: outer newton
full convergence after 5 iterations.
Gradient range [-4.800542e-06,4.973802e-08]
(score 283.8245 & scale 1).
Hessian positive definite, eigenvalue range [0.09977744,150.0804].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

        k'  edf k-index p-value  
s(x0) 9.00 3.32    0.94   0.095 .
s(x1) 9.00 3.11    1.06   0.920  
s(x2) 9.00 7.81    0.95   0.130  
s(x3) 9.00 1.59    0.97   0.280  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Family: Ordered Categorical(-1,0.07,5.15) 
Link function: identity 

Formula:
y ~ s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.1221     0.1319   0.926    0.354

Approximate significance of smooth terms:
        edf Ref.df  Chi.sq  p-value    
s(x0) 3.317  4.116  21.623 0.000263 ***
s(x1) 3.115  3.871 188.368  < 2e-16 ***
s(x2) 7.814  8.616 402.300  < 2e-16 ***
s(x3) 1.593  1.970   0.936 0.640434    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Deviance explained = 57.7%
-REML = 283.82  Scale est. = 1         n = 400
[1] -1.00000000  0.07295739  5.14663505
$fit
        [,1]        [,2]        [,3]         [,4]
1 0.99085777 0.005996704 0.003125774 0.0000197507
2 0.06793525 0.107745442 0.795787468 0.0285318416

$se.fit
         [,1]       [,2]        [,3]         [,4]
1 0.006829264 0.00446533 0.002349044 1.488965e-05
2 0.028948637 0.03725874 0.053535372 1.267200e-02

mgcv documentation built on Oct. 26, 2018, 5:05 p.m.