logLik: Log Likelihood of and between VLMC objects

Description Usage Arguments Details Value Author(s) See Also Examples

Description

Compute the log-likelihood or “entropy” of a fitted vlmc object. This is a method for the generic logLik.

Usage

1
2
3
4
entropy(object)
## S3 method for class 'vlmc'
logLik(object, ...)
entropy2(ivlmc1, ivlmc2, alpha.len = ivlmc1[1])

Arguments

object

typically the result of vlmc(..).

ivlmc1,ivlmc2

two vlmc (sub) trees, see vlmc.

alpha.len

positive integer specifying the alphabet length.

...

(potentially more arguments; required by generic)

Details

The logLik.vlmc() method computes the log likelihood for a fitted vlmc object. entropy is an alias for logLik for reasons of back compatibility.

entropy2 is less clear ... ... [[[ FIXME ]]] ... ...

Value

a negative number, in some contexts typically further divided by log(x$alpha.len).

Note that the logLik method is used by the default method of the AIC generic function (from R version 1.4.x), and hence provides AIC(object) for vlmc objects. Also, since vlmc version 1.3-13, BIC() works as well.

Author(s)

Martin Maechler

See Also

deviance.vlmc, vlmc, draw.vlmc.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
dd <- cumsum(rpois(999, 1.5)) %% 10
(vd <- vlmc(dd))
entropy(vd)# the bare number
logLik(vd)
logLik(vdL <- vlmc(dd, cutoff = 3))
entropy2(vd $vlmc.vec,
         vdL$vlmc.vec)

## AIC model selection:
f1 <- c(1,0,0,0)  # as in example(vlmc)
f2 <- rep(1:0,2)
(dt1 <- c(f1,f1,f2,f1,f2,f2,f1))
AIC(print(vlmc(dt1)))
AIC(print(vlmc(dt1, cutoff = 2.6)))
AIC(print(vlmc(dt1, cutoff = 0.4)))# these two differ ``not really''
AIC(print(vlmc(dt1, cutoff = 0.1)))

## Show how to compute it from the fitted conditional probabilities :
logLikR <- function(x) {
    dn <- dimnames(pr <- predict(x))
    sum(log(pr[cbind(2:nrow(pr), match(dn[[1]][-1], dn[[2]]))]))
}

all.equal(  logLikR(vd),
          c(logLik (vd)), tol=1e-10) # TRUE, they do the same

## Compare different ones:  [cheap example]:
example(draw)
for(n in ls())
  if(is.vlmc(get(n))) {
       vv <- get(n)
       cat(n,":",formatC(logLik(vv) / log(vv$alpha.len),
                         format= "f", wid=10),"\n")
  }

Example output

'vlmc' a Variable Length Markov Chain;
	 alphabet '0123456789', |alphabet| = 10, n = 999.
Call: vlmc(dts = dd);  default cutoff =8.459
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        1        10        10       222 
AIC =  3174 
[1] -1497.225
'log Lik.' -1497.225 (df=90)
'log Lik.' -1286.197 (df=576)
[1] -641.7994
 [1] 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0

'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1);  default cutoff =1.921
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        3         4         3        26 
AIC =  21.46 
[1] 21.46023

'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1, cutoff.prune = 2.6);  cutoff =2.6
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        1         2         1        10 
AIC =  27.51 
[1] 27.50815

'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1, cutoff.prune = 0.4);  cutoff =0.4
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        7        11         5        62 
AIC =  27.55 
[1] 27.54518

'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1, cutoff.prune = 0.1);  cutoff =0.1
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        7        11         6        66 
AIC =  27.55 
[1] 27.54518
[1] TRUE

draw>   example(vlmc)

vlmc> f1 <- c(1,0,0,0)

vlmc> f2 <- rep(1:0,2)

vlmc> (dt1 <- c(f1,f1,f2,f1,f2,f2,f1))
 [1] 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0

vlmc> (vlmc.dt1  <- vlmc(dt1))

'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1);  default cutoff =1.921
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        3         4         3        26 
AIC =  21.46 

vlmc>  vlmc(dt1, dump = 1,
vlmc+       ctl.dump = c(wid = 3, nmax = 20), debug = TRUE)
vlmc: Alpha = '01' ; |X| = 2
vlmc: ctl.dump =  3 20 
vlmc: n = |data| = 28, cutoff{prune} = 1.92073, threshold{gen} = 2
vlmc: |alphabet| = 2, alphabet = 01
generating... pruning... Dump{Tree} __after__ pruning: 
Lev Ch|   0   1 |  tot |  num size : ..{set->list}..
------+--------------------------------------------------
  0 x |  18  10 |   28 |   28   32 : 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ..
  1 0 |   8   9 |   17 |   17   32 : 2 3 4 6 7 8 10 12 14 15 16 18 20 22 24 26 27
  2 0 |   4   3 |    7 |    7   16 : 3 4 7 8 15 16 27
  3 0 |   0   3 |    3 |    3   16 : 4 8 16
  3 1 |   4   0 |    4 |    4   16 : 3 7 15 27
  1 1 |  10   0 |   10 |   10   16 : 1 5 9 11 13 17 19 21 23 25
computing differences['completing'] ... writing tree...
[0]01
0 0 0
[1]1 4 6
[2]2 0 0
[3]3 0 3
- -1
- -1

[3]3 4 0
- -1
- -1


- -1

[1]1 10 0
- -1
- -1



'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1, debug = TRUE, dump = 1, ctl.dump = c(wid = 3,     nmax = 20));  default cutoff =1.921
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        3         4         3        26 
AIC =  21.46 

vlmc> (vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1))
Lev Ch|  0  1 | tot | num size : ..{set->list}..
------+---------------------------------------------
  0 x | 18 10 |  28 |  28  32 : 0 1 2 3 4 5 6 7 8 9 10 11 12 ..
  1 0 |  8  9 |  17 |  17  32 : 2 3 4 6 7 8 10 12 14 15 16 18 20 ..
  2 0 |  4  3 |   7 |   7  16 : 3 4 7 8 15 16 27
  3 0 |  0  3 |   3 |   3  16 : 4 8 16
  3 1 |  4  0 |   4 |   4  16 : 3 7 15 27
  2 1 |  4  6 |  10 |  10  16 : 2 6 10 12 14 18 20 22 24 26
  3 0 |  3  6 |   9 |   9  16 : 6 10 12 14 18 20 22 24 26
  4 0 |  1  2 |   3 |   3  16 : 6 10 18
  5 0 |  1  2 |   3 |   3  16 : 6 10 18
  6 1 |  1  2 |   3 |   3  16 : 6 10 18
  7 0 |  0  2 |   2 |   2  16 : 10 18
  4 1 |  2  4 |   6 |   6  16 : 12 14 20 22 24 26
  5 0 |  2  4 |   6 |   6  16 : 12 14 20 22 24 26
  6 0 |  0  2 |   2 |   2  16 : 12 20
  6 1 |  2  2 |   4 |   4  16 : 14 22 24 26
  1 1 | 10  0 |  10 |  10  16 : 1 5 9 11 13 17 19 21 23 25

'vlmc' a Variable Length Markov Chain;
	 alphabet '01', |alphabet| = 2, n = 28.
Call: vlmc(dts = dt1, cutoff.prune = 0.1, dump = 1);  cutoff =0.1
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        7        11         6        66 
AIC =  27.55 

vlmc> data(presidents)

vlmc> dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA

vlmc> table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level

  (0,45]  (45,70] (70,100]     <NA> 
      28       60       26        6 

vlmc> levels(dpres)#-> make the alphabet -> warning
[1] "(0,45]"   "(45,70]"  "(70,100]" NA        

vlmc> vlmc.pres <- vlmc(dpres, debug = TRUE)
vlmc: Alpha = '0123' ; |X| = 4
vlmc: ctl.dump =  3.079181 -1 
vlmc: n = |data| = 120, cutoff{prune} = 3.90736, threshold{gen} = 2
vlmc: |alphabet| = 4, alphabet = 0123
generating... pruning... computing differences['completing'] ... writing tree...
[0]0123
0 1 2 1 2
[1]1 18 7 0 2
- -1
- -1
- -1
- -1

[1]1 9 33 5 1
- -1
- -1
[2]2 0 6 6 0
- -1
- -1
- -1
- -1

- -1

[1]1 0 12 14 0
- -1
- -1
- -1
- -1

- -1


vlmc> vlmc.pres

'vlmc' a Variable Length Markov Chain;
	 alphabet '0123', |alphabet| = 4, n = 120.
Call: vlmc(dts = dpres, debug = TRUE);  default cutoff =3.907
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        2         5         3        42 
AIC =  227.6 

vlmc> ## alphabet & and its length:
vlmc> vlmc.pres$alpha
[1] "0123"

vlmc> stopifnot(
vlmc+   length(print(strsplit(vlmc.pres$alpha,NULL)[[1]])) == vlmc.pres$ alpha.len
vlmc+ )
[1] "0" "1" "2" "3"

vlmc> ## You now can use larger alphabets (up to 95) letters:
vlmc> set.seed(7); it <- sample(40, 20000, replace=TRUE)

vlmc> v40 <- vlmc(it)

vlmc> v40

'vlmc' a Variable Length Markov Chain;
	 alphabet 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN', |alphabet| = 40, n = 20000.
Call: vlmc(dts = it);  default cutoff =27.29
 -> extensions (= $size ) :
   ord.MC   context nr.leaves     total 
        0         1         1        82 
AIC =  147590 

vlmc> ## even larger alphabets now give an error:
vlmc> il <- sample(100, 10000, replace=TRUE)

vlmc> ee <- tryCatch(vlmc(il), error= function(e)e)

vlmc> stopifnot(is(ee, "error"))

draw>   draw(vlmc.dt1c01)
[x]-( 18 9| 27)-F
 +--[0]-( 8 9| 17) <1.38>-F
 |   +--[0]-( 4 3| 7) <0.14>-F
 |   |   +--[0]-( 0 3| 3) <2.54>-T
 |   |   '--[1]-( 4 0| 4) <2.24>-T
 |   '--[1]-( 4 6| 10) <0.10>
 |       '--[0]-( 3 6| 9) <0.09>-F
 |           +--[0]-( 1 2| 3) <0.00>
 |           |   '--[0]-( 1 2| 3) <0.00>
 |           |       '--[1]-( 1 2| 3) <0.00>
 |           |           '--[0]-( 0 2| 2) <0.81>-T
 |           '--[1]-( 2 4| 6) <0.00>
 |               '--[0]-( 2 4| 6) <0.00>-F
 |                   +--[0]-( 0 2| 2) <0.81>-T
 |                   '--[1]-( 2 2| 4) <0.24>-T
 '--[1]-( 10 0| 10) <4.05>-T

draw>   draw(vlmc.dt1c01, flag = FALSE)
-4.000000

draw>   draw(vlmc.dt1c01, kind = 1)
[x]-( 18 9| 27)
 +--[0]-( 8 9| 17) <1.38>
 +---+--[0]-( 4 3| 7) <0.14>
 +---+---+--[0]-( 0 3| 3) <2.54>
 +---+---+--[1]-( 4 0| 4) <2.24>
 +---+--[1]-( 4 6| 10) <0.10>
 +---+---+--[0]-( 3 6| 9) <0.09>
 +---+---+---+--[0]-( 1 2| 3) <0.00>
 +---+---+---+---+--[0]-( 1 2| 3) <0.00>
 +---+---+---+---+---+--[1]-( 1 2| 3) <0.00>
 +---+---+---+---+---+---+--[0]-( 0 2| 2) <0.81>
 +---+---+---+--[1]-( 2 4| 6) <0.00>
 +---+---+---+---+--[0]-( 2 4| 6) <0.00>
 +---+---+---+---+---+--[0]-( 0 2| 2) <0.81>
 +---+---+---+---+---+--[1]-( 2 2| 4) <0.24>
 +--[1]-( 10 0| 10) <4.05>

draw>   draw(vlmc.dt1)
[x]-( 18 9| 27)-F
 +--[0]-( 8 9| 17) <1.38>
 |   '--[0]-( 4 3| 7) <0.14>-F
 |       +--[0]-( 0 3| 3) <2.54>-T
 |       '--[1]-( 4 0| 4) <2.24>-T
 '--[1]-( 10 0| 10) <4.05>-T

draw>   draw(vlmc.dt1, show = 3)
[x]-( 18 9| 27)-F
 +--[0]-( 8 9| 17) <1.38>-.
 |   +--[0]-( 4 3| 7) <0.14>-F
 |   |   +--[0]-( 0 3| 3) <2.54>-T
 |   |   '--[1]-( 4 0| 4) <2.24>-T
 |   '--[1]-(___)
 '--[1]-( 10 0| 10) <4.05>-T

draw>   draw(vlmc.dt1, cumulative = FALSE)
[x]-( 0 0| 0)-F
 +--[0]-( 4 6| 10)
 |   '--[0]-( 0 0| 0)-F
 |       +--[0]-( 0 3| 3)-T
 |       '--[1]-( 4 0| 4)-T
 '--[1]-( 10 0| 10)-T
Warning messages:
1: In vlmc(dpres, debug = TRUE) :
  alphabet with >1-letter strings; trying to abbreviate
2: In vlmc(it) : alphabet with >1-letter strings; trying to abbreviate
v40 : -19994.0949 
vd :  -650.2367 
vdL :  -558.5883 
vlmc.dt1 :    -9.7095 
vlmc.dt1c01 :    -4.0000 
vlmc.pres :   -71.2724 

VLMC documentation built on May 1, 2019, 11:32 p.m.

Related to logLik in VLMC...