vlmc: Fit a Variable Length Markov Chain (VLMC)

Description

Fit a Variable Length Markov Chain (VLMC) to a discrete time series, in basically two steps:
First, a large Markov chain is generated containing the context states of the time series (all of them if threshold.gen = 1). Second, many states of this chain are collapsed by pruning the corresponding context tree.
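
The first step can be pictured with a small base-R sketch that counts how often each context of a fixed length occurs and keeps only those reaching a count threshold. This is an illustration only, not the package's internal algorithm; the helper name context_counts and the fixed context length k are assumptions made for the example.

```r
# Hypothetical helper (not part of VLMC): count every length-k context in x.
context_counts <- function(x, k) {
  # embed() puts the most recent value first, so reverse each row
  # to read the context in time order.
  windows <- embed(x, k)
  ctx <- apply(windows, 1, function(r) paste(rev(r), collapse = ""))
  table(ctx)
}

f1 <- c(1, 0, 0, 0)
f2 <- rep(1:0, 2)
dt1 <- c(f1, f1, f2, f1, f2, f2, f1)

counts <- context_counts(dt1, k = 3)
counts                 # all observed contexts of length 3
counts[counts >= 2]    # contexts that would survive a count threshold of 2
```

Raising the threshold in the last line shrinks the set of retained contexts, which is the role threshold.gen plays when the initial tree is generated.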

Currently, the “alphabet” may contain at most 26 different “character”s.

Usage

vlmc(dts,
     cutoff.prune = qchisq(alpha.c, df=max(.1,alpha.len-1),lower.tail=FALSE)/2,
     alpha.c = 0.05,
     threshold.gen = 2,
     code1char = TRUE, y = TRUE, debug = FALSE, quiet = FALSE,
     dump = 0, ctl.dump = c(width.ct = 1+log10(n), nmax.set = -1) )

is.vlmc(x)
## S3 method for class 'vlmc'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

dts

a discrete “time series”; can be a numeric, character or factor.

cutoff.prune

non-negative number; the cutoff used for pruning. It defaults to half the upper α-quantile (i.e., the (1 − α)-quantile) of a chi-squared distribution with max(0.1, alpha.len − 1) degrees of freedom, where α = alpha.c, the following argument:

alpha.c

number in (0,1) used to specify cutoff.prune in the more intuitive χ^2 quantile scale; defaulting to 5%.

threshold.gen

integer >= 1 (usually left at 2). When generating the initial large tree, only generate nodes with count >= threshold.gen.

code1char

logical; if true (default), the data dts will be ..........FIXME...........

y

logical; if true (default), the data dts will be returned with the result. This ensures that residuals (residuals.vlmc) and “k-step ahead” predictions can be computed from the fitted object.

debug

logical; should debugging info be printed to stderr?

quiet

logical; if true, don't print some warnings.

dump

integer in 0:2. If positive, the pruned tree is dumped to stderr; if 2, the initial unpruned tree is dumped as well.

ctl.dump

integer of length 2, say ctl[1:2] controlling the above dump when dump > 0. ctl[1] is the width (number of characters) for the “counts”, ctl[2] the maximal number of set elements that are printed per node; when the latter is not positive (by default), currently max(6, 15 - log10(n)) is used.

x

a fitted "vlmc" object.

digits

integer giving the number of significant digits for printing numbers.

...

potentially further arguments [Generic].
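
The default for cutoff.prune can be reproduced in base R directly from the expression shown under Usage. The sketch below only evaluates that expression for a few alphabet sizes; the wrapper name default_cutoff is an assumption introduced for illustration and is not part of the package.

```r
# Hypothetical wrapper around the default expression from the Usage section.
default_cutoff <- function(alpha.len, alpha.c = 0.05) {
  qchisq(alpha.c, df = max(0.1, alpha.len - 1), lower.tail = FALSE) / 2
}

default_cutoff(2)              # binary alphabet, alpha.c = 5%
sapply(2:5, default_cutoff)    # the cutoff grows with the alphabet size

# lower.tail = FALSE means the upper tail is used, so for alpha.c = 0.05
# this is half the 95% quantile:
all.equal(default_cutoff(2), qchisq(0.95, df = 1) / 2)
```

A smaller alpha.c gives a larger cutoff.prune and therefore more aggressive pruning.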

Value

A "vlmc" object, basically a list with components

nobs

length of the data series when the model was fit (named "n" in earlier versions).

threshold.gen, cutoff.prune

the arguments (or their defaults).

alpha.len

the alphabet size.

alpha

the alphabet used, as one string.

size

a named integer vector of length (>=) 4, giving characteristic sizes of the fitted VLMC. Its named components are

"ord.MC"

the (maximal) order of the Markov chain,

"context"

the “context tree size”, i.e., the number of leaves plus number of “hidden nodes”,

"nr.leaves"

the number of leaves, and

"total"

the number of integers needed to encode the VLMC tree, i.e., length(vlmc.vec) (see below).

vlmc.vec

integer vector, containing (an encoding of) the fitted VLMC tree.

y

if y = TRUE, the data dts, as character, using the letters from alpha.

call

the call vlmc(..) used.

Note

Set cutoff.prune = 0 and threshold.gen = 1 to get a “perfect fit”, i.e., a VLMC which perfectly re-predicts the data (apart from the first observation). Note that even with cutoff.prune = 0 some pruning may happen, namely for all (terminal) nodes with delta = 0.
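
As a sketch of this note, the following compares the size component of a default fit with that of a fit using cutoff.prune = 0 and threshold.gen = 1, on the small series from the Examples. It assumes the VLMC package is installed and does nothing otherwise; no particular output is claimed here.

```r
if (requireNamespace("VLMC", quietly = TRUE)) {
  f1 <- c(1, 0, 0, 0)
  f2 <- rep(1:0, 2)
  dt1 <- c(f1, f1, f2, f1, f2, f2, f1)

  fit.default <- VLMC::vlmc(dt1)                                     # default pruning
  fit.full    <- VLMC::vlmc(dt1, cutoff.prune = 0, threshold.gen = 1) # "perfect fit" settings

  # Compare the characteristic sizes documented under Value
  print(fit.default$size)
  print(fit.full$size)
}
```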

Author(s)

Martin Maechler

References

Bühlmann P. and Wyner A. (1999) Variable Length Markov Chains. Annals of Statistics 27, 480–513.

Mächler M. and Bühlmann P. (2004) Variable Length Markov Chains: Methodology, Computing, and Software. J. Computational and Graphical Statistics 13(2), 435–455.

Mächler M. (2004) VLMC — Implementation and R interface; working paper.

See Also

draw.vlmc, entropy, simulate.vlmc for “VLMC bootstrapping”.

Examples

f1 <- c(1,0,0,0)
f2 <- rep(1:0,2)
(dt1 <- c(f1,f1,f2,f1,f2,f2,f1))

(vlmc.dt1  <- vlmc(dt1))
 vlmc(dt1, dump = 1,
      ctl.dump = c(wid = 3, nmax = 20), debug = TRUE)
(vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1))

data(presidents)
dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA
table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level
levels(dpres)#-> make the alphabet -> warning
vlmc.pres <- vlmc(dpres, debug = TRUE)
vlmc.pres

## alphabet and its length:
vlmc.pres$alpha
stopifnot(
  length(print(strsplit(vlmc.pres$alpha,NULL)[[1]])) == vlmc.pres$alpha.len
)

## You can now use larger alphabets (up to 95 letters):
set.seed(7); it <- sample(40, 20000, replace=TRUE)
v40 <- vlmc(it)
v40
## even larger alphabets now give an error:
il <- sample(100, 10000, replace=TRUE)
ee <- tryCatch(vlmc(il), error= function(e)e)
stopifnot(is(ee, "error"))

VLMC documentation built on May 1, 2019, 11:32 p.m.
