predict.vlmc: Prediction of VLMC for (new) Series
In VLMC: Variable Length Markov Chains ('VLMC') Models

predict.vlmc

R Documentation

Description

Compute predictions from a fitted VLMC object for each element (except the first) of another discrete time series. By default, a matrix of prediction probabilities is computed. The argument `type` selects other kinds of prediction, such as the most probable value (`"class"` or `"response"`), the context length (tree `"depth"`), or an ID of the corresponding context (`"id.node"`).

Usage

## S3 method for class 'vlmc'
predict(object, newdata,
         type = c("probs", "class","response", "id.node", "depth", "ALL"),
         se.fit = FALSE,
         allow.subset = TRUE, check.alphabet=TRUE,
         ...)
## S3 method for class 'vlmc'
fitted(object, ...)

Arguments

`object`	typically the result of `vlmc(..)`.
`newdata`	a discrete “time series”, a numeric, character or factor, as the `dts` argument of `vlmc(.)`.
`type`	character indicating the type of prediction required; the options are given in the Usage section above, see also the Value section below. The default `"probs"` returns a matrix of prediction probabilities, whereas `"class"` or `"response"` give the corresponding most probable class. The value of this argument can be abbreviated.
`se.fit`	a switch indicating whether standard errors are required. Not yet supported.
`allow.subset`	logical; if `TRUE`, `newdata` need not contain all the “alphabet letters” used in `object`.
`check.alphabet`	logical; if `TRUE`, the consistency of `newdata`'s alphabet with that of `object` is checked.
`...`	(potentially further arguments) required by generic.
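
As a minimal sketch of how these arguments combine, assuming the VLMC package is installed: the series below are copied from the Examples section, and the object names `fit` and `newdata` are only illustrative.

```r
library(VLMC)

# Training series, as constructed in the Examples section
f1  <- c(1, 0, 0, 0)
f2  <- rep(1:0, 2)
dt2 <- rep(c(f1, f1, f2, f1, f2, f2, f1), 2)
fit <- vlmc(dt2, cutoff = 1.5)

# A new series over the same alphabet {0, 1}
newdata <- c(rep(0, 6), f1, f1, f2)

pm <- predict(fit, newdata)                  # default type = "probs"
cl <- predict(fit, newdata, type = "class")  # most probable letter, as a factor
dp <- predict(fit, newdata, type = "dep")    # abbreviation of "depth"

# One row per element of newdata, one column per alphabet letter;
# the first row cannot be predicted and is NA.
dim(pm)
head(pm, 3)
```

The abbreviated `type = "dep"` relies on the abbreviation rule stated for `type` above.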

Value

Depending on the type argument,

`"probs"`	an `n \times m` matrix `pm` of (prediction) probabilities, where `n` is the length of `newdata` and `m` is the size of the alphabet; each row of `pm` sums to 1. `pm[i,k]` is `\hat P[Y_i = k \mid Y_{i-1},\dots]` (and is therefore `NA` for `i=1`, which has no preceding observation). The `dimnames` of `pm` are the values of `newdata[]` and the alphabet letters `k`.
`"class"`, `"response"`	the corresponding most probable value of Y[]; as `factor` for `"class"` and as integer in `0:(m-1)` for `type = "response"`. If there is more than one most probable value, the first one is chosen.
`"id.node"`	an (integer) “ID” of the current context (= node of the tree representing the VLMC).
`"depth"`	the context length, i.e., the depth of the Markov chain, at the current observation (of `newdata`).
`"ALL"`	an object of class `"predict.vlmc"`, which has its own print method (`print.predict.vlmc`); it is a list with the following components:
    `ID`	integer vector as for `type = "id.node"`.
    `probs`	prediction probability matrix, as above.
    `flags`	integer vector, non-zero for particular states only; mainly for debugging.
    `ctxt`	character; `ctxt[i]` is a string giving the context (backwards) for `newdata[i]`, using alphabet letters.
    `fitted`	character with fitted values, i.e., the alphabet letter with the highest probability, using `max.col` where ties are broken at random.
    `alpha`, `alpha.len`	the alphabet (single string) and its length.
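
The two tie-breaking rules described above (first maximum for `"class"` and `"response"`, random choice via `max.col` for the `fitted` component) can give different answers. The following base-R sketch does not call the package at all: the matrix `pm` is a made-up stand-in for a `"probs"` result, and the code only illustrates "first maximum" versus the default random choice of `max.col`.

```r
# Hypothetical probability matrix with ties in rows 1 and 4
pm <- rbind(c(0.5, 0.5),
            c(0.2, 0.8),
            c(0.9, 0.1),
            c(0.5, 0.5))
alphabet <- c("0", "1")
colnames(pm) <- alphabet

# "First maximum" rule: ties go to the first column
first_max <- max.col(pm, ties.method = "first")
response  <- first_max - 1L                               # integer in 0:(m-1)
class_    <- factor(alphabet[first_max], levels = alphabet)

# Default max.col() rule: ties are broken at random
set.seed(42)
random_max <- max.col(pm)

# Rows without ties agree under both rules; tied rows may differ
untied <- pm[, 1] != pm[, 2]
stopifnot(first_max[untied] == random_max[untied])
```

This mirrors the relation `as.integer(class) == 1 + response` that the Examples section checks on real predictions.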

Note

The predict method and its possible arguments may still be developed further, and we are considering returning the marginal probabilities instead of NA for the first value(s).
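
Until then, a user who needs a complete matrix could fill in the first row by hand. The sketch below is a user-side workaround under stated assumptions, not package behaviour: `train` and `pm` are small made-up stand-ins for the training series and for a `"probs"` result, and the marginal probabilities are taken to be the relative letter frequencies of the training series.

```r
# Hypothetical training series and alphabet
train    <- c(1, 0, 0, 0, 1, 0, 1, 0)
alphabet <- c("0", "1")

# Hypothetical "probs" result: first row is NA, as documented
pm <- rbind(c(NA, NA),
            c(0.75, 0.25),
            c(0.50, 0.50))
colnames(pm) <- alphabet

# Marginal relative frequencies of the letters in the training series
marginal <- prop.table(table(factor(train, levels = alphabet)))

# Replace the unpredictable first row by the marginal distribution
pm_filled <- pm
pm_filled[1, ] <- as.numeric(marginal)
pm_filled
```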

The print method `print.predict.vlmc` uses `fractions` from package MASS to display the probabilities `Pr[X = j]`, for `j \in \{0,1,\dots\}`, as fractions of integers, since these probabilities are rational numbers.
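
To illustrate why a fraction display suits such probabilities, here is a small sketch using `fractions` from MASS on made-up counts; the `counts` vector is hypothetical and stands for how often each letter followed some context.

```r
library(MASS)

# Hypothetical counts of the letters "0" and "1" following one context
counts <- c(3, 9)

# Relative frequencies are ratios of integers, hence rational numbers
probs <- counts / sum(counts)

# fractions() recovers a small-integer ratio for each probability
frac <- fractions(probs)
print(frac)
```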

Examples

library(VLMC) # provides vlmc(), draw(), int2alpha() and the bnrf1 data

f1 <- c(1,0,0,0)
f2 <- rep(1:0,2)
(dt2 <- rep(c(f1,f1,f2,f1,f2,f2,f1),2))

(vlmc.dt2c15  <- vlmc(dt2, cutoff = 1.5))
draw(vlmc.dt2c15)

## Fitted Values:
all.equal(predict(vlmc.dt2c15, dt2), predict(vlmc.dt2c15))
(pa2c15 <- predict(vlmc.dt2c15, type = "ALL"))

## Depth = context length  ([1] : NA) :
stopifnot(nchar(pa2c15 $ ctxt)[-1] ==
          predict(vlmc.dt2c15, type = "depth")[-1])

same <- (ff1 <- pa2c15 $ fitted) ==
        (ff2 <- int2alpha(predict(vlmc.dt2c15, type ="response"), alpha="01"))
which(!same) #-> some are different, since max.col() breaks ties at random!

ndt2 <- c(rep(0,6),f1,f1,f2)
predict(vlmc.dt2c15, ndt2, "ALL")

(newdt2 <- sample(dt2, 17))
pm <- predict(vlmc.dt2c15, newdt2, allow.subset = TRUE)
summary(rowSums(pm)) # all 1, apart from the NA of the first row

predict(vlmc.dt2c15, newdt2, type = "ALL")

data(bnrf1)
(vbnrf <- vlmc(bnrf1EB))
(pA <- predict(vbnrf, bnrf1EB[1:24], type = "ALL"))
 pc <- predict(vbnrf, bnrf1EB[1:24], type = "class")
 pr <- predict(vbnrf, bnrf1EB[1:24], type = "resp")
stopifnot(as.integer  (pc[-1])   == 1 + pr[-1],
          as.character(pc[-1]) == strsplit(vbnrf$alpha,NULL)[[1]][1 + pr[-1]])

##-- Example of a "perfect" fit -- just for illustration:
##			    the default, thresh = 2 doesn't fit perfectly(i=38)
(vlmc.dt2c0th1 <- vlmc(dt2, cutoff = 0, thresh = 1))

## "Fitted" = "Data" (but the first which can't be predicted):
stopifnot(dt2[-1] == predict(vlmc.dt2c0th1,type = "response")[-1])

VLMC documentation built on Sept. 11, 2024, 5:28 p.m.