transformDataToNjki: Transform Markov Chain (Time Series) Data Into Transition...
In bayesMCClust: Mixtures-of-Experts Markov Chain Clustering and Dirichlet Multinomial Clustering


Transform time series (Markov chain) data with several states/categories into the required Njk.i-structure containing the transition frequencies between these states/categories.

The functions dataFrameToNjki and dataListToNjki transform time series data representing Markov chains with several states/categories into a format ready for use in mcClustering and dmClustering and their versions without extension.

The resulting data format is a three-dimensional array holding one matrix of absolute transition frequencies per individual (see section Value).

With dataFrameToNjki a data.frame or matrix where the rows contain the time series (implying equal lengths T) can be transformed.

Note that this procedure can also handle time series that end earlier: mark the end of such a series with a special end-of-line 'number' that differs from all state codes (the remainder of the row may be filled with the same number), and afterwards delete the corresponding row and column in the transition frequency matrices.

With dataListToNjki a list of vectors representing the time series (which may have individual lengths T_i) can be transformed.

dataFrameToNjki(dataFrame)
dataListToNjki(dataList)

dataFrame

A data.frame or matrix of dimension N x T where the i-th row contains the time series of the i-th individual. N is the number of individuals/units/objects and T is the number of columns, which is not necessarily equal to the length of each time series. The time series may be of different lengths, in which case the end and/or remainder of a row is indicated or filled up with a special end-of-line number that differs from the state codes (e.g. zero). In such a case it is necessary to delete the corresponding row and column in the resulting transition frequency matrices.
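The padding scheme and the subsequent deletion can be illustrated with a small base-R sketch. It does not call dataFrameToNjki and is not the package's implementation; the toy matrix dataMat, the choice of 0 as end-of-line number, and the object names are assumptions made for illustration.

```r
# Hypothetical N x T matrix: states 1..3, shorter series padded with 0
dataMat <- rbind(c(1, 2, 2, 3, 3),
                 c(3, 1, 0, 0, 0),
                 c(2, 2, 1, 0, 0))
codes <- 0:3  # end-of-line number first, then the states

# Count transitions per row, including those into the end-of-line code
counts <- vapply(seq_len(nrow(dataMat)), function(i) {
  x <- dataMat[i, ]
  tab <- table(factor(x[-length(x)], levels = codes),
               factor(x[-1],         levels = codes))
  matrix(as.integer(tab), nrow = length(codes))
}, matrix(0L, length(codes), length(codes)))

# Delete the row and column that belong to the end-of-line number
eol     <- which(codes == 0)
cleaned <- counts[-eol, -eol, , drop = FALSE]

dim(cleaned)            # one 3 x 3 matrix per individual
apply(cleaned, 3, sum)  # remaining transitions per individual
```

After the deletion, each matrix only contains transitions between genuine states, so its total equals the observed series length minus one.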

dataList

A list of N vectors where the i-th entry corresponds to the time series (with possibly individual length T_i) of the i-th individual.
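As an alternative to deleting rows and columns afterwards, padded rows can be turned into a list of this shape beforehand. The following base-R sketch assumes a hypothetical matrix dataMat in which 0 is used only as trailing end-of-line padding; the object names are illustrative.

```r
# Hypothetical padded matrix: states 1..3, end-of-line number 0
dataMat <- rbind(c(1, 2, 2, 3, 3),
                 c(3, 1, 0, 0, 0),
                 c(2, 2, 1, 0, 0))

# Drop the padding row by row to obtain vectors of individual length T_i
dataList <- lapply(seq_len(nrow(dataMat)), function(i) {
  x <- dataMat[i, ]
  x[x != 0]
})

lengths(dataList)  # individual series lengths T_i
```

A list built this way has the structure described for the dataList argument.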

Note that for a single individual the number of transitions is always equal to the length of the time series minus one; that is, T-1 or T_i-1, respectively.
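This relation between series length and number of transitions can be checked with a small base-R sketch. It does not call the package and is not its implementation; the toy data and the object names (obsList, njkiSketch) are made up for illustration, with states assumed to be coded 0,...,K.

```r
# Hypothetical toy data: three individuals, states coded 0, 1, 2 (K = 2)
obsList <- list(c(0, 1, 1, 2), c(2, 2, 0), c(1, 0, 1, 0, 2))
states  <- 0:2

# One (K+1) x (K+1) count matrix per individual, stacked into a 3-dim array
njkiSketch <- vapply(obsList, function(x) {
  tab <- table(factor(x[-length(x)], levels = states),  # state at time t-1
               factor(x[-1],         levels = states))  # state at time t
  matrix(as.integer(tab), nrow = length(states))
}, matrix(0L, length(states), length(states)))
dimnames(njkiSketch) <- list(as.character(states), as.character(states), NULL)

dim(njkiSketch)                            # (K+1) x (K+1) x N
nTransitions <- apply(njkiSketch, 3, sum)  # transitions per individual
all(nTransitions == lengths(obsList) - 1)  # T_i - 1 for every individual
```

Rows index the state a transition starts from and columns the state it moves to, so njkiSketch["1", "0", 3] is the number of moves from state 1 to state 0 made by the third individual.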

The categories/states of the Markov chain and, if used, the end-of-line number should be numbered consecutively. By default, neither function transforms the original indexing of the categories/states into 0,...,K (e.g. if the original numbering started with 1); the original numbering is used to index the resulting transition matrices. Note that the number of different categories here is K+1 (see the remark in Note).

If an end-of-line or end-of-time-series number appears (in dataFrame), the corresponding rows/columns in the returned 3-dim array (see Value) can be deleted afterwards.

A three-dimensional array of format (K + 1) x (K + 1) x N where the i-th matrix contains the transition frequencies of individual i. K+1 is equal to the number of different categories/states.

Note that, in contrast to the literature (see References), the states of the categorical outcome variable (time series) are sometimes numbered 0,...,K in this package (instead of 1,...,K), so there are K+1 categories (states).

Christoph Pamminger <christoph.pamminger@gmail.com>

Sylvia Fruehwirth-Schnatter, Christoph Pamminger, Andrea Weber and Rudolf Winter-Ebmer, (2011), "Labor market entry and earnings dynamics: Bayesian inference using mixtures-of-experts Markov chain clustering". Journal of Applied Econometrics. DOI: 10.1002/jae.1249 http://onlinelibrary.wiley.com/doi/10.1002/jae.1249/abstract

Christoph Pamminger and Sylvia Fruehwirth-Schnatter, (2010), "Model-based Clustering of Categorical Time Series". Bayesian Analysis, Vol. 5, No. 2, pp. 345-368. DOI: 10.1214/10-BA606 http://ba.stat.cmu.edu/journal/2010/vol05/issue02/pamminger.pdf

mcClust, dmClust, mcClustExtended, dmClustExtended

# rm(list=ls(all=TRUE))

# set working directory
getwd()
if ( !file.exists("bayesMCClust-wd") ) dir.create("bayesMCClust-wd")
setwd("bayesMCClust-wd") 

# load the package and define data
library(bayesMCClust)
data(MCCExampleData)

myObsList <- MCCExampleData$obsList
class(myObsList)
length(myObsList)
myObsList[1:5]  # no end-of-line here!
table( unlist(myObsList) ) # categories consecutively numbered?

njki <- dataListToNjki(myObsList) # generate array for N transition matrices
dim(njki)
njki[,,1:5]  # for verification
apply(njki, c(1, 2), sum) # sum up all transitions of all individuals

tsLength <- sapply(myObsList, length) # calculate time series lengths
table(tsLength) # at least 2? -- corresponds to at least 1 transition

Njk.i <- njki # store Njk.i
# save( Njk.i, file = "Njk_i.RData" )      # save Njk.i in "Njk_i.RData"