dmClustering: Dirichlet Multinomial Clustering With And Without...

Description Usage Arguments Details Value Prior Data Log File Warning Note Author(s) References See Also Examples

Description

This function provides Dirichlet Multinomial Clustering with or without multinomial logit model (mixtures-of-experts) extension (see References). That is an MCMC sampler for the mixtures-of-experts extension of Dirichlet Multinomial clustering. It requires four mandatory arguments: Data, Prior, Initial and Mcmc; each representing a list of (mandatory) arguments: Data contains data information, Prior contains prior information, Initial contains information about starting conditions (initial values) and Mcmc contains the setup for the MCMC sampler.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
dmClust( 
    Data = list( 
        dataFile = 
            stop("'dataFile' must be specified: filename or data"), 
        storeDir = "try01", mccFile = "mcc.RData"), 
    Prior = list( H = 4, alpha0 = 4, a0 = 1, alpha = 1, N0 = 10, 
        isPriorNegBin = FALSE, mccAsPrior = FALSE, 
        xiPooled = TRUE, persPrior = 7/10), 
    Initial = list( mccUse = FALSE, pers = 1/6, S.i.start = NULL), 
    Mcmc = list( kNo = 2, M = 50, M0 = 20, mOut = 5, mSave = 10, 
        showAcc = TRUE, monitor = FALSE, seed = 12345))
                      
dmClustExtended( 
    Data = list( 
        dataFile = 
            stop("'dataFile' must be specified: filename or data"), 
        storeDir = "try01", mccFile = "mcc.RData", 
        X = stop("X (matrix of covariates) must be specified")), 
    Prior = list( H = 4, a0 = 1, alpha = 1, N0 = 10, 
        isPriorNegBin = FALSE, mccAsPrior = FALSE, 
        xiPooled = TRUE, persPrior = 7/10, 
        betaPrior = "informative", betaPriorMean = 0, 
        betaPriorVar = 1), 
    Initial = list( mccUse = FALSE, pers = 1/6, 
        S.i.start = rep(1:H, N), Beta.start = NULL), 
    Mcmc = list( kNo = 2, M = 50, M0 = 20, mOut = 5, mSave = 10, 
        showAcc = TRUE, monitor = FALSE, seed = 12345))

Arguments

Data

a list consisting of: dataFile, storeDir, mccFile, X. See Details.

Prior

a list consisting of: H, alpha0, a0, alpha, N0, isPriorNegBin, mccAsPrior, xiPooled, persPrior, betaPrior, betaPriorMean, betaPriorVar. See Details.

Initial

a list consisting of: mccUse, pers, S.i.start, Beta.start. See Details.

Mcmc

a list consisting of: kNo, M, M0, mOut, mSave, showAcc, monitor, seed. See Details.

Details

Note that the values of the arguments indicated here have nothing to do with default values! For a call of these functions this lists-of-arguments structure requires a complete specification of all arguments!

The following arguments which are lists have to be completely provided (note that there are no such things as default values within lists!):

Data contains:

dataFile

A 3-dim array having the transition counts/frequencies structure (like Njk.i in the example data sets) already loaded into the current environment/workspace. Or a character with the name of or the path to an .RData-file which contains such a data set, in which case it must have the name “Njk.i”.

It is required that this data have to be a 3-dimensional array of dimension (K+1) x (K+1) x N containing the transition counts/frequencies, where K+1 is the number of categories k=0,…,K and N the number of objects/units/individuals. The number of transitions (equal to time series length minus one) may be individual.

storeDir

A character indicating the name of the directory (will be created if not already existing) where the log file and the results are to be stored.

mccFile

If not NULL the prior data (must have same format as mccXiPrior in LMEntryPaperData – at least the H-th entry in the list has to be provided) or a character with the name of or the path to a file containing such data, which in this case must be named “mcc”. The prior data contain prior information (in terms of probabilities) about transition probabilities (possibly from another estimation procedure). For further information see Section Prior Data and mccXiPrior in LMEntryPaperData.

X

The matrix of covariates (with N rows) including the unit vector for the intercept to be included in the multinomial logit model extension.

Prior contains (see also Section Prior Data):

H

An integer >= 1 indicating the number of clusters/groups.

alpha0

A numerical value determining the value of the prior parameter of the Dirichlet-prior for the group sizes η_h (alpha0 =α_1=…=α_H, thus equal for all h).

a0

A numerical value determining a parameter of the negative multinomial prior (see references for more details).

alpha

A numerical value determining a parameter of the negative multinomial prior (see references for more details).

N0

A numerical value determining a parameter of the negative multinomial prior (see references for more details).

isPriorNegBin

If TRUE, the product of negative binomial distributions is used instead of the negative multinomial distribution (see references for more details).

mccAsPrior

If mccAsPrior=TRUE, prior information for the transition probabilities as provided by mccFile are used as prior parameters for the estimation process. In this case there are two further options depending on the value of xiPooled: If xiPooled=TRUE, equal apriori transition probabilities are used for all groups (using mcc[[1]]$xi) and if xiPooled=FALSE group-specific apriori transition probabilities are used (using mcc[[H]]$xi).

If mccAsPrior=FALSE, a priori transition probabilities are determined depending on persPrior. In this case the diagonal elements are set to persPrior and the off-diagonal elements to (1 - persPrior)/K, equal for all groups.

xiPooled

Only used if mccAsPrior=TRUE (see above): if xiPooled=TRUE equal apriori transition probabilities are used for all groups (using mcc[[1]]$xi) and if xiPooled=FALSE group-specific apriori transition probabilities are used (using
mcc[[H]]$xi).

persPrior

Only used if mccAsPrior=FALSE: a numerical value (between 0 and 1) indicates the persistence probability (equal for all diagonal elements) for the a priori transition probabilities. 1/(K + 1) corresponds to uniform distribution in each row.

betaPrior

A character. If "uninformative" (improper) prior parameters are used for the regression coefficients (i.e. betaPriorVar = ). Otherwise mean and variance of the normal prior distribution for the regression coefficients have to be specified.

betaPriorMean, betaPriorVar

Numerical values specifying the parameters of the normal prior distribution for the regression coefficients, only if betaPrior!="uninformative".

Initial contains:

mccUse

If TRUE, prior information for the group sizes and the transition probabilities as provided with mccFile are used for the estimation process as initial values. If FALSE, initial values for group sizes are set to 1/H and for transition probabilities determined by use of pers for the diagonal elements and (1 - pers)/K for the off-diagonal elements.

pers

Only used if mccUse=FALSE: A numerical value (between 0 and 1) which indicates the persistence probabilities (equal for all diagonal elements). Note, that 1/(K+1) corresponds to the uniform distribution in each row.

S.i.start

A vector of length N giving an initial allocation (mandatory for dmClustExtended).

Beta.start

A matrix of dimension ncol(X) x H giving start values for the regression coefficients including the zero vector in the first column representing the baseline group.

Mcmc contains:

kNo

A numerical value between 1 and K+1 indicating the number of row elements to be updated in each iteration. Note that eventually notation l is used in the literature.

M

An integer indicating the overall number of iterations.

M0

An integer indicating the number of the first iteration after the burn-in phase.

mOut

An integer indicating that after each mOut-th iteration a report line is written to the output window/screen.

mSave

An integer indicating that after each mSave-th iteration an intermediate storage of the workspace is carried out.

showAcc

If TRUE, additionally the current acceptance rate of the recent mOut draws of the M-H-steps is shown in the log-file and on the screen. Rule of thumb for the acceptance probability: should be around 0.25, at least between 0.15 and 0.4.

monitor

If TRUE, the paths of the draws of e_h and ξ_h starting at the beginning (m=1) up to the current draws are shown and currently updated in a diagram.

seed

An integer indicating a random seed.

Value

A list containing (/the output file contains):

workspaceFile

A character indicating the name of and the path (based on the currend working directory) to the output file, wherein all the results are saved. The name of the output file starts with “DMC_” or “DMC_Logit_newAux_” respectively followed by the number of groups H, the number of iterations M and the particular point in time when the function was called, with format: yyyymmdd_hhmmss. E.g. DMC_H4_M10000_20110218_045254.RData or
DMC_Logit_newAux_H4_M10000_20111121_165723.RData.

accept

A 3-dimensional array with dimension M x H*(K+1) x 2. This array contains the (calculated) acceptance probabilities (accProb) of the M-H-algorithm and whether the draw(s) were accepted or not (accYesNo) for each row j in each group h in the m-th iteration. The first dimension indicates the m-th iteration, the second dim row 1,…,K+1 in group 1, then row 1,…,K+1 in group 2 and so on. The third dim indicates accProb and accYesNo.

Beta.m

A 3-dimensional array of dimension ncol(X) x H x M containing the draws for the regression coefficients β_h in each m-th iteration step.

bk0

The prior parameters for the mean vectors of the normal (prior) distributions of the regression coefficients.

Bk0inv

The prior parameters for the inverse variance-covariance matrices of the normal (prior) distributions of the regression coefficients.

Data

The argument Data.

e_h_0

A 3-dimensional array with dimension K+1 x K+1 x H containing the (calculated) initial values for e_h.

e_h_m

A 4-dimensional array with dimension K+1 x K+1 x H x M containing the draws for e_h in the m-th iteration step.

eta_m

A matrix of dimension M x H containing the draws for η_h in each m-th iteration step.

fileName

A character value indicating the name of the output file (see also workspaceFile).

Initial

The argument Initial.

K

An integer indicating the number of categories minus one (!). See Note.

logFileName

A character value indicating the name of the log file and the corresponding directory.

mcc

The prior data (see Section Prior Data) provided with mccFile, NULL otherwise.

Mcmc

The argument Mcmc.

N

An integer indicating N, the number of individuals/units/objects.

Njk.i

The data (see Details) provided with dataFile.

Prior

The argument Prior.

S_i_freq

A H x N-matrix containing the frequencies how often individual i was allocated to a certain group during the iterations from M0+1 to codeM.

xi_h_m

A 4-dimensional array of dimension (K+1) x (K+1) x H x M containing the draws for ξ_h in each m-th iteration step.

xi_prior

A 3-dimensional array of dimension (K+1) x (K+1) x H that contains the finally used a priori parameter values for ξ_h.

bkN

The posterior parameters (in the last iteration step) for the mean vectors of the normal (posterior) distributions from which the regression coefficients were drawn.

BkN

The posterior parameters (in the last iteration step) for the variance-covariance matrices of the normal (posterior) distributions from which the regression coefficients were drawn.

logLike

A vector containing the values of the log-likelihood calculated in each iteration step.

logBetaPrior

A vector containing the values of the prior distribution for the regression coefficients calculated in each iteration step.

logEPrior

A vector containing the values of the prior distribution for e calculated in each iteration step.

logPostDens

A vector containing the values of the posterior density calculated in each iteration step.

mMax

An integer giving the position (number of iteration) of the maximum value in the posterior density logPostDens.

logClassLike

A vector containing the values of the log classification likelihood calculated in each iteration step.

entropy

A vector containing the values of the entropy calculated in each iteration step.

logEtaPrior

A vector containing the values of the prior distribution for the mixing proportions (group sizes) calculated in each iteration step.

Prior Data

The prior data (called mcc in the following) – to be passed via mccFile in argument-list Data – has to be a list of lists, indexed by 1,…,H,H+1,…. Note that, depending on parameter H (the number of groups – to be passed via H in argument-list Prior), there have to be at least H entries (each a list). See mccXiPrior in LMEntryPaperData for example. Within a call to dmClustering or mcClustering, at least mcc[[H]] has to be provided as a list containing eta and xi. eta is a vector of length H containing prior information about the relative group sizes of group h=1,…,H. xi is a 3-dimensional array of dimension (K+1) x (K+1) x H, containing prior information in terms of probabilities about the transition probabilities of group h=1,…,H (see examples).

Log File

The log file keeps record of the progress of the estimation procedure (which is also shown on the screen). At first some prior parameters and the MCMC-settings and the name of the output file are documented. Then for each mOut-th iteration step (at least for m=1, …, 5, 10, 20, 50, 100, 200, 500) information about the elapsed time and the expected time to the end and optionally the current acceptance rate (showAcc=TRUE) is indicated. Finally the total time is shown.

For example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Data loaded!
Data Information: Datafile = no file name , N = 9809 , K = 5
Manual Settings: No of groups H = 4 , kNo = 2
MCMC Parameters: M = 10000 , M0 = 5000 , mOut = 200 , mSave = 5000 , seed = 123456 , showAcc = TRUE
Prior Parameters for e_h (Neg Multinom): a0 = 1 , alpha = 1 , N0 = 10 , xi_prior (see below)
Information on xi_prior (for Neg Bin/Neg Multinom Prior):  with persPrior =  0.7  created xi_prior (equal in each group, mccAsPrior = FALSE & xiPooled = FALSE)
Prior information and parameters set!
Inital Values Information: mccUse = FALSE , pers = 0.7
Initial values set!
Initialisations done!
MCMC Iteration...
m = 1 ; Acc Rate of first draws = 0.54 
m = 2 ; duration of iter proc so far: 8.17 sec. ,  exp time to end: 1361.53  min. ; Acc Rate of last 2 draws = 0.5 
m = 3 ; duration of iter proc so far: 16.45 sec. ,  exp time to end: 1370.56  min. ; Acc Rate of last 3 draws = 0.49 
m = 4 ; duration of iter proc so far: 24.62 sec. ,  exp time to end: 1367.37  min. ; Acc Rate of last 4 draws = 0.49 
m = 5 ; duration of iter proc so far: 32.84 sec. ,  exp time to end: 1367.79  min. ; Acc Rate of last 5 draws = 0.53
m = 10 ; duration of iter proc so far: 73.97 sec. ,  exp time to end: 1368.58  min. ; Acc Rate of last 10 draws = 0.57
m = 20 ; duration of iter proc so far: 156.61 sec. ,  exp time to end: 1371.16  min. ; Acc Rate of last 20 draws = 0.61
m = 50 ; duration of iter proc so far: 404.42 sec. ,  exp time to end: 1368.84  min. ; Acc Rate of last 50 draws = 0.59
m = 100 ; duration of iter proc so far: 815.86 sec. ,  exp time to end: 1359.9  min. ; Acc Rate of last 100 draws = 0.61
m = 200 ; duration of iter proc so far: 1635.61 sec. ,  exp time to end: 1342.6  min. ; Acc Rate of last 200 draws = 0.62
m = 400 ; duration of iter proc so far: 3270.83 sec. ,  exp time to end: 1311.75  min. ; Acc Rate of last 200 draws = 0.51
m = 500 ; duration of iter proc so far: 4087.97 sec. ,  exp time to end: 1297.25  min. ; Acc Rate of last 200 draws = 0.47
m = 1000 ; duration of iter proc so far: 8165.91 sec. ,  exp time to end: 1226.25  min. ; Acc Rate of last 200 draws = 0.45
...
m = 10000 ; duration of iter proc so far: 81362.58 sec. ,  exp time to end: 0.14  min. ; Acc Rate of last 200 draws = 0.46
Total time:  22 hours 36 min

Warning

Note that there are no such things as default values (see Section Arguments)!

Note

Note that the required data files have to be provided in the current working directory and that the results (see Section Value) are to be saved in the directory provided by storeDir within the current working directory. Make sure that the current working directory is set appropriately before the function is called.

Note, that in contrast to the literature (see References), the numbering (labelling) of the states of the categorical outcome variable (time series) in this package is sometimes 0,...,K (instead of 1,...,K), however, there are K+1 categories (states)!

Author(s)

Christoph Pamminger <christoph.pamminger@gmail.com>

References

Sylvia Fruehwirth-Schnatter, Christoph Pamminger, Andrea Weber and Rudolf Winter-Ebmer, (2011), "Labor market entry and earnings dynamics: Bayesian inference using mixtures-of-experts Markov chain clustering". Journal of Applied Econometrics. DOI: 10.1002/jae.1249 http://onlinelibrary.wiley.com/doi/10.1002/jae.1249/abstract

Christoph Pamminger and Sylvia Fruehwirth-Schnatter, (2010), "Model-based Clustering of Categorical Time Series". Bayesian Analysis, Vol. 5, No. 2, pp. 345-368. DOI: 10.1214/10-BA606 http://ba.stat.cmu.edu/journal/2010/vol05/issue02/pamminger.pdf

Sylvia Fruehwirth-Schnatter and Rudolf Fruehwirth, (2010), "Data augmentation and MCMC for binary and multinomial logit models". In T. Kneib and G. Tutz (eds): Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir. Physica Verlag, Heidelberg, pp. 111-132. DOI: 10.1007/978-3-7908-2413-1_7 http://www.springerlink.com/content/t4h810017645wh68/. See also: IFAS Research Paper Series 2010-48 (http://www.jku.at/ifas/content/e108280/e108491/e108471/e109880/ifas_rp48.pdf).

See Also

mcClust, mcClustExtended, MNLAuxMix, MCCExampleData, MCCExtExampleData

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
#rm(list=ls(all=TRUE))

# ==================================================================================
if ( FALSE ) {
# ==================================================================================

# set working directory
oldDir <- getwd()
curDir <- tempdir()
setwd(curDir)

if ( !file.exists("bayesMCClust-wd") ) dir.create("bayesMCClust-wd")
setwd("bayesMCClust-wd")
myOutfilesDir <- "dmClust-Example-Outfiles"

# load data
data(MCCExampleData)

# function call 
system.time(
  outList <- dmClust(   # parameter lists (every four) must be complete!
     Data = list( dataFile=MCCExampleData$Njk.i,  
                     storeDir=myOutfilesDir,       
                     mccFile=MCCExampleData$somePrior),                     
     Prior   = list( H=2, # sample(2:5, 1), # 3 
                     alpha0=4, 
                     a0=1,      
                     alpha=1, 
                     N0=10,
                     isPriorNegBin=FALSE, 
                     mccAsPrior=TRUE, 
                     xiPooled=FALSE, 
                     persPrior=0.7),                     
     Initial = list( mccUse=FALSE, 
                     pers=1/3 ),                     
     Mcmc    = list( kNo=2, 
                     M=100, 
                     M0=20, 
                     mOut=5, 
                     mSave=50, 
                     showAcc=TRUE, 
                     monitor=FALSE, 
                     seed=sample(1:100000, 1) # 12345
     ) 
  )
)

str(outList)

#outFileName
#results <- load(outFileName)
#results

if (outList$Prior$H > 1) { 
    apply(outList$xi_h_m[,,,seq(outList$Mcmc$M0, outList$Mcmc$M, 1)], c(1,2,3), mean) 
    } else { 
    apply(outList$xi_h_m[,,,seq(outList$Mcmc$M0,outList$Mcmc$M,1)], c(1, 2), mean)
}

allocList <- calcAllocationsDMC(outList, thin=1, maxi=50) # , plotPathsForEta=TRUE
str(allocList)

myTransProbs <- calcTransProbs(outList, estGroupSize=allocList$estGroupSize, thin=1, 
    printXtable=FALSE, printSd=FALSE, printTogether=TRUE ) 
    # grLabels=paste("Group", 1:Prior$H), plotPaths=TRUE
str(myTransProbs)

myTransList <- plotTransProbs(outList, estTransProb=myTransProbs$estTransProb, 
    estGroupSize=allocList$estGroupSize, class=allocList$class, plotPooled=TRUE, 
    plotContTable=TRUE, printContTable=TRUE, plotContPooled=TRUE) 
    # , grLabels=paste("Group", 1:Prior$H)
str(myTransList)

(equiDist <- calcEquiDist(outList, thin=1, maxi=50)) 
# , printEquiDist=TRUE, plotEquiDist=TRUE, grLabels=paste("Group", 1:Prior$H)

myVariation <- calcVariationDMC(outList, thin=1, maxi=50) 
# , printVarE=TRUE, printUnobsHet=TRUE, printUnobsHetSd=TRUE, 
# printUnobsHetAll=TRUE, printAllTogether=TRUE, grLabels=paste("Group", 1:Prior$H)
str(myVariation)

myPars <- calcParMatDMC(outList, thin=1) 
# , grLabels=paste("Group", 1:Prior$H), printPar=TRUE
str(myPars)

myLongRunDistList <- calcLongRunDist(outList, 
    initialStateData=MCCExampleData$initialState, 
    class=allocList$class, equiDist=equiDist, thin=1, maxi=5) 
    # , printLongRunDist=TRUE, , grLabels=paste("Group", 1:Prior$H)
str(myLongRunDistList)

myTypicalMembs <- plotTypicalMembers(outList, moreTypMemb=c(10,13,17,20,23,27,30), 
    myObsList=MCCExampleData$obsList, classProbs=allocList$classProbs) 
    # , noTypMemb=7, moreTypMemb=c(10,25,50,100,200,500,1000)
str(myTypicalMembs)

plotScatter(outList, thin=1, xi11=c(1,1), xi12=c(2,2), xi21=c(2,2), 
    xi22=c(3,3), xi31=c(1,1), xi32=c(3,3) )

mySegPower <- calcSegmentationPower(outList, classProbs=allocList$classProbs, 
    class=allocList$class, printXtable=TRUE, calcSharp=TRUE, printSharpXtable=TRUE ) 
    # , grLabels=paste("Group", 1:Prior$H)
str(mySegPower)

myEntropy <- calcEntropy(outList, classProbs=allocList$classProbs, 
    class=allocList$class, printXtable=TRUE ) 
    # , grLabels=paste("Group", 1:Prior$H)
myEntropy

plotLikeliPaths(outList, from=10, by=1 )

myNumEffTables <- calcNumEff( outList, thin=1, printXi=TRUE, printE=TRUE, 
    printBeta=TRUE, grLabels=paste("Group", 1:outList$Prior$H) ) 
str(myNumEffTables)

myMSCrits <- calcMSCritDMC(workDir=myOutfilesDir, myLabel="dmClust-Example", 
    myN0=paste("N0 =",outList$Prior$N0), 
    whatToDoList=c("postMode", "approxML", "approxMCL") )
str(myMSCrits)

setwd(oldDir)

} # end if

# ==================================================================================
# ==================================================================================
# ==================================================================================

# ==================================================================================
if ( FALSE ) {
# ==================================================================================

rm(list=ls(all=TRUE))

# set working directory
oldDir <- getwd()
curDir <- tempdir()
setwd(curDir)

if ( !file.exists("bayesMCClust-wd") ) dir.create("bayesMCClust-wd")
setwd("bayesMCClust-wd") 
myOutfilesDir <- "dmClustExtended-Example-Outfiles"      

# load data 
data(MCCExtExampleData)
if (!is.element("MCCExtExampleData$covariates", search())) {
    attach(MCCExtExampleData$covariates)
}

# ==================================================================================

groupNr <- 2 # sample(2:6, 1) # 3

# ==================================================================================

results <- kmeans( log( MCCExtExampleData$NjkiMat + 0.5 ) , groupNr, nstart=2)

# ==================================================================================

require(nnet, quietly = TRUE)
H <- groupNr
X = cbind( intercept=1, alrateBezNew, unskilled, skilled, angStart ) 

N <- dim(X)[1]
mX <- data.frame( cbind(group=as.factor( results$cluster ), X[,-1], 
    matrix(sample(1:H,H*N,replace=TRUE),N,H)) )

colnames(mX)[6:(6+groupNr-1)] <- 
    c( "as.1", "as.2", "as.3", "as.4", "as.5", "as.6" )[1:groupNr] 

tempMNom <- multinom(group ~ alrateBezNew+ unskilled+ skilled+ angStart, 
    data=as.data.frame(mX)) 

toStartBeta <- t(rbind(0,coef( tempMNom )))

outList <- dmClustExtended(      
     Data = list( dataFile=MCCExtExampleData$Njk.i,   
                  storeDir=myOutfilesDir,       
                  mccFile=NULL,   
                  X = cbind(intercept=1, alrateBezNew, unskilled, skilled, angStart )), 
     Prior   = list( H=groupNr, 
                     a0=1,      
                     alpha=1, 
                     N0=10,
                     isPriorNegBin=FALSE, 
                     mccAsPrior=FALSE, 
                     xiPooled=FALSE, 
                     persPrior=0.7,
                     betaPrior = "informative", # N(0,1)
                     betaPriorMean = 0,
                     betaPriorVar = 1),                     
     Initial = list( mccUse=FALSE, 
                     pers=1/3,
                     S.i.start = results$cluster,
                     Beta.start = toStartBeta ),      
     Mcmc    = list( kNo=2, 
                     M=100, 
                     M0=50, 
                     mOut=10, 
                     mSave=50,  
                     showAcc=TRUE, 
                     monitor=FALSE, 
                     seed=sample(1:100000, 1) # 564847
                   ) 
)

str(outList)

#outFileName <- outList$workspaceFile
#outFileName
#results <- load(outFileName)
#results

if (outList$Prior$H > 1) { 
    apply( outList$xi_h_m[,,,seq(outList$Mcmc$M0, outList$Mcmc$M, 1)], c(1,2,3), mean) 
    } else { 
    apply(outList$xi_h_m[,,,seq(outList$Mcmc$M0,outList$Mcmc$M,1)], c(1, 2), mean) 
}

allocList <- calcAllocationsDMCExt(outList, thin=1, maxi=50) 
str(allocList)

myTransProbs <- calcTransProbs(outList, estGroupSize=allocList$estGroupSize, thin=1, 
    printXtable=FALSE, printSd=FALSE, printTogether=TRUE ) 
    # grLabels=paste("Group", 1:Prior$H), plotPaths=TRUE
str(myTransProbs)

myTransList <- plotTransProbs(outList, estTransProb=myTransProbs$estTransProb, 
    estGroupSize=allocList$estGroupSize, class=allocList$class, plotPooled=TRUE, 
    plotContTable=TRUE, printContTable=TRUE, plotContPooled=TRUE) 
    # , grLabels=paste("Group", 1:Prior$H)
str(myTransList)

(equiDist <- calcEquiDist(outList, thin=1, maxi=50)) 
# , printEquiDist=TRUE, plotEquiDist=TRUE, grLabels=paste("Group", 1:Prior$H)

myVariation <- calcVariationDMC(outList, thin=1, maxi=50) 
# , printVarE=TRUE, printUnobsHet=TRUE, printUnobsHetSd=TRUE, 
# printUnobsHetAll=TRUE, printAllTogether=TRUE, grLabels=paste("Group", 1:Prior$H)
str(myVariation)

myPars <- calcParMatDMC(outList, thin=1) 
# , grLabels=paste("Group", 1:Prior$H), printPar=TRUE
str(myPars)

myRegCoeffs <- calcRegCoeffs(outList, hBase=2, thin=1) 
#, M0=Mcmc$M0, grLabels=paste("Group", 1:Prior$H), printHPD=TRUE, 
# plotPaths=TRUE, plotACFs=TRUE
str(myRegCoeffs)

myLongRunDistList <- calcLongRunDist(outList, initialStateData=initialState, 
    class=allocList$class, equiDist=equiDist, maxi=2) 
    # , printLongRunDist=TRUE
str(myLongRunDistList)

myTypicalMembs <- plotTypicalMembers(outList, myObsList=MCCExtExampleData$obsList, 
    classProbs=allocList$classProbs) 
    # , noTypMemb=7, moreTypMemb=c(10,25,50,100,200,500,1000)
str(myTypicalMembs)

plotScatter(outList, thin=1, xi11=c(1,1), xi12=c(2,2), xi21=c(2,2), 
    xi22=c(3,3), xi31=c(1,1), xi32=c(3,3) )

mySegPower <- calcSegmentationPower(outList, classProbs=allocList$classProbs, 
    class=allocList$class, printXtable=TRUE, calcSharp=TRUE, 
    printSharpXtable=TRUE ) 
    # , grLabels=paste("Group", 1:Prior$H)
str(mySegPower)

myEntropy <- calcEntropy(outList, classProbs=allocList$classProbs, 
    class=allocList$class, printXtable=TRUE ) 
    # , grLabels=paste("Group", 1:Prior$H)
myEntropy

plotLikeliPaths(outList, from=10, by=1 )

myNumEffTables <- calcNumEff( outList, thin=1, printXi=TRUE, printE=TRUE, 
    printBeta=TRUE, grLabels=paste("Group", 1:outList$Prior$H) ) 
str(myNumEffTables)

myMSCrits <- calcMSCritDMCExt(workDir=myOutfilesDir, myLabel="dmClustExtended-Example", 
    myN0=paste("N0 =",outList$Prior$N0), 
    whatToDoList=c("postMode", "approxML", "approxMCL") )
str(myMSCrits)

setwd(oldDir)

# ==================================================================================

if (is.element("MCCExtExampleData$covariates", search())) { 
    detach(MCCExtExampleData$covariates)
}

# ==================================================================================
} # end if
# ==================================================================================

# ==================================================================================

bayesMCClust documentation built on May 29, 2017, 3:31 p.m.