fit_model: Estimate Parameters of (Mixture) Hidden Markov Models and...

Description Usage Arguments Details Value See Also Examples

View source: R/fit_model.R

Description

Function fit_model estimates the parameters of mixture hidden Markov models and its restricted variants using maximimum likelihood. Initial values for estimation are taken from the corresponding components of the model with preservation of original zero probabilities.

Usage

1
2
3
4
fit_model(model, em_step = TRUE, global_step = FALSE,
  local_step = FALSE, control_em = list(), control_global = list(),
  control_local = list(), lb, ub, threads = 1, log_space = FALSE,
  ...)

Arguments

model

An object of class hmm or mhmm.

em_step

Logical. Whether or not to use the EM algorithm at the start of the parameter estimation. The default is TRUE.

global_step

Logical. Whether or not to use global optimization via nloptr (possibly after the EM step). The default is FALSE.

local_step

Logical. Whether or not to use local optimization via nloptr (possibly after the EM and/or global steps). The default is FALSE.

control_em

Optional list of control parameters for the EM algorithm. Possible arguments are

maxeval

The maximum number of iterations, the default is 1000. Note that iteration counter starts with -1 so with maxeval=1 you get already two iterations. This is for backward compatibility reasons.

print_level

The level of printing. Possible values are 0 (prints nothing), 1 (prints information at the start and the end of the algorithm), 2 (prints at every iteration), and for mixture models 3 (print also during optimization of coefficients).

reltol

Relative tolerance for convergence defined as (logLik_new - logLik_old)/(abs(logLik_old) + 0.1). The default is 1e-10.

restart

A list containing options for possible EM restarts with the following components:

times

Number of restarts of the EM algorithm using random initial values. The default is 0, i.e. no restarts.

transition

Logical. Should the original transition probabilities be varied? The default is TRUE.

emission

Logical. Should the original emission probabilities be varied? The default is TRUE.

sd

Standard deviation for rnorm used in randomization. The default is 0.25.

maxeval

Maximum number of iterations, the default is control_em$maxeval

print_level

Level of printing in restarted EM steps. The default is control_em$print_level.

reltol

Relative tolerance for convergence at restarted EM steps. The default is control_em$reltol. If the relative change of the final model of the restart phase is larger than the tolerance for the original EM phase, the final model is re-estimated with the original reltol and maxeval at the end of the EM step.

n_optimum

Save the log-likelihood values of the n_optimum best models (from all estimated models including the the first EM run.). The default is min(times + 1, 25).

use_original

If TRUE. Use the initial values of the input model as starting points for the permutations. Otherwise permute the results of the first EM run.

control_global

Optional list of additional arguments for nloptr argument opts. The default values are

algorithm

"NLOPT_GD_MLSL_LDS"

local_opts

list(algorithm = "NLOPT_LD_LBFGS", ftol_rel = 1e-6, xtol_rel = 1e-4)

maxeval

10000 (maximum number of iterations in global optimization algorithm.)

maxtime

60 (maximum time for global optimization. Set to 0 for unlimited time.)

control_local

Optional list of additional arguments for nloptr argument opts. The default values are

algorithm

"NLOPT_LD_LBFGS"

ftol_rel

1e-10

xtol_rel

1e-8

maxeval

10000 (maximum number of iterations)

lb, ub

Lower and upper bounds for parameters in Softmax parameterization. The default interval is [pmin(-25, 2*initialvalues), pmax(25, 2*initialvalues)], except for gamma coefficients, where the scale of covariates is taken into account. Note that it might still be a good idea to scale covariates around unit scale. Bounds are used only in the global optimization step.

threads

Number of threads to use in parallel computing. The default is 1.

log_space

Make computations using log-space instead of scaling for greater numerical stability at a cost of decreased computational performance. The default is FALSE.

...

Additional arguments to nloptr.

Details

The fitting function provides three estimation steps: 1) EM algorithm, 2) global optimization, and 3) local optimization. The user can call for one method or any combination of these steps, but should note that they are preformed in the above-mentioned order. The results from a former step are used as starting values in a latter, except for some of global optimization algorithms (such as MLSL and StoGO) which only use initial values for setting up the boundaries for the optimization.

It is possible to rerun the EM algorithm automatically using random starting values based on the first run of EM. Number of restarts is defined by the restart argument in control_em. As the EM algorithm is relatively fast, this method might be preferred option compared to the proper global optimization strategy of step 2.

The default global optimization method (triggered via global_step = TRUE) is the multilevel single-linkage method (MLSL) with the LDS modification (NLOPT_GD_MLSL_LDS as algorithm in control_global), with L-BFGS as the local optimizer. The MLSL method draws random starting points and performs a local optimization from each. The LDS modification uses low-discrepancy sequences instead of pseudo-random numbers as starting points and should improve the convergence rate. In order to reduce the computation time spent on non-global optima, the convergence tolerance of the local optimizer is set relatively large. At step 3, a local optimization (L-BFGS by default) is run with a lower tolerance to find the optimum with high precision.

There are some theoretical guarantees that the MLSL method used as the default optimizer in step 2 shoud find all local optima in a finite number of local optimizations. Of course, it might not always succeed in a reasonable time. The EM algorithm can help in finding good boundaries for the search, especially with good starting values, but in some cases it can mislead. A good strategy is to try a couple of different fitting options with different combinations of the methods: e.g. all steps, only global and local steps, and a few evaluations of EM followed by global and local optimization.

By default, the estimation time is limited to 60 seconds in global optimization step, so it is advisable to change the default settings for the proper global optimization.

Any algorithm available in the nloptr function can be used for the global and local steps.

Value

logLik

Log-likelihood of the estimated model.

em_results

Results after the EM step: log-likelihood (logLik), number of iterations (iterations), relative change in log-likelihoods between the last two iterations (change), and the log-likelihoods of the n_optimum best models after the EM step (best_opt_restart).

global_results

Results after the global step.

local_results

Results after the local step.

call

The matched function call.

See Also

build_hmm, build_mhmm, build_mm, build_mmm, and build_lcm for constructing different types of models; summary.mhmm for a summary of a MHMM; separate_mhmm for reorganizing a MHMM into a list of separate hidden Markov models; plot.hmm and plot.mhmm for plotting model objects; and ssplot and mssplot for plotting stacked sequence plots of hmm and mhmm objects.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
# Hidden Markov model

data("mvad", package = "TraMineR")

mvad_alphabet <-
  c("employment", "FE", "HE", "joblessness", "school", "training")
mvad_labels <- c("employment", "further education", "higher education",
  "joblessness", "school", "training")
mvad_scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad_seq <- seqdef(mvad, 17:86, alphabet = mvad_alphabet,
  states = mvad_scodes, labels = mvad_labels, xtstep = 6)

attr(mvad_seq, "cpal") <- colorpalette[[6]]

# Starting values for the emission matrix
emiss <- matrix(
  c(0.05, 0.05, 0.05, 0.05, 0.75, 0.05, # SC
    0.05, 0.75, 0.05, 0.05, 0.05, 0.05, # FE
    0.05, 0.05, 0.05, 0.4,  0.05, 0.4,  # JL, TR
    0.05, 0.05, 0.75, 0.05, 0.05, 0.05, # HE
    0.75, 0.05, 0.05, 0.05, 0.05, 0.05),# EM
  nrow = 5, ncol = 6, byrow = TRUE)

# Starting values for the transition matrix
trans <- matrix(0.025, 5, 5)
diag(trans) <- 0.9

# Starting values for initial state probabilities
initial_probs <- c(0.2, 0.2, 0.2, 0.2, 0.2)

# Building a hidden Markov model
init_hmm_mvad <- build_hmm(observations = mvad_seq,
  transition_probs = trans, emission_probs = emiss,
  initial_probs = initial_probs)

## Not run: 
set.seed(21)
fit_hmm_mvad <- fit_model(init_hmm_mvad, control_em = list(restart = list(times = 50)))
hmm_mvad <- fit_hmm_mvad$model

## End(Not run)

# save time, load the previously estimated model
data("hmm_mvad")

# Markov model
# Note: build_mm estimates model parameters from observations,
# no need for estimating with fit_model

mm_mvad <- build_mm(observations = mvad_seq)

# Comparing likelihoods, MM fits better
logLik(hmm_mvad)
logLik(mm_mvad)

## Not run: 
require("igraph") #for layout_in_circle

plot(mm_mvad, layout = layout_in_circle, legend.prop = 0.3,
  edge.curved = 0.3, edge.label = NA,
  vertex.label.pos = c(0, 0, pi, pi, pi, 0))

##############################################################


#' # Three-state three-channel hidden Markov model
# See ?hmm_biofam for five-state version

data("biofam3c")

# Building sequence objects
marr_seq <- seqdef(biofam3c$married, start = 15,
  alphabet = c("single", "married", "divorced"))
child_seq <- seqdef(biofam3c$children, start = 15,
  alphabet = c("childless", "children"))
left_seq <- seqdef(biofam3c$left, start = 15,
  alphabet = c("with parents", "left home"))

# Define colors
attr(marr_seq, "cpal") <- c("violetred2", "darkgoldenrod2", "darkmagenta")
attr(child_seq, "cpal") <- c("darkseagreen1", "coral3")
attr(left_seq, "cpal") <- c("lightblue", "red3")

# Starting values for emission matrices

emiss_marr <- matrix(NA, nrow = 3, ncol = 3)
emiss_marr[1,] <- seqstatf(marr_seq[, 1:5])[, 2] + 1
emiss_marr[2,] <- seqstatf(marr_seq[, 6:10])[, 2] + 1
emiss_marr[3,] <- seqstatf(marr_seq[, 11:16])[, 2] + 1
emiss_marr <- emiss_marr / rowSums(emiss_marr)

emiss_child <- matrix(NA, nrow = 3, ncol = 2)
emiss_child[1,] <- seqstatf(child_seq[, 1:5])[, 2] + 1
emiss_child[2,] <- seqstatf(child_seq[, 6:10])[, 2] + 1
emiss_child[3,] <- seqstatf(child_seq[, 11:16])[, 2] + 1
emiss_child <- emiss_child / rowSums(emiss_child)

emiss_left <- matrix(NA, nrow = 3, ncol = 2)
emiss_left[1,] <- seqstatf(left_seq[, 1:5])[, 2] + 1
emiss_left[2,] <- seqstatf(left_seq[, 6:10])[, 2] + 1
emiss_left[3,] <- seqstatf(left_seq[, 11:16])[, 2] + 1
emiss_left <- emiss_left / rowSums(emiss_left)

# Starting values for transition matrix
trans <- matrix(c(0.9, 0.07, 0.03,
                0,  0.9,  0.1,
                0,    0,    1), nrow = 3, ncol = 3, byrow = TRUE)

# Starting values for initial state probabilities
inits <- c(0.9, 0.09, 0.01)

# Building hidden Markov model with initial parameter values
init_hmm_bf <- build_hmm(
  observations = list(marr_seq, child_seq, left_seq),
  transition_probs = trans,
  emission_probs = list(emiss_marr, emiss_child, emiss_left),
  initial_probs = inits)

# Fitting the model with different optimization schemes

# Only EM with default values
hmm_1 <- fit_model(init_hmm_bf)
hmm_1$logLik # -24179.1

# Only L-BFGS
hmm_2 <- fit_model(init_hmm_bf, em_step = FALSE, local_step = TRUE)
hmm_2$logLik # -22267.75

# Global optimization via MLSL_LDS with L-BFGS as local optimizer and final polisher
# This can be slow, use parallel computing by adjusting threads argument
# (here threads = 1 for portability issues)
hmm_3 <- fit_model(
  init_hmm_bf, em_step = FALSE, global_step = TRUE, local_step = TRUE,
  control_global = list(maxeval = 5000, maxtime = 0), threads = 1)
hmm_3$logLik # -21675.42

# EM with restarts, much faster than MLSL
set.seed(123)
hmm_4 <- fit_model(init_hmm_bf, control_em = list(restart = list(times = 5)))
hmm_4$logLik # -21675.4

# Global optimization via StoGO with L-BFGS as final polisher
# This can be slow, use parallel computing by adjusting threads argument
# (here threads = 1 for portability issues)
set.seed(123)
hmm_5 <- fit_model(
   init_hmm_bf, em_step = FALSE, global_step = TRUE, local_step = TRUE, 
   lb = -50, ub = 50, control_global = list(algorithm = "NLOPT_GD_STOGO", 
   maxeval = 2500, maxtime = 0), threads = 1)
hmm_5$logLik # -21675.4

##############################################################

# Mixture HMM

data("biofam3c")

## Building sequence objects
marr_seq <- seqdef(biofam3c$married, start = 15,
  alphabet = c("single", "married", "divorced"))
child_seq <- seqdef(biofam3c$children, start = 15,
  alphabet = c("childless", "children"))
left_seq <- seqdef(biofam3c$left, start = 15,
  alphabet = c("with parents", "left home"))

## Choosing colors
attr(marr_seq, "cpal") <- c("#AB82FF", "#E6AB02", "#E7298A")
attr(child_seq, "cpal") <- c("#66C2A5", "#FC8D62")
attr(left_seq, "cpal") <- c("#A6CEE3", "#E31A1C")

## Starting values for emission probabilities
# Cluster 1
B1_marr <- matrix(
  c(0.8, 0.1, 0.1, # High probability for single
    0.8, 0.1, 0.1,
    0.3, 0.6, 0.1, # High probability for married
    0.3, 0.3, 0.4), # High probability for divorced
  nrow = 4, ncol = 3, byrow = TRUE)

B1_child <- matrix(
  c(0.9, 0.1, # High probability for childless
    0.9, 0.1,
    0.9, 0.1,
    0.9, 0.1),
  nrow = 4, ncol = 2, byrow = TRUE)

B1_left <- matrix(
  c(0.9, 0.1, # High probability for living with parents
    0.1, 0.9, # High probability for having left home
    0.1, 0.9,
    0.1, 0.9),
  nrow = 4, ncol = 2, byrow = TRUE)

# Cluster 2

B2_marr <- matrix(
  c(0.8, 0.1, 0.1, # High probability for single
    0.8, 0.1, 0.1,
    0.1, 0.8, 0.1, # High probability for married
    0.7, 0.2, 0.1),
  nrow = 4, ncol = 3, byrow = TRUE)

B2_child <- matrix(
  c(0.9, 0.1, # High probability for childless
    0.9, 0.1,
    0.9, 0.1,
    0.1, 0.9),
  nrow = 4, ncol = 2, byrow = TRUE)

B2_left <- matrix(
  c(0.9, 0.1, # High probability for living with parents
    0.1, 0.9,
    0.1, 0.9,
    0.1, 0.9),
  nrow = 4, ncol = 2, byrow = TRUE)

# Cluster 3
B3_marr <- matrix(
  c(0.8, 0.1, 0.1, # High probability for single
    0.8, 0.1, 0.1,
    0.8, 0.1, 0.1,
    0.1, 0.8, 0.1, # High probability for married
    0.3, 0.4, 0.3,
    0.1, 0.1, 0.8), # High probability for divorced
  nrow = 6, ncol = 3, byrow = TRUE)

B3_child <- matrix(
  c(0.9, 0.1, # High probability for childless
    0.9, 0.1,
    0.5, 0.5,
    0.5, 0.5,
    0.5, 0.5,
    0.1, 0.9),
  nrow = 6, ncol = 2, byrow = TRUE)


B3_left <- matrix(
  c(0.9, 0.1, # High probability for living with parents
    0.1, 0.9,
    0.5, 0.5,
    0.5, 0.5,
    0.1, 0.9,
    0.1, 0.9),
  nrow = 6, ncol = 2, byrow = TRUE)

# Starting values for transition matrices
A1 <- matrix(
  c(0.80, 0.16, 0.03, 0.01,
    0,    0.90, 0.07, 0.03,
    0,    0,    0.90, 0.10,
    0,    0,    0,       1),
  nrow = 4, ncol = 4, byrow = TRUE)

A2 <- matrix(
  c(0.80, 0.10, 0.05, 0.03, 0.01, 0.01,
    0,    0.70, 0.10, 0.10, 0.05, 0.05,
    0,    0,    0.85, 0.01, 0.10, 0.04,
    0,    0,    0,    0.90, 0.05, 0.05,
    0,    0,    0,    0,    0.90, 0.10,
    0,    0,    0,    0,    0,       1),
  nrow = 6, ncol = 6, byrow = TRUE)

# Starting values for initial state probabilities
initial_probs1 <- c(0.9, 0.07, 0.02, 0.01)
initial_probs2 <- c(0.9, 0.04, 0.03, 0.01, 0.01, 0.01)

# Birth cohort
biofam3c$covariates$cohort <- cut(biofam3c$covariates$birthyr, c(1908, 1935, 1945, 1957))
biofam3c$covariates$cohort <- factor(
  biofam3c$covariates$cohort, labels=c("1909-1935", "1936-1945", "1946-1957"))

# Build mixture HMM
init_mhmm_bf <- build_mhmm(
  observations = list(marr_seq, child_seq, left_seq),
  initial_probs = list(initial_probs1, initial_probs1, initial_probs2),
  transition_probs = list(A1, A1, A2),
  emission_probs = list(list(B1_marr, B1_child, B1_left),
    list(B2_marr, B2_child, B2_left),
    list(B3_marr, B3_child, B3_left)),
  formula = ~sex + cohort, data = biofam3c$covariates,
  channel_names = c("Marriage", "Parenthood", "Residence"))


# Fitting the model with different settings

# Only EM with default values
mhmm_1 <- fit_model(init_mhmm_bf)
mhmm_1$logLik # -12713.08

# Only L-BFGS
mhmm_2 <- fit_model(init_mhmm_bf, em_step = FALSE, local_step = TRUE)
mhmm_2$logLik # -12966.51

# Use EM with multiple restarts
set.seed(123)
mhmm_3 <- fit_model(init_mhmm_bf, control_em = list(restart = list(times = 5, transition = FALSE)))
mhmm_3$logLik # -12713.08

## End(Not run)

seqHMM documentation built on Nov. 6, 2018, 5:07 p.m.