```r
knitr::opts_chunk$set(
  echo = TRUE, fig.height = 4, fig.width = 7, dpi = 128,
  cache.path = "cache/firstOrderVAR-cache/",
  fig.path = "fig/firstOrderVAR-fig/", error = FALSE)
options(digits = 4, scipen = 10, width = 70)
```
We will simulate and estimate a first order vector auto-regression model in
this example using the particle filter and smoother. For details, see this
vignette, which can also be found by calling
`vignette("Particle_filtering", package = "dynamichazard")`. The models we
are going to simulate from and estimate are of the form
$$
\begin{split}
  y_{it} &\sim g(\cdot\vert\eta_{it}) & \\
  \vec{\eta}_t &= X_tR^+\vec{\alpha}_t + Z_t\vec{\beta} + \vec{o}_t & \\
  \vec{\alpha}_t &= F\vec{\alpha}_{t - 1} + R\vec{\epsilon}_t &
    \quad \vec{\epsilon}_t \sim N(\vec{0}, Q) \\
  & & \quad \vec{\alpha}_0 \sim N(\vec{a}_0, Q_0)
\end{split}, \qquad
\begin{array}{l} i = 1, \dots, n_t \\ t = 1, \dots, d \end{array}
$$
where $y_{it}$ is individual $i$'s indicator at time $t$ for whether he dies in the interval $(t - 1, t]$. The indicators, $y_{it}$, are binomially distributed with the complementary log-log link function conditional on the log times at risk, $\vec{o}_1,\dots,\vec{o}_d$, the covariates, $X_t$ and $Z_t$, and the latent states, $\vec{\alpha}_1,\dots,\vec{\alpha}_d$ (the relation to the piecewise exponential model is spelled out at the end of this section). The total survival time of individual $i$ is $T_i$, which is piecewise constant exponentially distributed conditional on the latent states. Further, we set $Z_t = X_t$ so the states have a non-zero mean. The true values are
$$
F = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.9 \end{pmatrix}, \quad
Q = \begin{pmatrix} 0.33^2 & 0 \\ 0 & 0.33^2 \end{pmatrix}, \quad
R = I_2, \quad
\vec{\beta} = (-6.5, -2)^\top
$$
```r
git_key <- system("git rev-parse --short HEAD", intern = TRUE)
git_bra <- system("git branch", intern = TRUE)
regexp <- "^(\\* )(.+)$"
git_bra <- git_bra[grepl(regexp, git_bra)]
git_bra <- gsub(regexp, "\\2", git_bra)
```
where $I_2$ is the two-dimensional identity matrix and
$\vec{a}_0$ and $Q_0$ are given by the invariant distribution. The unknown
parameters to be estimated are everything but $Q_0$ and $R$ (we fix $Q_0$
during the estimation and set $\vec{a}_0 = (0, 0)^\top$). This example is run
on the git branch "`r git_bra`" with ID "`r git_key`". The code can be found
on the github site for the package.
All functions whose definitions are not shown and which are not in the
`dynamichazard` package can be found on the github site.
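To make the role of the log times at risk explicit (this is only a restatement of the model above, not an additional assumption), note that given the latent states the hazard of individual $i$ is constant within the interval $(t - 1, t]$, so the complementary log-log link gives

$$
P(y_{it} = 1 \mid \vec{\alpha}_t, T_i > t - 1)
  = 1 - \exp\{-\exp(\eta_{it})\}
  = 1 - \exp(-\lambda_{it}\Delta_{it}), \qquad
  \lambda_{it} = \exp(\eta_{it} - o_{it})
$$

where $\Delta_{it} = \exp(o_{it})$ is individual $i$'s time at risk in the interval and $\lambda_{it}$ is the constant hazard in that interval.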
We start by simulating the data. Feel free to skip this part as the specifications are given above. First we assign the parameters for the simulation
```r
library(dynamichazard)
```
```r
n_obs     <- 1000L
n_periods <- 200L

Fmat <- matrix(c(.9, 0, 0, .9), 2)
Rmat <- diag(1    , 2)
Qmat <- diag(.33^2, 2)
Q_0  <- get_Q_0(Qmat, Fmat)
beta <- c(-6.5, -2)
```
`get_Q_0` is a function to get the covariance matrix of the invariant distribution.
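For a stationary first order VAR, the invariant covariance matrix solves $Q_0 = FQ_0F^\top + Q$. As a small check (not part of the original code), we can solve the vectorized equation directly and compare with the output of `get_Q_0`:

```r
# solve vec(Q_0) = (I - F kronecker F)^(-1) vec(Q) and compare with get_Q_0
Q_0_manual <- matrix(
  solve(diag(4) - kronecker(Fmat, Fmat), c(Qmat)), 2L)
all.equal(Q_0_manual, Q_0)
```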
Then we simulate and plot the latent states
```r
set.seed(54432125)
betas <- matrix(nrow = n_periods + 1, ncol = 2)
betas[1, ] <- rnorm(2) %*% chol(Q_0)
for(i in 2:(n_periods + 1))
  betas[i, ] <- Fmat %*% betas[i - 1, ] + drop(rnorm(2) %*% chol(Qmat))

betas <- t(t(betas) + beta)

# plot of latent variables
cols <- c("black", "darkblue")
matplot(betas, type = "l", lty = 1, col = cols)
for(i in 1:2)
  abline(h = beta[i], lty = 2, col = cols[i])
```
We simulate the observations as follows
```r
df <- replicate(n_obs, {
  # left-censoring
  tstart <- max(0L, sample.int((n_periods - 1L) * 2L, 1) - n_periods + 1L)

  # covariates
  x <- runif(1, -1, 1)
  covars <- c(1, x)

  # outcome (stop time and event indicator)
  y <- FALSE
  for(tstop in (tstart + 1L):n_periods){
    fail_time <- rexp(1) / exp(covars %*% betas[tstop + 1L, ])
    if(fail_time <= 1){
      y <- TRUE
      tstop <- tstop - 1L + fail_time
      break
    }
  }

  c(tstart = tstart, tstop = tstop, x = x, y = y)
})
df <- data.frame(t(df))
head(df, 10)
```
We left-censor the observations since we otherwise may end up with a low number of observations towards the end.
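As a quick sanity check of the simulated data (not part of the original code), we can look at the event rate and the entry and exit times:

```r
# share of individuals with an observed event
mean(df$y)
# distribution of entry and exit times
summary(df[, c("tstart", "tstop")])
```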
We can fit a model without the latent variables (i.e., a constant coefficient model) as follows
```r
surv_fit <- survreg(Surv(tstop - tstart, y) ~ x, df, dist = "exponential")
summary(surv_fit) # signs are flipped
logLik(surv_fit)
```
The signs are flipped, as stated in `help("survreg")`. Apart from the sign,
the estimates are close to $\vec{\beta}$ as expected. Further, we can compare
this model's log-likelihood with the log-likelihood approximation we get from
the particle filter in the next section.
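Since `survreg` fits an accelerated failure time model, flipping the signs of its coefficients should give values close to $\vec{\beta}$. A small check (not part of the original code) is:

```r
# the negated coefficients should be close to beta = c(-6.5, -2)
rbind(estimate = -coef(surv_fit), truth = beta)
```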
We start off by using the generalized two-filter smoother from @briers09. We
estimate the parameters with an EM algorithm by calling `PF_EM` as follows
```r
n_threads <- 6 # you can replace this with e.g.,
               # max(parallel::detectCores(logical = FALSE), 2)
```
```r
set.seed(30520116)
system.time(pf_Brier <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 750, N_smooth = 1, N_first = 2000, eps = .001,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE, n_max = 100,
    smoother = "Brier_O_N_square",
    # we add some extra variation to the proposal distributions
    nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
`system.time` is used to show the computation time. The estimated parameters are
```r
pf_Brier$F

show_covar <- function(Q){
  cat("Standard deviations\n")
  print(sqrt(diag(Q)))
  cat("Lower correlation matrix\n")
  tmp <- cov2cor(Q)
  print(tmp[-1, -ncol(tmp)])
}
show_covar(pf_Brier$Q)

pf_Brier$fixed_effects
```
The effective sample size in the smoothing step at each time point is
```r
plot(pf_Brier$effective_sample_size$smoothed_clouds, type = "h",
     ylim = range(0, pf_Brier$effective_sample_size$smoothed_clouds),
     ylab = "Effective sample size")
```
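A numeric summary of the effective sample sizes (an added check, not in the original code) shows how often sampling was difficult; 100 is the same threshold as used for the dashed vertical lines below:

```r
# summary of the effective sample sizes and the number of time points with
# fewer than 100 effective particles
ess_Brier <- pf_Brier$effective_sample_size$smoothed_clouds
summary(ess_Brier)
sum(ess_Brier <= 100)
```

Next, we define a helper function, `plot_cloud`, which plots the smoothed state estimates plus the fixed effects together with the true curves.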
```r
plot_cloud <- function(
  pf_fit, type = "smoothed_clouds", start_at_zero = FALSE){
  par_old <- par(no.readonly = TRUE)
  on.exit(par(par_old))

  # plot random effects
  jpeg(tempfile()) # avoid output from the plot
  out <- plot(pf_fit, type = type, col = c("black", "blue"))
  dev.off()
  par(par_old)

  # then we use it and add the fixed effects
  y <- t(t(out$mean) + pf_fit$fixed_effects)
  ylim <- range(betas, y)
  x <- if(start_at_zero) 1:nrow(out$mean) - 1 else 1:nrow(out$mean)
  matplot(x, y, type = "l", lty = 1, col = c("black", "blue"), ylim = ylim)

  # add actual points
  matplot(0:n_periods, betas, type = "l", add = TRUE, lty = 2,
          col = c("black", "blue"), ylim = ylim)

  # show places where we had a hard time sampling
  ess <- pf_fit$effective_sample_size[[type]]
  idx <- which(ess <= 100)
  for(i in idx - start_at_zero)
    abline(v = i, lty = 2)

  invisible(ess)
}
```
The smoothed estimates of the latent states look as follows (the vertical dashed lines mark points where we did not sample well)
```r
plot_cloud(pf_Brier)
```
The dashed non-vertical lines are the true curves and the solid lines are the smoothed estimates. The approximate log-likelihoods from the particle filter at each EM iteration are
```r
plot(pf_Brier$log_likes)
```
We can compare the output above with the smoother from @fearnhead10
```r
set.seed(30520116)
system.time(pf_Fear <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 1000, N_smooth = 2000, N_first = 2000, eps = .001,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE, n_max = 100,
    smoother = "Fearnhead_O_N",
    nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
The estimates are similar
```r
pf_Fear$F
show_covar(pf_Fear$Q)
pf_Fear$fixed_effects
```
The effective sample sizes are better now, though
```r
plot(pf_Fear$effective_sample_size$smoothed_clouds, type = "h",
     ylim = range(0, pf_Fear$effective_sample_size$smoothed_clouds),
     ylab = "Effective sample size")
```
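We can also compare the two smoothers' effective sample sizes numerically (an added check, not in the original code):

```r
# side-by-side summary of the effective sample sizes from the two smoothers
rbind(
  Brier     = summary(pf_Brier$effective_sample_size$smoothed_clouds),
  Fearnhead = summary(pf_Fear $effective_sample_size$smoothed_clouds))
```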
However, there are several options to reduce the computation time of the former smoother, as mentioned in e.g. @briers09, but these are not implemented. The smoothed estimates are also rather close to what we saw before
```r
plot_cloud(pf_Fear)
```
The log-likelihoods are almost the same
```r
plot(pf_Fear$log_likes)
```
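For a numerical comparison (added here, not part of the original code), the final log-likelihood approximations from the two fits are:

```r
# final log-likelihood approximation from each EM fit
c(Brier     = tail(pf_Brier$log_likes, 1),
  Fearnhead = tail(pf_Fear $log_likes, 1))
```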
The log-likelihood was flat towards the end and not monotonically increasing
```r
plot(tail(pf_Fear$log_likes, 20))
```
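The non-monotone behaviour is easier to see from the changes between successive iterations (an added check):

```r
# changes in the log-likelihood approximation over the last EM iterations;
# negative values show the non-monotone behaviour
diff(tail(pf_Fear$log_likes, 20))
```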
This will not happen in an EM algorithm but may happen in an MCEM algorithm due to the Monte Carlo error. We may want to take a few more EM iterations with more particles. We can do this as follows
```r
set.seed(30520116)
pf_Fear_more <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Q_0 = diag(1, 2), Q = pf_Fear$Q, Fmat = pf_Fear$F,
  fixed_effects = pf_Fear$fixed_effects,
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 5000, N_smooth = 10000, N_first = 5000, eps = .001,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE, n_max = 10,
    smoother = "Fearnhead_O_N",
    nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The log-likelihoods from these iterations are
```r
plot(pf_Fear_more$log_likes)
```
and the final estimates are
```r
pf_Fear_more$F
show_covar(pf_Fear_more$Q)
pf_Fear_more$fixed_effects
```
Averaging is sometimes argued for in MCEM algorithms; see e.g. @cappe05, section 11.1.2.2. The idea is to average the estimates or the sufficient statistics after some iteration number. As of this writing, the present implementation does not allow one to change the number of particles at each iteration as suggested in @cappe05.
```r
set.seed(30520116)
system.time(pf_Fear_avg <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 200, N_smooth = 400, N_first = 1000, eps = 1e-5,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE,
    n_max = 1000L, smoother = "Fearnhead_O_N", averaging_start = 150L,
    nu = 8, covar_fac = 1.1, n_threads = n_threads)))
```
```r
pf_Fear_avg$F
show_covar(pf_Fear_avg$Q)
pf_Fear_avg$fixed_effects

plot(pf_Fear_avg$log_likes, type = "l")
```
We will run stochastic gradient descent as in @polyak92 which is described in @cappe05 [page 414].
```r
#####
# fit to start from
set.seed(30520116)
pf_start <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Q_0 = diag(1, 2), Q = diag(1, 2), Fmat = matrix(c(.1, 0, 0, .1), 2),
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 200, N_smooth = 400, N_first = 1000, eps = 1e-5,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE, n_max = 1L,
    smoother = "Fearnhead_O_N",
    nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
```r
# Function to perform stochastic gradient descent. Uses the Adam algorithm
# suggested by Kingma et al. (2015).
#
# Args:
#   object: an object of class PF_EM.
#   n_runs: number of iterations.
#   ns: number of particles at each iteration.
#   debug: TRUE if information should be printed during estimation.
#   use_O_n_sq: TRUE if the O(N^2) method should be used.
#   lr: learning rate.
#   mp: decay rate for the first moment.
#   vp: decay rate for the second moment.
#
# Returns:
#   list with final estimates, log-likelihood approximations at each
#   iteration, and estimates at each iteration.
sgd <- function(object, n_runs, ns, debug = FALSE, use_O_n_sq = FALSE,
                lr = .001, mp = .9, vp = .999) {
  #####
  # get object to perform computation and matrices and vectors for output
  comp_obj <- PF_get_score_n_hess(object, use_O_n_sq = use_O_n_sq)
  state <- matrix(
    NA_real_, n_runs + 1L, length(object$F) + length(object$Q))

  # setup matrix to map from gradient to the state (the covariance part is
  # only the lower diagonal)
  dF <- length(object$F)
  dQ <- NCOL(object$Q)
  nQ <- (dQ * (dQ + 1L)) / 2L
  K <- matrix(0., ncol(state), length(object$F) + nQ)
  K[  1:dF ,   1:dF ] <- diag(dF)
  # can use e.g., matrixcalc instead to get the duplication matrix
  K[-(1:dF), -(1:dF)] <- dynamichazard:::.get_dup_mat(dQ)

  dfix <- length(object$fixed_effects)
  obs <- matrix(NA_real_, n_runs + 1L, dfix)

  state[1, ] <- c(object$F, object$Q)
  obs  [1, ] <- object$fixed_effects
  lls <- rep(NA_real_, n_runs)
  mv <- vv <- rep(0., ncol(K) + dfix)

  for(i in 1:n_runs){
    comp_obj$set_n_particles(N_fw = ns[i], N_first = ns[i])
    fw <- comp_obj$run_particle_filter()
    lls[i] <- logLik(fw)
    score <- comp_obj$get_get_score_n_hess(only_score = TRUE)

    sc <- score$score
    mv <- mp * mv + (1 - mp) * sc
    vv <- vp * vv + (1 - vp) * sc^2
    muse <- mv / (1 - mp^i)
    vuse <- vv / (1 - vp^i)
    direc <- muse / (sqrt(vuse) + 1e-8)

    # step-halve until we have a valid set of parameters
    n_halfs <- 0L
    n_halfs_max <- 25L
    lri <- lr
    while((n_halfs <- n_halfs + 1L) <= n_halfs_max){
      state_i <- state[i + 1L, ] <-
        state[i, ] + lri * drop(K %*% direc[-(1:dfix)])
      obs_i   <- obs  [i + 1L, ] <- obs[i, ] + lri * direc[1:dfix]
      lri <- lri / 2

      # check that the system is stationary
      Ftmp <- matrix(state_i[1:dQ^2], dQ, dQ)
      if(any(Mod(eigen(Ftmp)$values) >= 1))
        next

      # set parameters
      o <- comp_obj$set_parameters(state = state_i, obs = obs_i)

      # check that Q is positive definite
      if(all(eigen(o$Q)$values > 0.))
        break
    }
    if(n_halfs > n_halfs_max)
      stop("step halvings failed")

    if(debug){
      msg <- paste0(sprintf("It %4d Log-likelihood %10.2f (max %10.2f)",
                            i, lls[i], max(lls, na.rm = TRUE)))
      cat(msg, "\n", rep("-", nchar(msg)), "\n", sep = "")
      cat(sprintf("Gradient norm %14.4f\n", norm(t(sc))))
      print(o$Fmat)
      print(o$Q)
      print(o$fixed_effects)
      cat("\n")
    }
  }

  list(o = o, lls = lls, state = state, obs = obs)
}

# number of iterations
n_runs <- 400L
# number of particles to use
ns <- ceiling(exp(seq(log(100), log(1000), length.out = n_runs)))

out <- sgd(n_runs = n_runs, ns = ns, object = pf_start, lr = .005)
lls <- out$lls
o <- out$o
```
The results are illustrated below
```r
# approximate log-likelihood at each iteration
par(mar = c(5, 4, 1, 1))
plot(lls, xlab = "Iteration", ylab = "Log-likelihood", type = "l")
print(max(lls), digits = 5)

# the estimates
o$Fmat
show_covar(o$Q)
o$fixed_effects
```
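Since @polyak92 is about averaging of the iterates, we can also form averaged estimates from the stored parameter paths in `out$state` and `out$obs`. The sketch below is an addition to the original code and the choice of averaging over the last 100 iterations is arbitrary:

```r
# Polyak-Ruppert style averages over the last 100 iterations. The first four
# columns of out$state hold vec(F) and the remaining four hold vec(Q);
# out$obs holds the fixed effects.
keep      <- tail(seq_len(nrow(out$state)), 100)
state_avg <- colMeans(out$state[keep, ])
F_avg     <- matrix(state_avg[  1:4 ], 2L)
Q_avg     <- matrix(state_avg[-(1:4)], 2L)
fixed_avg <- colMeans(out$obs[keep, ])

F_avg
show_covar(Q_avg)
fixed_avg
```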
We can also use the method from @Poyiadjis11. The present implementation scales poorly in the number of particles, but the method has a variance that may grow linearly in the number of time periods rather than at least quadratically. This is important for long time series.
```r
# number of iterations
n_runs <- 400L
# number of particles to use
ns <- ceiling(exp(seq(log(200), log(1000), length.out = n_runs)))

out <- sgd(n_runs = n_runs, ns = ns, use_O_n_sq = TRUE, object = pf_start,
           lr = .005)
lls <- out$lls
o <- out$o
```
The results are illustrated below (the log-likelihood approximations from the forward particle filter are poor approximations due to few particles).
```r
# approximate log-likelihood at each iteration
par(mar = c(5, 4, 1, 1))
plot(lls, xlab = "Iteration", ylab = "Log-likelihood", type = "l")
print(max(lls), digits = 5)

# the estimates
o$Fmat
show_covar(o$Q)
o$fixed_effects
```
We can also approximate standard errors of the estimated parameters from the score and the observed information matrix.
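A sketch of one way to do this with `PF_get_score_n_hess` is given below. Note that the exact structure of the returned object (in particular the name of the observed information element) is an assumption here and should be checked against the package documentation:

```r
# run a forward particle filter at the estimated parameters and compute the
# score and observed information
comp_obj <- PF_get_score_n_hess(pf_Fear_more)
comp_obj$set_n_particles(N_fw = 10000, N_first = 10000)
comp_obj$run_particle_filter()
# only_score = FALSE is assumed to also return the observed information
res <- comp_obj$get_get_score_n_hess(only_score = FALSE)
str(res) # inspect what is returned

# assuming the observed information is returned in an element named
# `obs_info`, approximate standard errors would be
# sqrt(diag(solve(res$obs_info)))
```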
We may question what the log-likelihood is at the true parameters. We can use
the `PF_forward_filter` function to perform an approximate log-likelihood
evaluation
```r
fw_precise <- PF_forward_filter(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE),
  N_fw = 100000, N_first = 100000, df,
  type = "VAR", model = "exponential", max_T = n_periods, by = 1,
  control = PF_control(
    N_fw_n_bw = 1, N_smooth = 1, N_first = 1,
    method = "AUX_normal_approx_w_cloud_mean", smoother = "Fearnhead_O_N",
    n_threads = n_threads, nu = 8, covar_fac = 1.1),
  Fmat = Fmat, a_0 = c(0, 0), Q = Qmat, Q_0 = Q_0, R = diag(1, 2),
  fixed_effects = beta)
logLik(fw_precise)
```
The log-likelihood from the final model we estimated is
```r
print(tail(pf_Fear_more$log_likes, 1), digits = 7)
print(tail(pf_Brier$log_likes, 1), digits = 7)
```
or we can get a more precise estimate with the call below (though the log-likelihood is evaluated at the final parameter estimates and not at those from one EM iteration earlier as above)
```r
print(
  logLik(PF_forward_filter(pf_Fear_more, N_fw = 20000, N_first = 20000)),
  digits = 7)
```
A question is what happens if we start at the true parameter values. We may expect to take only a few EM iterations and end up at the MLE or another local maximum not too far from the true parameters
```r
set.seed(30520116)
pf_Fear_close <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Fmat = Fmat, a_0 = c(0, 0), Q = Qmat, Q_0 = Q_0, fixed_effects = beta,
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 500, N_smooth = 2000, N_first = 2000, eps = .001,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE, n_max = 100,
    smoother = "Fearnhead_O_N",
    nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
The final estimates are
```r
pf_Fear_close$F
show_covar(pf_Fear_close$Q)
pf_Fear_close$fixed_effects
```
The log-likelihoods at the EM iterations look as follows
```r
plot(pf_Fear_close$log_likes, type = "l")
```
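We can read off how many EM iterations were needed (an added check):

```r
# number of EM iterations before the stopping criterion was met
length(pf_Fear_close$log_likes)
```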
The state model can be written in terms of two parameters by setting
$$
F = \begin{pmatrix} \theta & 0 \\ 0 & \theta \end{pmatrix}, \quad
Q = \begin{pmatrix} \exp(2\psi) & 0 \\ 0 & \exp(2\psi) \end{pmatrix}
$$
We can estimate the restricted model as follows (see the vignette mentioned in
the beginning for details about the arguments to `PF_EM`).
```r
G <- matrix(0, 2^2, 1)
G[1, 1] <- G[4, 1] <- 1
J <- matrix(1, 2, 1)
K <- matrix(nrow = 2 * (2 - 1) / 2, ncol = 0)

pf_Fear_restrict <- PF_EM(
  Surv(tstart, tstop, y) ~ x + ddFixed(x) + ddFixed_intercept(TRUE), df,
  Q_0 = diag(1, 2), G = G, J = J, K = K, theta = .1, psi = 0,
  phi = numeric(),
  by = 1, type = "VAR", model = "exponential", max_T = n_periods,
  control = PF_control(
    N_fw_n_bw = 500, N_smooth = 2000, N_first = 2000, eps = .001,
    method = "AUX_normal_approx_w_cloud_mean", fix_seed = FALSE, n_max = 100,
    smoother = "Fearnhead_O_N",
    nu = 8, covar_fac = 1.1, n_threads = n_threads))
```
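To see what the restriction matrices imply (an added illustration; the values $0.9$ and $0.33$ are only examples), `G` maps the single parameter $\theta$ onto the diagonal of $F$ and `J` maps $\psi$ onto both log standard deviations:

```r
# the implied F when theta = 0.9
matrix(G %*% .9, 2, 2)
# the implied standard deviations when psi = log(.33)
exp(J %*% log(.33))
```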
The final estimates are
```r
pf_Fear_restrict$F
show_covar(pf_Fear_restrict$Q)
pf_Fear_restrict$fixed_effects
```
and a log-likelihood approximation is
```r
print(
  logLik(PF_forward_filter(pf_Fear_restrict, N_fw = 20000, N_first = 20000)),
  digits = 7)
```
We have $2^2 + 2(2 + 1) / 2 - 2 = 5$ fewer parameters, so the difference in log-likelihood to the full model seems reasonable.