The main function of the package is mphcrm. Its interface is somewhat similar to that of lm. There is an example of use in datagen, with a generated dataset similar to the ones in Gaure et al. (2007). For those who have used the program from that paper, a mixture of R, Fortran, C, and Python: this is an entirely new, self-contained package, written from scratch with the benefit of 12 years of experience. Most, but not yet all, of the functionality of that behemoth has been implemented.
A short description of the model follows.
There are some individuals with some observed covariates X_i. The individuals are observed for some time, so there is typically more than one observation of each individual. At any point they experience one or more hazards. The hazards are assumed to be of the form h_i^j = exp(X_i β_j), where β_j are coefficients for hazard j. The hazards themselves are not observed, but an event associated with them is, i.e. a transition of some kind. The time of the transition, either exactly recorded, or within an interval, must also be in the data set. With enough observations it is then possible to estimate the coefficients β_j.
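As an illustration of the hazard specification above, here is a minimal base R sketch that does not use durmod. The data, the coefficient values, and the hazard names job and program are made up for the example, and the hazards are assumed to be constant within an observation.

```r
# Hypothetical toy data: 5 observations, an intercept and two covariates.
set.seed(1)
X <- cbind(1, x1 = rnorm(5), x2 = rnorm(5))

# Made-up coefficients, one column per hazard (names are illustrative).
beta <- cbind(job     = c(-1.0, 0.5, -0.3),
              program = c(-2.0, 0.2,  0.4))

# h[i, j] = exp(X_i beta_j): hazard j for observation i.
h <- exp(X %*% beta)

# With constant competing hazards, the time to the first transition is
# exponential with rate sum_j h[i, j], and the transition is of type j
# with probability h[i, j] / sum_j h[i, j].
total <- rowSums(h)
duration <- rexp(nrow(h), rate = total)
event <- apply(h / total, 1, function(p) sample(colnames(h), 1, prob = p))
data.frame(duration, event)
```

Only the durations and the transition types would be available in a real data set; the hazards h themselves are never observed.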
However, unlike in ordinary linear models, unobserved heterogeneity may bias the estimates, not merely increase their uncertainty. To account for unobserved heterogeneity, a random intercept is introduced, so that the hazards are of the form h_i^j(μ_k) = exp(X_i β_j + μ_k) for k between 1 and some n. The intercept may of course be written multiplicatively as exp(X_i β_j) exp(μ_k), which is why the hazards are called proportional.
The individual likelihood depends on the intercept, i.e. L_i(μ_k), but we integrate it out so that the individual likelihood becomes ∑ p_k L_i(μ_k). The resulting mixture likelihood is maximized over all the βs, n, the μ_ks, and the probabilities p_k.
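The mixture likelihood for one individual can be sketched in base R as follows. This is not durmod's implementation: it assumes a single hazard, exactly recorded durations, and a hazard that is constant within each observation, and all numbers are made up.

```r
# Hypothetical single-hazard example with exactly recorded durations.
# One individual with three observations; d marks whether the observation
# ended in a transition.
X   <- cbind(1, x = c(0.2, -0.1, 0.4))
dur <- c(1.5, 0.7, 2.0)
d   <- c(0, 0, 1)
beta <- c(-1, 0.5)        # made-up coefficients

# Likelihood of the individual's observations given intercept mu:
# each observation contributes h^d * exp(-h * duration).
Li <- function(mu) {
  h <- exp(drop(X %*% beta) + mu)
  prod(h^d * exp(-h * dur))
}

# A made-up discrete mixing distribution with n = 2 support points.
mu <- c(-0.5, 1.0)
p  <- c(0.7, 0.3)

# Mixture likelihood: sum_k p_k * L_i(mu_k).
mix <- sum(p * sapply(mu, Li))
log(mix)
```

The full log likelihood would be the sum of log(mix) over individuals, maximized jointly over the coefficients, the support points, and their probabilities.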
Besides the function mphcrm, which does the actual estimation, there are functions for extracting the estimated mixture: mphdist, mphmoments, and a few more.
There is a summary function for the fitted model, and a data set, available with data(durdata), which is used for demonstration purposes. An already fitted model is also available there, as fit.
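A short sketch of how these pieces might be used together is given below. It assumes that durmod is installed, that fit is a list of estimates whose first element is the preferred one, and that mphdist and mphmoments accept such an element directly; check the individual help pages before relying on this.

```r
# Only runs if durmod is installed.
if (requireNamespace("durmod", quietly = TRUE)) {
  library(durmod)
  data(durdata)            # per the text, also makes the fitted model 'fit' available
  best <- fit[[1]]         # assumption: the first element is the preferred estimate
  print(summary(best))     # summary of the fitted model
  print(mphdist(best))     # estimated mixing distribution
  print(mphmoments(best))  # moments of that distribution
}
```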
The package may use more than one CPU. The default number of threads is taken from getOption("durmod.threads"), which is initialized upon loading the package from the environment variable DURMOD_THREADS, OMP_THREAD_LIMIT, OMP_NUM_THREADS or NUMBER_OF_PROCESSORS, or from parallel::detectCores().
For more demanding problems, a cluster of machines (from packages parallel or snow) can be used, in combination with the use of threads.
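The thread default described above could be resolved roughly as in the following sketch. The helper resolve_threads is hypothetical and not part of durmod, and reading the list of environment variables as a first-match precedence order is an assumption.

```r
# Sketch of the lookup order described above; resolve_threads is a
# hypothetical helper, not a durmod function.
resolve_threads <- function() {
  vars <- c("DURMOD_THREADS", "OMP_THREAD_LIMIT",
            "OMP_NUM_THREADS", "NUMBER_OF_PROCESSORS")
  for (v in vars) {
    # Unset or non-numeric values become NA and are skipped.
    n <- suppressWarnings(as.integer(Sys.getenv(v, unset = NA)))
    if (!is.na(n) && n > 0L) return(n)
  }
  # Fall back to the number of cores, or 1 if that is unknown.
  n <- parallel::detectCores()
  if (is.na(n)) 1L else n
}

resolve_threads()

# Overriding the default for the current session:
options(durmod.threads = 2L)
getOption("durmod.threads")
```

Setting the option after the package has been loaded changes the default for later calls in the same session; setting DURMOD_THREADS only has an effect if done before loading.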
There is a vignette (vignette("whatmph")) with more details about durmod and the data layout.
Gaure, S., K. Røed and T. Zhang (2007) Time and causality: A Monte-Carlo Assessment of the timing-of-events approach, Journal of Econometrics 141(2), 1159-1195. https://doi.org/10.1016/j.jeconom.2007.01.015