Generate a data table with example data
datagen(N, censor = 80)
N: integer. The number of individuals in the dataset.
censor: numeric. The total observation period. Individuals are removed
from the dataset if they do not exit to
The dataset simulates a labour market programme. People entering the dataset are without a job.
They experience two hazards, i.e. probabilities per time period. They can either get a job and exit from
the dataset, or they can enter a labour market programme, e.g. a subsidised job or similar, and remain
in the dataset and possibly get a job later.
In the terms of this package, there are two transitions, "job" and "program".
The two hazards are influenced by covariates observed by the researcher, called "x1" and
"x2". In addition there are unobserved characteristics influencing the hazards. Being
on a programme also influences the hazard of getting a job. In the generated dataset, being on
a programme is the indicator variable
alpha. While on a programme, the only transition that can
be made is "job".
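The competing hazards faced by an unemployed person can be sketched in base R. This is only an illustration under simplifying assumptions, not the code used by datagen: the helper simulate_first_spell and the hazard values 0.10 and 0.05 are hypothetical, the hazards are held constant, and covariates and unobserved heterogeneity are left out. The transition is coded 0 for no transition within the observation period, 1 for a job and 2 for a programme.

```r
# Hypothetical helper: draw the first spell of one unemployed individual.
# Each hazard gets its own exponential waiting time; the earliest event wins.
simulate_first_spell <- function(h_job, h_program, censor = 80) {
  t_job     <- rexp(1, rate = h_job)      # time until a job is found
  t_program <- rexp(1, rate = h_program)  # time until programme entry
  duration  <- min(t_job, t_program, censor)
  d <- if (duration == censor) 0L        # nothing happened before censoring
       else if (t_job < t_program) 1L    # transition to job
       else 2L                           # transition to programme
  data.frame(duration = duration, d = d, alpha = 0L)
}

set.seed(42)
spell <- simulate_first_spell(h_job = 0.10, h_program = 0.05)
print(spell)
```

A person who enters a programme would then start a new row with alpha set to 1 and only the job hazard active.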
The dataset is organized as a series of rows for each individual. Each row is a time period with constant covariates.
The length of the time period is in the covariate duration.
The transition being made at the end of the period is coded in the covariate d. d
is an integer which is 0 if no transition occurs (e.g. if a covariate changes); it is 1 for
the first transition, 2 for the second transition. It can also be a factor, in which case the
level marking no transition must be called
alpha is zero when unemployed, and 1 if on a programme. It is used
for two purposes. It is used as an explanatory variable for transition to job, this yields
a coefficient which can be interpreted as the effect of being on the programme. It is also
used as a "state variable", as an index into a "risk set". I.e. when estimating, the
mphcrm function must be told which risks/hazards are present.
When on a programme, the
"program" transition cannot be made. This is implemented
by specifying a list of risksets and using
alpha+1 as an index into this list.
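The indexing can be tried out in plain R without fitting anything. The sketch below uses the riskset list from the example on this page; the alpha vector is made-up illustration data.

```r
# Risk sets as in the example: element 1 applies when unemployed (alpha = 0),
# element 2 applies when on a programme (alpha = 1).
risksets <- list(unemp = c("job", "program"), onprogram = "job")

alpha <- c(0, 0, 1)            # hypothetical state for three rows
active <- risksets[alpha + 1]  # alpha + 1 turns the 0/1 state into a list index

print(active)
```

Rows with alpha equal to 0 are exposed to both transitions, while rows with alpha equal to 1 are exposed only to "job".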
The two hazards are modeled as exp(X β + μ), where X is a matrix of covariates, β is a vector of coefficients to be estimated, and μ is an intercept. All of these quantities are transition specific. This yields an individual likelihood which we call M_i(μ). The idea behind the mixed proportional hazard model is to model the individual heterogeneity as a probability distribution of intercepts. We obtain the individual likelihood L_i = ∑_j p_j M_i(μ_j), and, thus, the likelihood L = ∏_i L_i.
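The mixture over intercepts can be illustrated with a small numerical sketch. It makes assumptions that are not taken from the package: a single transition, exactly observed durations, and the textbook contribution h^d exp(-h t) for a spell with constant hazard h. All numbers are hypothetical. With independent individuals, the log-likelihood is the sum of log L_i.

```r
# Toy data: three individuals with one spell each (hypothetical numbers)
xb       <- c(-1.0, -0.5, -1.5)  # linear index X beta for each individual
duration <- c(2.0, 5.0, 1.0)     # spell lengths
d        <- c(1, 0, 1)           # 1 = transition observed, 0 = no transition

# M_i(mu) under the assumed single-risk, continuous-time form
M <- function(mu) {
  h <- exp(xb + mu)              # hazard exp(X beta + mu)
  h^d * exp(-h * duration)       # density if d = 1, survival if d = 0
}

mu <- c(-0.5, 0.7)               # masspoints mu_j (hypothetical)
p  <- c(0.6, 0.4)                # probabilities p_j, summing to 1

# L_i = sum_j p_j M_i(mu_j)
Li <- p[1] * M(mu[1]) + p[2] * M(mu[2])
loglik <- sum(log(Li))

print(Li)
print(loglik)
```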
The likelihood is to be maximized over the parameter vectors β (one for each transition), the masspoints μ_j, and probabilities p_j.
The probability distribution is built up in steps. We start with a single masspoint, with probability 1. Then we search for another point with a small probability, and maximize the likelihood from there. We continue adding masspoints until we can no longer improve the likelihood.
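One step of this procedure can be sketched with base R optimizers. This is a simplified illustration, not the search used by mphcrm: it assumes a single transition, no covariates, exactly observed and uncensored durations, and it places the new point at an arbitrary offset of 1 from the first one with starting probability 0.01. All names and numbers are hypothetical.

```r
set.seed(1)
n <- 500
# Simulated durations from two unobserved types (hypothetical intercepts -2 and 0)
mu_true  <- sample(c(-2, 0), n, replace = TRUE)
duration <- rexp(n, rate = exp(mu_true))

# Log-likelihood of a discrete mixture with masspoints mu and probabilities p
loglik <- function(mu, p) {
  Li <- numeric(n)
  for (j in seq_along(mu)) {
    h  <- exp(mu[j])
    Li <- Li + p[j] * h * exp(-h * duration)  # all transitions observed
  }
  sum(log(Li))
}

# Step 1: a single masspoint with probability 1
fit1 <- optimize(function(m) loglik(m, 1), interval = c(-10, 10), maximum = TRUE)
mu1  <- fit1$maximum
ll1  <- fit1$objective

# Step 2: add a second point with a small probability and maximize from there.
# The probability is parameterized on the logit scale to keep it inside (0, 1).
negll2 <- function(par) {
  p2 <- plogis(par[3])
  -loglik(par[1:2], c(1 - p2, p2))
}
start    <- c(mu1, mu1 + 1, qlogis(0.01))
ll_start <- -negll2(start)
fit2     <- optim(start, negll2)  # Nelder-Mead by default
ll2      <- -fit2$value

print(c(one_point = ll1, two_points = ll2))
```

In a full procedure this step would be repeated, stopping once an added masspoint no longer raises the log-likelihood.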
The example illustrates how data(durdata) was generated.
library(durmod)
data.table::setDTthreads(1)  # avoid screams from cran-testing
dataset <- datagen(5000, 80)
print(dataset)
risksets <- list(unemp=c("job","program"), onprogram="job")
# just two iterations to save time
Fit <- mphcrm(d ~ x1+x2 + ID(id) + D(duration) + S(alpha+1) + C(job,alpha),
              data=dataset, risksets=risksets,
              control=mphcrm.control(threads=1,iters=2))
# mphcrm returns a list of estimates; pick the first one
best <- Fit[[1]]
print(best)
summary(best)