Generate a data table with example data
datagen(N, censor = 80)
N | integer. The number of individuals in the dataset.
censor | numeric. The total observation period. Individuals are removed from the dataset if they do not exit to "job" within this period.
The dataset simulates a labour market programme. People entering the dataset are without a job.
They experience two hazards, i.e. probabilities per time period. They can either get a job and exit from
the dataset, or they can enter a labour market programme, e.g. a subsidised job or similar, and remain
in the dataset and possibly get a job later.
In the terms of this package, there are two transitions, "job" and "program".
The two hazards are influenced by covariates observed by the researcher, called "x1" and "x2".
In addition, there are unobserved characteristics influencing the hazards. Being
on a programme also influences the hazard to get a job. In the generated dataset, being on
a programme is the indicator variable alpha. While on a programme, the only transition that can
be made is "job".
The dataset is organized as a series of rows for each individual. Each row is a time period with constant covariates.
The length of the time period is in the covariate duration.
The transition being made at the end of the period is coded in the covariate d. This
is an integer which is 0 if no transition occurs (e.g. if a covariate changes), 1 for
the first transition, and 2 for the second transition. It can also be a factor, in which case the
level marking no transition must be called "none".
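As a rough illustration of this layout, the sketch below builds one individual's rows by hand in base R. The values are invented, and the data frame is a hypothetical stand-in for the generated data table; only the column names follow the formula used in the example.

```r
# Hypothetical individual: unemployed for 5 periods, then enters a programme,
# stays on it for 3 periods, then gets a job. All values are invented.
spell <- data.frame(
  id       = c(1, 1),
  x1       = c(0.3, 0.3),
  x2       = c(-1.2, -1.2),
  alpha    = c(0, 1),   # 0 = unemployed, 1 = on a programme
  duration = c(5, 3),   # length of each period
  # transition at the end of each period, coded as a factor with a "none" level
  d        = factor(c("program", "job"), levels = c("none", "job", "program"))
)
print(spell)

# Total time under observation for this individual
total_time <- sum(spell$duration)
```

Each row covers a stretch of time in which the covariates, including alpha, do not change; a new row starts when a transition or a covariate change occurs.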
The covariate alpha is zero when unemployed, and 1 if on a programme. It is used
for two purposes. It is used as an explanatory variable for the transition to job; this yields
a coefficient which can be interpreted as the effect of being on the programme. It is also
used as a "state variable", as an index into a "risk set". That is, when estimating, the
mphcrm function must be told which risks/hazards are present.
When on a programme, the "program" transition cannot be made. This is implemented
by specifying a list of risksets and using alpha+1 as an index into this list.
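The indexing can be pictured with a plain R list lookup. This is only an illustration of the idea, not how mphcrm resolves risk sets internally; the vector `alpha` below is an invented pair of values.

```r
# Risk sets as in the example: the transitions possible in each state.
risksets <- list(unemp = c("job", "program"), onprogram = "job")

# alpha is 0 when unemployed and 1 on a programme,
# so alpha + 1 is a 1-based index into the list.
alpha <- c(0, 1)
active <- risksets[alpha + 1]
print(active)
```

For a row with alpha equal to 0 the lookup gives both transitions; for a row with alpha equal to 1 it gives only "job".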
The two hazards are modeled as exp(X β + μ), where X is a matrix of covariates, β is a vector of coefficients to be estimated, and μ is an intercept. All of these quantities are transition specific. This yields an individual likelihood which we call M_i(μ). The idea behind the mixed proportional hazard model is to model the individual heterogeneity as a probability distribution of intercepts, with masspoints μ_j and probabilities p_j. We obtain the individual likelihood L_i = ∑_j p_j M_i(μ_j), and, thus, the likelihood L = ∏_i L_i.
The likelihood is to be maximized over the parameter vectors β (one for each transition), the masspoints μ_j, and the probabilities p_j.
The probability distribution is built up in steps. We start with a single masspoint, with probability 1. Then we search for another point with a small probability, and maximize the likelihood from there. We continue adding masspoints until we can no longer improve the likelihood.
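To make the mixture concrete, here is a toy calculation in plain R. The function `M` is a hypothetical stand-in for the individual likelihood M_i(μ): it assumes a single risk with a constant hazard and is not the likelihood the package computes. The masspoints, probabilities and data are invented, and individuals are assumed independent.

```r
# Hypothetical stand-in for M_i(mu): one risk, constant hazard exp(xb + mu),
# observed duration t, event indicator e (1 = transition, 0 = censored).
M <- function(mu, xb, t, e) {
  h <- exp(xb + mu)
  h^e * exp(-h * t)
}

mu <- c(-3, -1)     # invented masspoints
p  <- c(0.7, 0.3)   # invented probabilities, summing to 1

# Three invented individuals
xb <- c(0.2, -0.5, 0.0)   # linear index X beta
t  <- c(10, 4, 25)        # durations
e  <- c(1, 1, 0)          # last individual is censored

# L_i = sum_j p_j * M_i(mu_j), one value per individual
Li <- sapply(seq_along(t), function(i) sum(p * M(mu, xb[i], t[i], e[i])))

# With independent individuals the log-likelihoods add up
loglik <- sum(log(Li))
print(loglik)
```

Working on the log scale, as in the last step, is the usual way to avoid numerical underflow when many individual likelihoods are combined.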
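The stepwise idea can be sketched on a toy problem with base R's `optim`. Everything here is an assumption made for illustration: the data are simulated from two invented latent types, there are no covariates and no censoring, the new point is placed by a crude fixed rule rather than by a search, and the stopping threshold and the cap of five points are arbitrary. It is not the procedure implemented in mphcrm.

```r
set.seed(1)
# Invented data: two latent types with different constant hazards
type <- rbinom(500, 1, 0.4)
t <- rexp(500, rate = ifelse(type == 1, exp(-1), exp(-3)))

# Negative log-likelihood with k masspoints.
# par = c(mu_1..mu_k, a_2..a_k); probabilities are softmax(c(0, a)).
negll <- function(par, k) {
  mu <- par[1:k]
  a  <- c(0, par[-(1:k)])
  p  <- exp(a) / sum(exp(a))
  # rows = individuals, columns = masspoints; density h * exp(-h * t)
  dens <- sapply(mu, function(m) exp(m - exp(m) * t))
  -sum(log(dens %*% p))
}

# One masspoint with probability 1: closed-form optimum for this toy model
k <- 1
mu0 <- log(1 / mean(t))
fit <- list(par = mu0, value = negll(mu0, 1))
first_value <- fit$value

repeat {
  mu <- fit$par[1:k]
  a  <- fit$par[-(1:k)]
  # New point above the largest current one, with a small starting probability
  start <- c(mu, max(mu) + 1, a, -4)
  cand <- optim(start, negll, k = k + 1, method = "Nelder-Mead")
  if (fit$value - cand$value < 0.01) break   # no worthwhile improvement
  fit <- cand
  k <- k + 1
  if (k >= 5) break                          # arbitrary cap for the sketch
}

print(k)
print(fit$value)
```

The softmax parametrisation keeps the probabilities positive and summing to one without constrained optimisation, which is one common design choice for mixtures of this kind.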
The example illustrates how data(durdata) was generated.
data.table::setDTthreads(1) # avoid screams from cran-testing
dataset <- datagen(5000,80)
print(dataset)
risksets <- list(unemp=c("job","program"), onprogram="job")
# just two iterations to save time
Fit <- mphcrm(d ~ x1+x2 + ID(id) + D(duration) + S(alpha+1) + C(job,alpha),
data=dataset, risksets=risksets,
control=mphcrm.control(threads=1,iters=2))
best <- Fit[[1]]
print(best)
summary(best)